Weekly Report 2021, August 30 - September 5

Slower week in terms of pull requests as I coded more myself and did some release management work.

Stats-wise:

  • issues: 8 closed
  • PRs: 6 authored, 40 closed, 3 reviewed

Highlights

This week I released Python 3.9.7 and 3.8.12. Upgrade if you’re using those versions of Python. If you’re stuck on 3.6 or 3.7, let me know why.

There’s been a series of deprecations and removals this week. We finally removed the binhex module, one of the dead batteries listed in PEP 594. See BPO-45085 for details. We disabled the long-deprecated “U” mode of open() in GH-28118. The lib2to3 package got officially deprecated after being marked as pending deprecation in Python 3.9. It’s unmaintained and based on an LL(1) parser, which makes it impossible for it to properly support Python 3.10+ syntax. Details in GH-28116.

Another fun change: zlib.compress now accepts a wbits parameter, which allows users to compress data as a raw deflate block without zlib headers and trailers in one go. This made it possible to speed up gzip.compress and gzip.decompress by doing the whole operation in memory at once, making them 1.39X and 2.23X faster, respectively, for data sizes of 16K and below. Details in GH-27941.
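To make the wbits change concrete, here is a minimal sketch of producing a raw deflate stream. The raw_deflate helper and the sample payload are made up for illustration. The fallback branch uses zlib.compressobj so the snippet also runs on interpreters that predate the new keyword.

```python
import zlib

payload = b"weekly report " * 200  # made-up sample data


def raw_deflate(data: bytes, level: int = 6) -> bytes:
    """Compress to a raw deflate stream (no zlib header or trailer)."""
    try:
        # One-shot form, available once zlib.compress accepts wbits.
        return zlib.compress(data, level, wbits=-15)
    except TypeError:
        # Older interpreters: equivalent result via a compressor object.
        compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
        return compressor.compress(data) + compressor.flush()


raw = raw_deflate(payload)
wrapped = zlib.compress(payload, 6)

# A negative wbits tells zlib to expect a headerless stream on the way back.
assert zlib.decompress(raw, wbits=-15) == payload

# The zlib container adds a header and an Adler-32 trailer around the same
# deflate data, so the raw stream comes out shorter.
print(len(raw), len(wrapped))
```

A negative wbits value selects the raw format, and its absolute value is the window size logarithm, so -15 means "raw deflate with the largest window".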

select fun(*) from sqlite;

I spent quite some time coding: fiddling with backports and autotools, as well as working on the contribution stats gathering. At first I made quite a detour here because I wanted to make the data available in SQLite format. I’m working with a bunch of pickled dataclasses myself as this gives me a lot of programmatic flexibility. To import them into SQLite I decided to use Simon Willison’s sqlite-utils, which creates schemas automatically from inserted data.

Of course, real life turned out to be more complicated than that. Dealing with nested data and foreign keys was somewhat surprising, so while the end result is quite tidy, it took me some time to get there. Namely, I wasn’t quite sure how compound primary keys are supposed to be represented as foreign keys in sqlite-utils, and the magic import from data had to be split into per-table imports.
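As a rough illustration of what a per-table import with a compound key looks like, here is a sketch that uses only the standard library's sqlite3 module rather than sqlite-utils. The PullRequest and Comment dataclasses, the table layout, and the import_pull_requests helper are all hypothetical and are not the schema of the actual database.

```python
import sqlite3
from dataclasses import asdict, dataclass, field


@dataclass
class Comment:  # hypothetical nested record
    author: str
    body: str


@dataclass
class PullRequest:  # hypothetical parent record
    repo: str
    number: int
    title: str
    comments: list[Comment] = field(default_factory=list)


def import_pull_requests(db: sqlite3.Connection, prs: list[PullRequest]) -> None:
    db.execute("PRAGMA foreign_keys = ON")
    db.executescript("""
        CREATE TABLE pull_requests (
            repo TEXT, number INTEGER, title TEXT,
            PRIMARY KEY (repo, number)
        );
        CREATE TABLE comments (
            pr_repo TEXT, pr_number INTEGER, author TEXT, body TEXT,
            -- A compound primary key is referenced by listing both columns.
            FOREIGN KEY (pr_repo, pr_number)
                REFERENCES pull_requests (repo, number)
        );
    """)
    for pr in prs:
        row = asdict(pr)               # recurses into nested dataclasses
        nested = row.pop("comments")   # nested rows go to their own table
        db.execute(
            "INSERT INTO pull_requests VALUES (:repo, :number, :title)", row
        )
        db.executemany(
            "INSERT INTO comments VALUES (:pr_repo, :pr_number, :author, :body)",
            [{"pr_repo": pr.repo, "pr_number": pr.number, **c} for c in nested],
        )
    db.commit()
```

The point of the split is that each nested list becomes its own insert, carrying the parent's key columns along, instead of one import that tries to infer everything from a single nested structure.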

Getting my dataclasses into dictionary form should have been as easy as using dataclasses.asdict, but I forgot it existed and so I tried to use cattrs instead. That one claims it supports “attrs classes and dataclasses out of the box”, which I misread to think it supports dataclasses. The detour was eventful though, as I created a small library that converts Python 3.10+ string annotations using shorthand union syntax like str | None into the classic Union[str, None] syntax, so that Python 3.9 typing.get_type_hints() works fine on them at runtime. The library lives with the rest of the data munching scripts for now. I might extract it later.
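The union rewriting idea can be sketched with the ast module. This is not the library mentioned above, only one possible approach. The to_classic_union name is made up for illustration, and the sketch assumes Python 3.9+ because it relies on ast.unparse.

```python
import ast


def _flatten(node: ast.expr) -> list:
    """Collect the operands of a chain like ``a | b | c`` in order."""
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
        return _flatten(node.left) + _flatten(node.right)
    return [node]


class _UnionRewriter(ast.NodeTransformer):
    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        if not isinstance(node.op, ast.BitOr):
            return self.generic_visit(node)
        # Rewrite each operand first so nested unions are handled too.
        members = [self.visit(member) for member in _flatten(node)]
        return ast.Subscript(
            value=ast.Name(id="Union", ctx=ast.Load()),
            slice=ast.Tuple(elts=members, ctx=ast.Load()),
            ctx=ast.Load(),
        )


def to_classic_union(annotation: str) -> str:
    """Turn ``"str | None"`` into ``"Union[str, None]"``."""
    tree = ast.parse(annotation, mode="eval")
    rewritten = ast.fix_missing_locations(_UnionRewriter().visit(tree))
    return ast.unparse(rewritten.body)


print(to_classic_union("str | None"))  # Union[str, None]
```

The rewritten string still needs Union to be resolvable wherever the annotation is evaluated, for example by passing it in the namespace given to typing.get_type_hints().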

Anyway, once the data got inserted into SQLite, no query worked because you need to index foreign keys (with a trivial db.index_foreign_keys() once you know about it!). But once that got solved, Datasette could be used on the file just fine:
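To show what indexing a foreign key changes, here is a sketch using the standard library's sqlite3 module and plain SQL instead of sqlite-utils. The pull_requests and changed_files tables and the query_plan helper are hypothetical, chosen only to have one foreign key column to index.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE pull_requests (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE changed_files (
        pr_id INTEGER REFERENCES pull_requests (id),
        path TEXT
    );
""")


def query_plan(sql: str) -> list:
    # The last column of each EXPLAIN QUERY PLAN row describes one step.
    return [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql)]


lookup = "SELECT path FROM changed_files WHERE pr_id = 1"

# Without an index SQLite has to scan the whole child table.
plan_before = query_plan(lookup)

# SQLite does not index the referencing column on its own.
db.execute("CREATE INDEX idx_changed_files_pr_id ON changed_files (pr_id)")

# With the index in place the same lookup becomes an index search.
plan_after = query_plan(lookup)

print(plan_before)
print(plan_after)
```

On a table with many rows, the difference between the two plans is what separates a join that returns promptly from one that appears to hang.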

The query in the picture shows which files in the cpython repo are changed the most. Perhaps surprisingly, the deepest core of the interpreter, the eval loop, seems to be touched by the most GitHub pull requests.

This is a work in progress. You can download the current file from here: db-20210905.sqlite

I still need to correlate the data with the Git repository information and bugs.python.org. Some indexes are probably still missing, as some queries still die for me after eating all my disk space. I’d also like to have full-text search on the PR titles, descriptions, and comments. This will be happening next week.

Detailed Log

Monday

Released Python 3.9.7 and 3.8.12.

Issues:

PRs:

Tuesday

Issues:

PRs:

Wednesday

Coding day.

Thursday

Issues:

PRs:

Friday

Issues:

PRs:
