Weekly Report August 1 - 7

I spent this week helping get 3.11.0rc1 released. That included both reviewing PRs and helping resolve release blockers. Two things in particular took most of my time, so I’ll talk about them briefly.

Cancel this task. Uh, never mind… uncancel. UNCANCEL!

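I talked before about except* being really exciting for Python 3.11, and since then task groups have been added to CPython. They’re an awesome piece of structured concurrency that makes tracking running tasks automatic and bulletproof: the async with block doesn’t exit until every task created in it has finished, and errors raised by those tasks are collected instead of getting lost. This solves a long-standing complication in managing asyncio programs.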

There’s an interesting detail in that machinery. It has to do with an async function that uses a TaskGroup in an async with block, something like:

async def do_things():
    await some_code()
    async with asyncio.TaskGroup() as tg:
        tg.create_task(code_scheduled_within_the_task_group())
        await other_code_awaited_on_within_the_task_group()
    await unrelated_code()

In Python 3.10 and older, when a task we were awaiting on got cancelled, the cancellation bubbled up to us as well. With task groups, the problem is that we both want the cancellation and don’t want it. If code_scheduled_within_the_task_group() fails with an exception, we do want to cancel other_code_awaited_on_within_the_task_group(), because that’s structured concurrency: it’s organized as part of the task group. But we do not want to cancel unrelated_code(), which is outside of the task group.

But we can’t just cancel a block of code within an async function! Or can we? It turns out we can, with the help of the new Task.uncancel() method. The task group cancels the whole task running do_things(), but as soon as the tg block exits, it uncancels that task so the cancellation doesn’t leak past the block. The child’s error still surfaces as an ExceptionGroup, which is where except* comes in.

Fascinatingly, the lack of this ability is what held back a nice timeout context manager all this time. Now we can have it, as asyncio.timeout() in Python 3.11:

async def make_request_with_timeout():
    try:
        async with asyncio.timeout(1):
            # Structured block affected by the timeout:
            await make_request()
            await make_another_request()
    except TimeoutError:
        log("There was a timeout")
    # Outer code not affected by the timeout:
    await unrelated_code()

I’m hoping to add this example to the docs.

Emails are weird

One problem I took care of this week, unrelated to CPython itself but affecting the core developer workflow, was the CLA bot’s handling of email address uniqueness.

You see, email addresses aren’t as bad as HTML, which, as we all know, cannot be parsed with regex. But the full RFC 5322-compliant regex for email addresses is quite a beast.

The particular problem we had was due to privacy-related features built into email addresses, namely +suffixes used to sort task-specific email, and the fact that dots don’t matter in Gmail addresses.

It turns out that people sometimes use one email variant for git commits and a different one for their GitHub account. For example, one might be tylerdurden+git@ikea.com and the other tylerdurden+github@ikea.com. And sometimes, when a new laptop got set up, the user.email field in ~/.gitconfig ended up with yet another address, like TylerDurden@ikea.com. We already made email address matching case-insensitive, but I also had to make +suffixes irrelevant. For Gmail addresses, there’s the additional complication that tyler.durden@gmail.com is the same as tylerdurden@gmail.com.
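The normalization rules described above could look roughly like this. normalize_email() is a hypothetical function written for illustration, not the CLA bot’s actual code, and treating only the gmail.com domain specially is an assumption.

```python
def normalize_email(address: str) -> str:
    """Map equivalent email variants to one canonical form (a sketch)."""
    # Split on the last "@" so the domain is everything after it.
    local, sep, domain = address.strip().lower().rpartition("@")
    if not sep:
        raise ValueError(f"not an email address: {address!r}")
    # Drop a +suffix: "tylerdurden+git" -> "tylerdurden".
    local = local.split("+", 1)[0]
    # Gmail ignores dots in the local part (assumed: gmail.com only).
    if domain == "gmail.com":
        local = local.replace(".", "")
    return f"{local}@{domain}"


variants = ["tylerdurden+git@ikea.com", "tylerdurden+github@ikea.com", "TylerDurden@ikea.com"]
print({normalize_email(v) for v in variants})
```

All three variants collapse into a single address, while dots in non-Gmail addresses are left alone.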

As you can see in the PR for the required change, thanks to EdgeDB’s built-in migrations this was a rather simple thing to do. However, I had to split the process into two parts: the normalized_email field in the schema needs to be unique, and in 19 cases in our production database the new normalization would have produced duplicate addresses. So I added a short script to deduplicate the database before pushing the final schema change to production.
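Finding those collisions before adding the uniqueness constraint boils down to grouping existing addresses by their normalized form. The sketch below is hypothetical: it works on a plain list rather than querying EdgeDB, repeats a compact version of the normalization rules so it stands alone, and only reports collisions. How each duplicate was actually resolved isn’t covered here, so that part is left out.

```python
from collections import defaultdict


def normalize_email(address: str) -> str:
    # Compact version of the rules: lowercase, drop +suffix, drop dots for Gmail.
    local, _, domain = address.lower().rpartition("@")
    local = local.split("+", 1)[0]
    if domain == "gmail.com":
        local = local.replace(".", "")
    return f"{local}@{domain}"


def find_collisions(emails: list[str]) -> dict[str, list[str]]:
    """Group stored addresses that would share a single normalized_email value."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for email in emails:
        groups[normalize_email(email)].append(email)
    # Only groups with more than one member would violate the unique constraint.
    return {key: members for key, members in groups.items() if len(members) > 1}


stored = ["tylerdurden+git@ikea.com", "TylerDurden@ikea.com", "someone@example.org"]
print(find_collisions(stored))
```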

Week in numbers

  • issues: 2 closed
  • PRs: 26 closed

Detailed Log

Issues:

PRs:

#Python/Developer-in-Residence