Weekly Report 2021, July 19 - 25

The second week passed as fast as the first one, if not faster. This time I dug into more non-trivial issues. While most were about typing, there was also quite a bit of C involved. We were also able to close Dennis’ PR to speed up fastsearch.h (bytes.find is now 22%+ faster on real-world data).

I closed 12 issues, authored 2 PRs, closed 47 PRs, and reviewed 5. Stats-wise this looks less impressive than last week, but the changes pushed this week have more depth.

Highlights

Test Re-run Speedup

Some tests depend on volatile state like networking, inter-process communication, or time-sensitive values like timeouts. While we’re always on the lookout for unnecessary volatility of this kind, some is inevitable. This is especially true for socket tests, asyncio tests, and concurrent.futures tests.

The unfortunate result of volatile tests is that in environments we don’t entirely control, like CI, those tests are “flaky”, which means they fail intermittently. We re-run them when the -w option is passed to the regression test runner.

Up until now, a re-run executed every test in the affected test file. As you can imagine, the test files with the flakiest tests tend to also be the largest. Now, with a change I made (see BPO-44708 for details), we only re-run the affected test methods, which speeds up the occasional re-run dramatically. Let me give you an example from Azure Pipelines.

Before (https://dev.azure.com/Python/cpython/_build/results?buildId=84549):

0:10:08 load avg: 1.07 Re-running test_concurrent_futures in verbose mode
test_cancel (test.test_concurrent_futures.FutureTests) ... ok
test_cancelled (test.test_concurrent_futures.FutureTests) ... ok
test_done (test.test_concurrent_futures.FutureTests) ... ok
test_done_callback_already_cancelled (test.test_concurrent_futures.FutureTests) ... ok
[ ... output of 220 tests cut ... ]
test_first_exception_some_already_complete (test.test_concurrent_futures.ThreadPoolWaitTests) ... ok
test_timeout (test.test_concurrent_futures.ThreadPoolWaitTests) ... ok

----------------------------------------------------------------------

Ran 226 tests in 82.721s

OK (skipped=111)

== Tests result: FAILURE then SUCCESS ==

After (https://dev.azure.com/Python/cpython/_build/results?buildId=84581):

0:08:10 load avg: 3.67 Re-running test_concurrent_futures in verbose mode (matching: test_interpreter_shutdown)
test_interpreter_shutdown (test.test_concurrent_futures.ProcessPoolForkProcessPoolShutdownTest) ... skipped 'require unix system'
test_interpreter_shutdown (test.test_concurrent_futures.ProcessPoolForkserverProcessPoolShutdownTest) ... skipped 'require unix system'
test_interpreter_shutdown (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolShutdownTest) ... ok
test_interpreter_shutdown (test.test_concurrent_futures.ThreadPoolShutdownTest) ... ok

----------------------------------------------------------------------

Ran 4 tests in 2.433s

OK (skipped=2)

== Tests result: FAILURE then SUCCESS ==

As you can see, re-running concurrent_futures tests now takes about 2.4s instead of over 82s.

Interestingly, this not only increases the speed of the re-runs but also makes it more likely that they will succeed. The fewer tests we need to re-run, the less likely we are to hit another intermittent failure.
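To make the idea concrete, here is a minimal sketch of a targeted re-run built on plain unittest. It is not the regression test runner's actual implementation; the FlakyExample class and the run_with_targeted_rerun helper are hypothetical names made up for illustration.

```python
import unittest


class FlakyExample(unittest.TestCase):
    """Hypothetical test case: one stable test, one that fails on its first run."""

    attempts = 0

    def test_stable(self):
        self.assertEqual(1 + 1, 2)

    def test_flaky(self):
        # Simulates an intermittent failure: fails once, passes afterwards.
        type(self).attempts += 1
        self.assertGreater(type(self).attempts, 1)


def run_with_targeted_rerun(test_case):
    """Run every test in test_case, then re-run only the methods that failed."""
    first = unittest.TestResult()
    unittest.TestLoader().loadTestsFromTestCase(test_case).run(first)

    # Test IDs look like "module.Class.method"; keep the failed ones.
    failed_ids = [test.id() for test, _ in first.failures + first.errors]
    if not failed_ids:
        return first, None

    # Build a suite that contains only the failed test methods.
    rerun_suite = unittest.TestSuite(
        test_case(test_id.rsplit(".", 1)[-1]) for test_id in failed_ids
    )
    second = unittest.TestResult()
    rerun_suite.run(second)
    return first, second


first, second = run_with_targeted_rerun(FlakyExample)
print(f"first run: {first.testsRun} tests, {len(first.failures)} failed")
print(f"re-run: {second.testsRun} tests, successful={second.wasSuccessful()}")
```

The first pass runs both tests, while the re-run touches only the one method that failed, which is the same effect as the "matching:" filter visible in the log above.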

typing.NewType is now a class

If you’re not familiar with this construct, NewType allows you to specify that a certain integer or string actually means something special. For example:

from typing import NewType

SSN = NewType("SSN", int)  # US Social Security Number
GitBranchName = NewType("GitBranchName", str)

This allows the type checker to understand that, for example, a particular function accepts only SSNs rather than arbitrary integers. To signify that a number is an SSN, the programmer wraps an int like this:

def read_ssn() -> SSN:  # hypothetical wrapper so that `return` is valid
    maybe_ssn = get_int_from_web_form('ssn')
    if is_valid_ssn(maybe_ssn):
        return SSN(maybe_ssn)
    raise ValueError(f"Invalid value passed as SSN: {maybe_ssn}")

In Python 3.9 and before, NewType returned a trivial identity function and the information around it was lost at runtime. Thanks to Ken Jin, Yurii Karabas, and Serhiy Storchaka, in Python 3.10 and later NewType returns an instance of a class that retains the information on the specific new type created by the user. This opens up new possibilities for runtime introspection. NewType objects are now even picklable!

Additionally, in Python 3.11 we’ll have a small C accelerator for typing that makes the performance of the new NewType comparable to that of the identity function from Python 3.9 and before.
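Here is a small sketch of what runtime introspection of a NewType can look like. The describe helper is a hypothetical name made up for this example. The sketch checks whether NewType itself is a class before calling isinstance, so it should also run on Python 3.9 and before, where NewType is still a plain function.

```python
from typing import NewType

SSN = NewType("SSN", int)  # same definition as in the example above


def describe(new_type):
    """Return the runtime information that a NewType object carries."""
    return {
        "name": new_type.__name__,
        "supertype": new_type.__supertype__,
        # Only meaningful where NewType is a class (Python 3.10 and later);
        # the first check keeps isinstance() from raising on older versions.
        "is_newtype_instance": isinstance(NewType, type)
        and isinstance(new_type, NewType),
    }


info = describe(SSN)
value = SSN(123456789)  # calling SSN returns its argument unchanged

print(info)
print(type(value))
```

Note that the wrapped value is still a plain int at runtime. Only the SSN object itself carries the extra information.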

We really like better error messages

Thanks to Filipe Laíns, circular imports in submodules now report a much clearer error message.

Given this package structure:

$ tree a
a
├── b
│   ├── c.py
│   └── __init__.py
└── __init__.py

1 directory, 3 files
$ cat a/b/__init__.py
import a.b.c
$ cat a/b/c.py
import a.b

Before:

Python 3.9.5 (default, May  4 2021, 03:33:11)
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import a.b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/tmp/cpymain/a/b/__init__.py", line 1, in <module>
    import a.b.c
  File "/private/tmp/cpymain/a/b/c.py", line 3, in <module>
    a.b
AttributeError: module 'a' has no attribute 'b'

After:

Python 3.11.0a0 (heads/main:a22b05da87, Jul 24 2021, 13:42:31)
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import a.b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/tmp/cpymain/a/b/__init__.py", line 1, in <module>
    import a.b.c
    ^^^^^^^^^^^^
  File "/private/tmp/cpymain/a/b/c.py", line 3, in <module>
    a.b
    ^^^
AttributeError: cannot access submodule 'b' of module 'a'
(most likely due to a circular import)

Much cleaner, isn’t it?
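If you want to see which message your own interpreter produces, the following sketch rebuilds the package in a temporary directory and imports it. It rests on two assumptions: a/__init__.py is empty (its contents are not shown above), and c.py accesses a.b on its third line, as the tracebacks indicate. The reproduce_circular_import function is a hypothetical name.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reproduce_circular_import():
    """Build the package in a temp dir, import a.b, return the error text."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "a" / "b"
        pkg.mkdir(parents=True)
        # Assumption: the top-level __init__.py is empty.
        (Path(tmp) / "a" / "__init__.py").write_text("")
        (pkg / "__init__.py").write_text("import a.b.c\n")
        # Assumption: line 3 of c.py is the attribute access from the traceback.
        (pkg / "c.py").write_text("import a.b\n\na.b\n")

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("a.b")
        except AttributeError as exc:
            return str(exc)
        finally:
            sys.path.remove(tmp)
            # Drop whatever was left behind by the failed import.
            leftovers = [n for n in sys.modules if n == "a" or n.startswith("a.")]
            for name in leftovers:
                del sys.modules[name]
    return None


message = reproduce_circular_import()
print(message)
```

The exact wording of the printed message depends on the Python version that runs the script.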

Bonus Head Scratcher

As discovered by Serhiy Storchaka in BPO-43838, read-only __dict__ mappings of builtin types can in fact be written to if you try hard enough:

>>> class Sneaky:
...     def __eq__(self, other):
...         other['real'] = 42
... 
>>> int.__dict__ == Sneaky()
>>> (1).real
42

Crazy, right? We’re still investigating what the proper course of action is for this, since the likelihood of damage in this case is low and the performance impact of copying the dictionary for safety would be potentially big for large types. On the other hand, some changes to __dict__ of builtin types might crash the interpreter.

Plans for next week

Apart from the test re-run implementation I haven’t gotten to the other plans I highlighted last week. So BPO-44618 and BPO-44594 will still need my attention. Ned Deily investigated BPO-44037 and we’ll have some concrete results around it next week.

More urgently, next week is the last week before Python 3.10 RC1. The release manager asked me to coordinate pushing out final required changes to typing before that release happens.

Detailed Log

Monday

Issues:

PRs:

Tuesday

Issues:

PRs:

Wednesday

Coding day.

PRs:

Thursday

Issues:

PRs:

Friday

Issues:

PRs:

Saturday

Issues:

PRs:

#Python/Developer-in-Residence