Weekly Report 2021, July 19 - 25

2021-07-24 14:15:00

The second week passed as fast as the first one, if not faster. This time I dug into more non-trivial issues. While most were around typing, there was also quite a bit of C involved. We were also able to close Dennis’ PR to speed up fastsearch.h (bytes.find is now 22%+ faster on real-world data).

I closed 12 issues, authored 2 PRs, closed 47 PRs, and reviewed 5. Stats-wise this looks less impressive than last week, but the changes pushed this time have more depth.

Highlights

Test Re-run Speedup

Some tests depend on volatile state such as networking, inter-process communication, or time-sensitive values like timeouts. While we’re always on the lookout for unnecessary volatility of this kind, some is inevitable. This is especially true for socket tests, asyncio tests, and concurrent.futures tests.

The unfortunate result of volatile tests is that in environments that we don’t entirely control, like CI, those tests are “flaky”, meaning they fail intermittently. We re-run them when the -w option is passed to the regression test runner.

Up until now, a re-run meant running every test in the affected test file again. As you can imagine, the test files for the most flaky tests tend to also be the largest. Now, with a change I made (see BPO-44708 for details), we only re-run the affected test methods, which speeds up the occasional re-run dramatically. Let me give you an example from Azure Pipelines.

Before (https://dev.azure.com/Python/cpython/_build/results?buildId=84549):

0:10:08 load avg: 1.07 Re-running test_concurrent_futures in verbose mode
test_cancel (test.test_concurrent_futures.FutureTests) ... ok
test_cancelled (test.test_concurrent_futures.FutureTests) ... ok
test_done (test.test_concurrent_futures.FutureTests) ... ok
test_done_callback_already_cancelled (test.test_concurrent_futures.FutureTests) ... ok
[ ... output of 220 tests cut ... ]
test_first_exception_some_already_complete (test.test_concurrent_futures.ThreadPoolWaitTests) ... ok
test_timeout (test.test_concurrent_futures.ThreadPoolWaitTests) ... ok

----------------------------------------------------------------------

Ran 226 tests in 82.721s

OK (skipped=111)

== Tests result: FAILURE then SUCCESS ==

After (https://dev.azure.com/Python/cpython/_build/results?buildId=84581):

0:08:10 load avg: 3.67 Re-running test_concurrent_futures in verbose mode (matching: test_interpreter_shutdown)
test_interpreter_shutdown (test.test_concurrent_futures.ProcessPoolForkProcessPoolShutdownTest) ... skipped 'require unix system'
test_interpreter_shutdown (test.test_concurrent_futures.ProcessPoolForkserverProcessPoolShutdownTest) ... skipped 'require unix system'
test_interpreter_shutdown (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolShutdownTest) ... ok
test_interpreter_shutdown (test.test_concurrent_futures.ThreadPoolShutdownTest) ... ok

----------------------------------------------------------------------

Ran 4 tests in 2.433s

OK (skipped=2)

== Tests result: FAILURE then SUCCESS ==

As you can see, re-running the test_concurrent_futures tests now takes under 2.5s instead of over 82s.

Interestingly, this not only speeds up the re-runs but also makes it more likely that they will succeed. You see, the fewer tests we need to re-run, the less likely we are to hit another intermittent failure.
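The idea of re-running only the affected test methods can be sketched with plain unittest. This is a minimal illustration and not the code the regression test runner uses: the DemoTests class and the iter_tests and rerun_matching helpers are hypothetical names introduced here.

```python
import io
import unittest


class DemoTests(unittest.TestCase):
    """Hypothetical test class standing in for a large test file."""

    def test_fast(self):
        self.assertEqual(1 + 1, 2)

    def test_slow(self):
        self.assertTrue(True)

    def test_flaky(self):
        # Pretend this one failed intermittently on the first run.
        self.assertIn("a", "abc")


def iter_tests(suite):
    """Flatten nested test suites into individual test cases."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def rerun_matching(suite, failed_names):
    """Re-run only the tests whose method name is in failed_names."""
    selected = unittest.TestSuite(
        test
        for test in iter_tests(suite)
        # test.id() looks like "module.Class.method"; keep the last part.
        if test.id().rpartition(".")[2] in failed_names
    )
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=2)
    return runner.run(selected)


full_suite = unittest.defaultTestLoader.loadTestsFromTestCase(DemoTests)
result = rerun_matching(full_suite, {"test_flaky"})
print(f"re-ran {result.testsRun} of {full_suite.countTestCases()} tests")
```

Matching on the method name alone mirrors the "After" log above, where every test class that defines test_interpreter_shutdown gets re-run.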

`typing.NewType` is now a class

If you’re not familiar with this construct, NewType allows you to specify that a certain integer or string actually means something special. For example:

from typing import NewType

SSN = NewType("SSN", int)  # US Social Security Number
GitBranchName = NewType("GitBranchName", str)

This lets the type checker understand that, for example, a particular function doesn’t accept just any integer: the value has to be an SSN. To signify that a number is an SSN, the programmer wraps an int like this:

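# get_int_from_web_form() and is_valid_ssn() are application-specific helpers.
def ssn_from_web_form() -> SSN:
    maybe_ssn = get_int_from_web_form('ssn')
    if is_valid_ssn(maybe_ssn):
        return SSN(maybe_ssn)
    raise ValueError(f"Invalid value passed as SSN: {maybe_ssn}")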

In Python 3.9 and earlier, NewType returned a trivial identity function and the information around it was lost at runtime. Thanks to Ken Jin, Yurii Karabas, and Serhiy Storchaka, in Python 3.10 and later NewType is a class whose instances retain the information on the specific new type created by the user. This opens up new possibilities for runtime introspection. NewType objects are now even picklable!
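A short sketch of what that runtime introspection can look like. The attribute names __name__ and __supertype__ are the ones typing sets on the new type. The isinstance check is guarded because it only makes sense on Python 3.10 or later, where NewType is a class. Pickling is left out here since it depends on the type being importable by name.

```python
from typing import NewType

SSN = NewType("SSN", int)  # US Social Security Number

# The name and the underlying type are available at runtime.
print(SSN.__name__)       # SSN
print(SSN.__supertype__)  # <class 'int'>

# Calling the new type is still an identity operation at runtime:
# the value remains a plain int.
ssn = SSN(123456789)
print(type(ssn) is int)  # True

# On Python 3.10+, NewType is a class and SSN is an instance of it.
newtype_is_class = isinstance(NewType, type)
if newtype_is_class:
    print(isinstance(SSN, NewType))  # True
```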

Additionally, in Python 3.11 we’ll have a small C accelerator for typing that makes the new NewType comparable in performance to the identity function from Python 3.9 and before.

We really like better error messages

Thanks to Filipe Laíns, circular imports in submodules now report a much clearer error message.

Given this package structure:

$ tree a
a
├── b
│   ├── c.py
│   └── __init__.py
└── __init__.py

1 directory, 3 files
$ cat a/b/__init__.py
import a.b.c
$ cat a/b/c.py
import a.b

Before:

Python 3.9.5 (default, May  4 2021, 03:33:11)
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import a.b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/tmp/cpymain/a/b/__init__.py", line 1, in <module>
    import a.b.c
  File "/private/tmp/cpymain/a/b/c.py", line 3, in <module>
    a.b
AttributeError: module 'a' has no attribute 'b'

After:

Python 3.11.0a0 (heads/main:a22b05da87, Jul 24 2021, 13:42:31)
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import a.b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/tmp/cpymain/a/b/__init__.py", line 1, in <module>
    import a.b.c
    ^^^^^^^^^^^^
  File "/private/tmp/cpymain/a/b/c.py", line 3, in <module>
    a.b
    ^^^
AttributeError: cannot access submodule 'b' of module 'a'
(most likely due to a circular import)

Much cleaner, isn’t it?
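To try this locally, the package can be recreated in a temporary directory. This is a sketch under one assumption: the traceback points at an `a.b` attribute access on line 3 of c.py, so the c.py written below contains that access after the import, which is more than the `cat` output above shows. The exact message printed depends on the Python version running the script.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Recreate the a/b package layout shown above.
    pkg_b = Path(tmp, "a", "b")
    pkg_b.mkdir(parents=True)
    Path(tmp, "a", "__init__.py").write_text("")
    (pkg_b / "__init__.py").write_text("import a.b.c\n")
    # Assumption: c.py touches a.b right after importing it.
    (pkg_b / "c.py").write_text("import a.b\n\na.b\n")

    sys.path.insert(0, tmp)
    try:
        importlib.import_module("a.b")
    except AttributeError as exc:
        message = str(exc)
    else:
        message = None
    finally:
        sys.path.remove(tmp)
        # Drop the partially imported modules again.
        for name in [n for n in sys.modules if n == "a" or n.startswith("a.")]:
            del sys.modules[name]

print(message)
```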

Bonus Head Scratcher

As discovered by Serhiy Storchaka in BPO-43838, read-only __dict__ mappings of builtin types can in fact be written to if you try hard enough:

>>> class Sneaky:
...     def __eq__(self, other):
...         other['real'] = 42
... 
>>> int.__dict__ == Sneaky()
>>> (1).real
42

Crazy, right? We’re still investigating what the proper course of action is for this, since the likelihood of damage in this case is low and the performance impact of copying the dictionary for safety would be potentially big for large types. On the other hand, some changes to __dict__ of builtin types might crash the interpreter.
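The same trick can be tried without touching a builtin type by using types.MappingProxyType over an ordinary dict. This is only a sketch, and the explanation is a likely reading rather than a confirmed one: the proxy appears to forward the comparison to the underlying dict, the dict doesn’t know how to compare itself with Sneaky, and Python then falls back to the reflected Sneaky.__eq__ with the real dict as its argument. Whether the write reaches the backing dict depends on the interpreter version, so no expected output is shown.

```python
import types

backing = {"real": 1}
proxy = types.MappingProxyType(backing)  # read-only view of backing

# Direct writes through the proxy are rejected.
try:
    proxy["real"] = 42
    direct_write_blocked = False
except TypeError:
    direct_write_blocked = True


class Sneaky:
    def __eq__(self, other):
        # `other` is whatever mapping the proxy hands to the comparison.
        other["real"] = 42
        return False


_ = proxy == Sneaky()
leaked = backing["real"] == 42
print(direct_write_blocked, leaked)
```

Using a throwaway dict keeps the experiment safe: nothing in the interpreter’s own types is modified.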

Plans for next week

Apart from the test re-run implementation, I haven’t gotten to the other plans I highlighted last week. So BPO-44618 and BPO-44594 will still need my attention. Ned Deily investigated BPO-44037 and we’ll have some concrete results around it next week.

More urgently, next week is the last week before Python 3.10 RC1. The release manager asked me to coordinate pushing out final required changes to typing before that release happens.

Detailed Log

Monday

Issues:

closed issue BPO-27513
closed issue BPO-41972
closed issue BPO-44524
closed issue BPO-44672

PRs:

closed pull request GH-27091
reviewed pull request GH-27237
closed pull request GH-27238
closed pull request cpython-source-deps#25
closed pull request GH-27083
closed pull request GH-13797
closed pull request GH-27243
closed pull request GH-27242
closed pull request GH-27237
closed pull request GH-27245
closed pull request GH-27246

Tuesday

Issues:

closed issue BPO-44621
closed issue BPO-44631

PRs:

closed pull request GH-27255
closed pull request GH-27250
closed pull request GH-27259
closed pull request GH-27258
closed pull request GH-27261
reviewed pull request GH-27244
closed pull request GH-27253
closed pull request GH-27128
closed pull request GH-27024
authored pull request GH-27269
closed pull request GH-26964

Wednesday

Coding day.

PRs:

closed pull request GH-26933
closed pull request GH-27274
closed pull request GH-27275
closed pull request GH-27276
closed pull request peps#2038

Thursday

Issues:

closed issue BPO-44353
closed issue BPO-44708
closed issue BPO-44653

PRs:

closed pull request GH-27270
closed pull request GH-27285
closed pull request GH-27284
closed pull request GH-27287
closed pull request GH-27290
closed pull request GH-27262
closed pull request GH-27293
closed pull request GH-27247
closed pull request GH-27232
closed pull request GH-27296
reviewed pull request GH-27278

Friday

Issues:

closed issue BPO-44676
investigated issue BPO-44707

PRs:

closed pull request GH-27244
reviewed pull request GH-27302
closed pull request GH-27305
authored pull request GH-27309
reviewed pull request GH-27299

Saturday

Issues:

closed issue BPO-44717
closed issue BPO-44720

PRs:

closed pull request GH-27311
closed pull request GH-27319
closed pull request GH-27314
closed pull request GH-27321
closed pull request GH-27324
closed pull request GH-27325
closed pull request GH-27298
closed pull request GH-27326
closed pull request GH-27299
closed pull request GH-27328
closed pull request GH-27327

#Python/Developer-in-Residence
