Weekly Report 2021, July 12 - 18

2021-07-17 13:15:00

First week of work done. While there are bigger overarching goals for the role, this week I decided to dive right into PR review which looking at our growing open PR numbers needs more help.

Summing up the week in numbers, I closed 14 issues and 54 PRs, reviewed 9 PRs, and authored 6 own PRs. Don’t get too excited though about those numbers, the way CPython is developed, many changes start on the main branch, and then get backported to 3.10, and often also to 3.9. So some changes are tripled in those stats.

I still include them since they often need some trivial hand-holding that’s nevertheless required to merge them like reformatting the description on GitHub, putting relevant labels on, etc. More importantly, backports require some degree of care because code is different between feature branches. For example to backport the fix for BPO-40897 to the 3.9 branch I had to prepare the inspect.py code in that branch with GH-27193 to accept the backport without conflicts.

Finding balance

Additionally, some of PRs I chose in the first week were rather uncontroversial and trivial to merge. It’s psychological both for me and you. On my end I see the satisfying “Merged” purple on the PR, and on your end you can see that the wheels are turning. In time there will be more tricky changes I touch, sure, but I feel like I really need to avoid getting bogged down by some hard problem and letting it take up too much of my time.

At the same time sometimes coding is exactly the right solution to the problem. And often coding requires setup. Those are tasks that take time but don’t necessarily leave tangible artifacts. And I think visibility is key. So there’s some balance I think I’ll have to strike here. I’m starting with the weekly reports that include a detailed day-by-day log at the end if you’re interested.

Highlights

Two probably most interesting problems I faced with this week were magically failing test_httpservers on Azure Pipelines Windows 2019, and reviewing performance changes to Objects/stringlib/fastsearch.h (used for instance in bytes.find).

`test_httpservers` failures: Unicode strikes again

Working on an unrelated change on Tuesday, I noticed new test failures on Azure Pipelines in test_httpservers, specifically HTTP_ACCEPT tests for the CGI HTTP server had truncated output.

I went on to install Windows 10, Visual Studio, and configure the dev environment for Windows to debug the problem. I pestered Steve Dower a bunch of times. Frustratingly, the problem didn’t reproduce in my Windows 10 VM.

I later noticed that the problem is only with Azure Pipelines and not with GitHub Actions where CPython also runs Windows tests. Since Azure Pipelines just released an update to their Windows image and I identified that the problem only happens with the new image, I was ready to report a regression on their GitHub project. But I noticed that somehow the problem only happens to me.

I joked that this is probably because of my first name… and it turned out to be true. Maybe I shouldn’t be surprised. I have a long history of Unicode-related issues affecting me in the real world.

Long story short: the new Azure Pipelines image includes the full name of the PR author in an environment variable. The HTTP_ACCEPT tests tried to serialize all of os.environ to an ASCII string in the CGI script. This crashed the CGI script but since the tested server is fork-based, this crash was only seen on the test side as truncated output.

Knowing this, it was easy to fix. But this easy took 10 hours of work for me this week.

Horspool in `fastsearch.h`

Dennis Sweeney prepared a comprehensive refactor of fastsearch.h which allows for a shorter inner loop for most cases. It looks like it’s a 22% (geometric mean) performance improvement for real-world cases, pretty exciting.

Reviewing this wasn’t trivial as I was unfamiliar with the two-way string matching algorithm before and the PR is pretty much a comprehensive rewrite of fastsearch.h. While I couldn’t find any functional issues, I felt like I could use some more assurance.

I went on and created some basic property-based tests with Hypothesis for this use case and ran them for most of Friday. Not only did they confirm there’s no crashing or invariant issue with the new code, but I also added tests that ensured the results are the same between GH-27091 and Python 3.9.6.

I don’t have much experience with Hypothesis so I suspect there are flaws in my approach but I have to say it was pretty easy to set it all up.

Merging Dennis’ change needs to wait for next week as we figure out some last details about the cutoff values giving most performant results.

Plans for next week

Pablo Galindo Salgado noted that when the CPython test suite re-runs a flaky test that failed, it doesn’t actually re-run just that failed test but the entire test file. The most common case for flaky tests is when networking, threading, or multiprocessing is involved. Frustratingly those are some of the largest test files we have. Re-running them takes a lot of time. So the plan is to tweak the testing machinery to only re-run what actually failed. Of course, it would be best to avoid flaky tests altogether and we’re working on that, but re-running what failed is a pragmatic stopgap that is necessary in times of distributed CI that we don’t fully control.

On top of that, there are three issues I’ve been pinged to look at and I intend to do that next week: BPO-44618, BPO-44594, and BPO-44037.

Detailed Log

Monday

4 issues closed, 18 PRs closed, 1 PR reviewed.

closed issue BPO-43207
closed issue BPO-26329
closed issue BPO-43048
closed issue BPO-42194
closed pull request GH-11754
closed pull request GH-26979
closed pull request GH-6572
reviewed pull request GH-24848
closed pull request peps#2031
closed pull request peps#1679
closed pull request peps#1591
closed pull request GH-27094
closed pull request GH-26506
closed pull request GH-26831
closed pull request GH-26803
closed pull request GH-26322
closed pull request GH-25944
closed pull request GH-24460
closed pull request GH-23026
closed pull request GH-27095
closed pull request GH-27097
closed pull request GH-24460
closed pull request GH-27098

Tuesday

3 issues closed, 12 PRs closed, 1 PR authored, 1 PR reviewed.

closed issue BPO-38741
closed issue BPO-44514
closed issue BPO-43126
closed pull request GH-27103
closed pull request GH-27100
reviewed pull request GH-27083
closed pull request GH-27073
closed pull request GH-17129
closed pull request GH-27061
closed pull request GH-27111
closed pull request GH-27112
closed pull request GH-27110
closed pull request GH-27113
closed pull request GH-27114
closed pull request GH-22757
authored pull request GH-27115
closed pull request GH-27108

Wednesday

Coding day.

Hunting down what became BPO-44647 the next day: installed Windows 10, Visual Studio, and configured dev environment for Windows to debug BPO-42073 with Steve Dower’s help.

Found regression in BPO-38210, not worth reverting.

Thursday

3 issues closed, 7 PRs closed, 4 PRs reviewed, 3 PRs authored.

closed issue BPO-43948
closed issue BPO-42073
closed issue BPO-44647
reviewed pull request GH-27144
reviewed pull request GH-27137
reviewed pull request GH-27128
closed pull request GH-27093
closed pull request GH-27159
closed pull request GH-27084
reviewed pull request GH-27091
closed pull request GH-27123
closed pull request GH-27115
closed pull request GH-27162
closed pull request GH-27163
authored pull request GH-27161
authored pull request GH-27169
authored pull request GH-27170

Friday

8 PRs closed, 3 PRs reviewed, 2 PRs authored.

tested GH-27091 with Hypothesis:
https://gist.github.com/ambv/01c2c4ef79515f158dea00ad93498696
reviewed pull request GH-27083
closed pull request GH-27179
closed pull request GH-27180
authored pull request GH-27187
reviewed pull request GH-27188
reviewed pull request GH-27190
closed pull request GH-27177
closed pull request GH-27187
closed pull request GH-27189
closed pull request GH-27104
authored pull request GH-27193
closed pull request GH-27194
closed pull request GH-27195

Saturday

4 issues closed, 9 PRs closed.

closed issue BPO-40897
closed issue BPO-41249
closed issue BPO-44659
closed issue BPO-42095
closed pull request GH-27193
closed pull request GH-27209
closed pull request GH-27204
closed pull request GH-27205
closed pull request GH-27210
closed pull request GH-27212
closed pull request GH-27213
closed pull request GH-26981
closed pull request GH-27173

#Python/Developer-in-Residence