Python 3.12 Goals
Subject to change
This is a short summary of the major themes of work that the Faster CPython team plans to land in 3.12.
UPDATE: The tracing optimizer is not scheduled to land in 3.12
While the speed improvements in 3.11 mainly involved replacing individual opcodes with faster context-specific ones (adaptive opcode specialization), the next big set of improvements will come from optimizing runs of multiple opcodes. To enable this, many of the existing high-level opcodes will be replaced with lower-level opcodes, for example, separate opcodes for version number checking and reference counting. These simpler opcodes will have more opportunities for optimizations, for example, by removing redundant reference count operations. These lower-level opcodes also put us closer to a set of instructions suitable for machine code generation in the future (both in CPython and third-party JIT projects).
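To make the idea concrete, here is a toy sketch in Python (with made-up micro-op names that are not actual CPython opcodes) of how splitting an instruction into separate guard, action, and reference-count steps lets a simple peephole pass drop a redundant INCREF/DECREF pair across a run of instructions:

```python
def remove_redundant_refcounts(micro_ops: list[str]) -> list[str]:
    """Toy peephole pass: an INCREF_TOP immediately followed by a
    DECREF_TOP of the same value cancels out across the run."""
    result: list[str] = []
    for op in micro_ops:
        if result and result[-1] == "INCREF_TOP" and op == "DECREF_TOP":
            result.pop()          # drop the redundant pair
            continue
        result.append(op)
    return result


# Micro-ops for a short run of two instructions (names are hypothetical):
run = [
    "GUARD_TYPE_VERSION",    # version-number check split out as its own step
    "LOAD_INSTANCE_VALUE",   # the actual attribute fetch
    "INCREF_TOP",            # reference-count bookkeeping split out as well
    "DECREF_TOP",            # ...which the next instruction immediately undoes
    "STORE_FAST_VALUE",
]
print(remove_redundant_refcounts(run))
# ['GUARD_TYPE_VERSION', 'LOAD_INSTANCE_VALUE', 'STORE_FAST_VALUE']
```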
To enable this lowering, the interpreter loop will be generated from a declarative description. This should reduce a whole class of bugs that come from keeping the interpreter loop in sync with related functions (mark_stacks, stack_effect, etc.), and will also allow us to experiment with large changes to the interpreter loop.
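As a rough illustration of why generating the loop helps (this is not the actual CPython generator or its input format, just a sketch of the single-source-of-truth idea): each instruction is described once, and both the interpreter's switch body and its stack-effect table are derived from that one description, so they cannot drift apart.

```python
INSTRUCTIONS = {
    # name: (stack inputs, stack outputs, C body) -- illustrative only
    "BINARY_OP_ADD_INT": (2, 1, "res = add_ints(left, right);"),
    "POP_TOP":           (1, 0, "Py_DECREF(value);"),
}


def generate_switch() -> str:
    """Emit a C-style switch body from the declarative table."""
    cases = [
        f"    case {name}: {{ {body} break; }}"
        for name, (_, _, body) in INSTRUCTIONS.items()
    ]
    return "switch (opcode) {\n" + "\n".join(cases) + "\n}"


def stack_effect(name: str) -> int:
    """Net stack effect, computed from the same table the switch came from."""
    inputs, outputs, _ = INSTRUCTIONS[name]
    return outputs - inputs


print(generate_switch())
print(stack_effect("BINARY_OP_ADD_INT"))   # -1
```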
Python currently has a single global interpreter lock (GIL) per process, which prevents multi-threaded parallelism. This work, described in PEP 684, is to make all global state thread-safe and move to one GIL per sub-interpreter. Additionally, PEP 554 will make it possible to create sub-interpreters from Python code (currently a C-API-only feature), opening up true multi-threaded parallelism.
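Sub-interpreters can already be exercised from Python today through a private module used by CPython's own test suite, which gives a feel for the shape of the feature. Treat the following as a sketch rather than a supported interface: the module and its API are unstable and PEP 554 proposes a public module to replace it.

```python
import _xxsubinterpreters as interpreters

interp_id = interpreters.create()    # a fresh, isolated interpreter
interpreters.run_string(interp_id, "print('hello from a subinterpreter')")
interpreters.destroy(interp_id)
```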
We have done an analysis of which bytecodes would benefit the most from specialization and plan to complete the remaining high-benefit ones for 3.12.
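On CPython 3.11 you can already observe which instructions the adaptive interpreter has specialized by disassembling a warmed-up function (the exact specialized instruction names vary between versions):

```python
import dis

def add(a, b):
    return a + b

for _ in range(1000):        # run it enough times for the specializer to warm up
    add(1, 2)

# With adaptive=True the disassembly shows the specialized instructions the
# interpreter actually chose, e.g. BINARY_OP may appear as BINARY_OP_ADD_INT.
dis.dis(add, adaptive=True)
```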
There are a number of opportunities for decreasing the size of Python object structs. Since they are used so frequently, this benefits not just overall memory usage, but cache coherency as well. We plan to implement the most promising of these ideas for 3.12.
There are some tradeoffs between backward compatibility and performance here that are likely to result in a PEP to build consensus.
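A rough way to see the per-object cost being targeted is sys.getsizeof; the exact numbers depend on the CPython version and build, so take the output as indicative only.

```python
import sys

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
print(sys.getsizeof(p))            # size of the instance itself
print(sys.getsizeof(p.__dict__))   # its attribute storage (materialized lazily on 3.11+)
```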
Not only are we reducing the size of objects, we are also making their layout more regular. This provides opportunities to optimize the allocation and freeing of memory, as well as to speed up the traversal of objects during GC and de-allocation.
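One coarse way to observe the traversal cost this work targets is to time a full garbage collection over a large object graph; absolute numbers will vary widely by machine and Python version.

```python
import gc
import time

# Build a large graph of container objects that the collector must traverse.
objs = [{"i": i, "payload": [i] * 3} for i in range(1_000_000)]

start = time.perf_counter()
gc.collect()                       # force a full collection
print(f"full collection took {time.perf_counter() - start:.3f}s")
```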
In addition to the above projects, the team is contributing to the overall quality of the CPython codebase:
- Making the compiler easier to maintain and test by reducing coupling of the different compilation stages.
- Proactively monitoring and improving code coverage of the CPython test suite at the C level.
- Improving the pyperformance benchmarking suite to include more representative real-world workloads.
- Assisting with CPython issues and PRs, in particular those that are related to performance.
- Increasing our set of standard benchmarking machines and results to include macOS and Windows.
- Continuing to collaborate with major projects that make use of Python internals to help them adapt to changes in the CPython interpreter.