Does removal of the GIL have any other effects on multi-threaded Python code (other than allowing it to run in parallel)?
My understanding is that the GIL has lasted this long not because multi-threaded Python depends on it, but because removing it:
- Complicates the implementation of the interpreter
- Complicates C extensions, and
- Causes single-threaded code to run slower
Multi-threaded Python code already has to assume that it can be pre-empted on the boundary between any two bytecode instructions. Does free-threaded Python provide the same guarantees, or does it require multi-threaded Python to be written differently, e.g. to use additional locks?
> Does free-threaded Python provide the same guarantees
Mostly. Some of the "can be pre-empted on the boundary between any two bytecode instructions" bugs are really hard to hit without free-threading, though. And without free-threading people don't reach for threads as much in the first place. So by its nature it exposes more bugs.
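To make that concrete, the textbook example of such a bug (a minimal sketch, not anything from the thread) is an unlocked `+= 1` on shared state. The GIL never protected it, because the increment is several bytecode instructions:

```python
import threading

counter = 0

def bump(times: int) -> None:
    global counter
    for _ in range(times):
        counter += 1  # read-modify-write: several bytecodes, not atomic

threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With or without the GIL this can print less than 400000; free-threading
# just makes the bad interleavings far more likely to actually happen.
print(counter)
```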
Now, my rants:
> have any other effects on multi-threaded Python code
It stops people from using multi-process workarounds. Hence, it simplifies user code. IMO totally worth it to make the interpreter more complex.
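A rough sketch of that simplification (the function and numbers here are made up for illustration): today CPU-bound fan-out usually means a process pool, with arguments and results crossing process boundaries via pickling; on a free-threaded build the same code can just use threads and shared objects.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def cpu_bound(n: int) -> int:
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # GIL workaround: separate interpreter per worker, data pickled both ways.
    with ProcessPoolExecutor() as pool:
        print(sum(pool.map(cpu_bound, [200_000] * 8)))

    # Free-threaded build: plain threads can use all cores, nothing is serialized.
    with ThreadPoolExecutor() as pool:
        print(sum(pool.map(cpu_bound, [200_000] * 8)))
```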
> Complicates C extensions
The alternative (sub-interpreters) complicates C extensions more than free-threading does, and the single most important C extension in the entire ecosystem, numpy, has stated that they can't and don't want to support sub-interpreters. On the contrary, they already support free-threading today and are actively sorting out the remaining bugs.
> Causes single-threaded code to run slower
That's the trade-off. Personally I think a single-digit percentage slow-down of single-threaded code is worth it.
> That's the trade-off. Personally I think a single-digit percentage slow-down of single-threaded code is worth it.
Maybe. I would expect that 99% of Python code going forward will still be single-threaded. You just don't need that extra complexity for most code. So I would expect that Python code as a whole will have worse performance, even though a handful of applications will get faster.
Sure, but of that 99%, how many are performance-sensitive, CPU-bound (in Python, not in C) applications? Clearly some, and I'm not saying it's an easy tradeoff, but I assume the large majority of Python programs out there won't notice a slowdown.
That's the mindset that leads to the funny result that `uv pip` is like 10x faster than `pip`.
Is it because Rust is just fast? Nope. For everything after resolving dependency versions, raw CPU performance doesn't matter at all. It's that writing concurrent plus parallel code in Rust is easier: it doesn't need to spawn a few processes and wait for the interpreter to start in each, and it doesn't need to constantly serialize whatever you want to run. So, someone did it!
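To illustrate the serialization point (a toy sketch, not uv's or pip's actual code): anything you hand to another process has to survive pickling, which plain threads never require. Closures, open sockets, and plenty of parsed-but-unpicklable objects all need workarounds.

```python
import pickle

def make_task(prefix: str):
    return lambda name: f"{prefix}{name}"  # closure over prefix

task = make_task("pkg-")
print(task("requests"))  # fine in-process: a thread could run this directly

try:
    pickle.dumps(task)   # multiprocessing would need this to ship work to a worker
except Exception as exc:
    print("can't ship to another process:", exc)
```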
Yet, there's a pip maintainer who actively sabotages free-threading work. Nice.
For anyone who hasn't used uv, I feel like 10x faster is an understatement. For cases where packages are already downloaded it's basically instant for any use case I have run into.
As I recall, CPython has also been getting speed-ups lately, which ought to make up for the minor single-threaded performance loss introduced by free threading. With that in mind, the recent changes seem like an overall win to me.
Any webserver that wants to cache and reuse content cares about shared state, but usually has to outsource that to a shared in-memory database because the language can't support it.
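As a hedged sketch of what that could look like once threads really run in parallel (the class and names are invented for illustration): the cache becomes an ordinary in-process object shared by worker threads, instead of a round-trip to Redis or memcached between worker processes.

```python
import threading
from typing import Callable

class InProcessCache:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: dict[str, bytes] = {}

    def get_or_compute(self, key: str, compute: Callable[[], bytes]) -> bytes:
        with self._lock:
            if key in self._data:
                return self._data[key]
        value = compute()  # do the expensive work outside the lock
        with self._lock:
            return self._data.setdefault(key, value)

cache = InProcessCache()
page = cache.get_or_compute("/home", lambda: b"<html>rendered</html>")
```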
Your understanding is correct. You can use all the cores, but it's much slower per thread, and existing libraries may need to be reworked. I tried it with PyTorch, and it used 10x more CPU to do half the work. I expect these issues to improve; still great to see after 20 years of wishing for it.
It makes race conditions easier to hit, and that will require multi-threaded Python to be written with more care to achieve the same level of reliability.
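For example (again a minimal sketch): the usual fix for a shared counter is to make the read-modify-write explicit with a lock, or to restructure so threads don't share mutable state at all.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def bump(times: int) -> None:
    global counter
    for _ in range(times):
        with counter_lock:  # the increment is now atomic with respect to other threads
            counter += 1

threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # always 400000
```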