Does removal of the GIL have any other effects on multi-threaded Python code (other than allowing it to run in parallel)?
My understanding is that the GIL has lasted this long not because multi-threaded Python depends on it, but because removing it:
- Complicates the implementation of the interpreter
- Complicates C extensions, and
- Causes single-threaded code to run slower
Multi-threaded Python code already has to assume that it can be pre-empted on the boundary between any two bytecode instructions. Does free-threaded Python provide the same guarantees, or does it require multi-threaded Python to be written differently, e.g. to use additional locks?
> Does free-threaded Python provide the same guarantees
Mostly. Some of the "can be pre-empted on the boundary between any two bytecode instructions" bugs are really hard to hit without free-threading, though. And without free-threading people don't reach for threads as much in the first place. So by its nature it exposes more bugs.
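To make that concrete, the textbook example of such a bug (a minimal sketch, not anything from the thread) is an unlocked `+= 1` on shared state. The GIL never protected it, because the increment is several bytecode instructions:

```python
import threading

counter = 0

def bump(times: int) -> None:
    global counter
    for _ in range(times):
        counter += 1  # read-modify-write: several bytecodes, not atomic

threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With or without the GIL this can print less than 400000; free-threading
# just makes the bad interleavings far more likely to actually happen.
print(counter)
```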
Now, my rants:
> have any other effects on multi-threaded Python code
It stops people from using multi-process workarounds. Hence, it simplifies user code. IMO totally worth it to make the interpreter more complex.
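A rough sketch of that simplification (the function and numbers here are made up for illustration): today CPU-bound fan-out usually means a process pool, with arguments and results crossing process boundaries via pickling; on a free-threaded build the same code can just use threads and shared objects.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def cpu_bound(n: int) -> int:
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # GIL workaround: separate interpreter per worker, data pickled both ways.
    with ProcessPoolExecutor() as pool:
        print(sum(pool.map(cpu_bound, [200_000] * 8)))

    # Free-threaded build: plain threads can use all cores, nothing is serialized.
    with ThreadPoolExecutor() as pool:
        print(sum(pool.map(cpu_bound, [200_000] * 8)))
```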
> Complicates C extensions
The alternative (sub-interpreters) complicates C extensions more than free-threading does, and the single most important C extension in the entire ecosystem, numpy, has stated that they can't and don't want to support sub-interpreters. On the contrary, they already support free-threading today and are actively sorting out the remaining bugs.
> Causes single-threaded code to run slower
That's the trade-off. Personally I think a single-digit percentage slow-down of single-threaded code is worth it.
> That's the trade-off. Personally I think a single-digit percentage slow-down of single-threaded code is worth it.
Maybe. I would expect that 99% of Python code going forward will still be single-threaded. You just don't need that extra complexity for most code. So I would expect that Python code as a whole will have worse performance, even though a handful of applications will get faster.
Sure, but of that 99%, how many are performance-sensitive, CPU-bound (in Python, not in C) applications? Clearly some, and I'm not saying it's an easy tradeoff, but I assume the large majority of Python programs out there won't notice a slowdown.
That's the mindset that leads to the funny result that `uv pip` is like 10x faster than `pip`.
Is it because Rust is just fast? Nope. For everything after resolving dependency versions, raw CPU performance doesn't matter at all. It's that writing concurrent plus parallel code in Rust is easier: it doesn't need to spawn a few processes and wait for the interpreter to start in each, and it doesn't need to constantly serialize whatever you want to run. So, someone did it!
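To illustrate the serialization point (a toy sketch, not uv's or pip's actual code): anything you hand to another process has to survive pickling, which plain threads never require. Closures, open sockets, and plenty of parsed-but-unpicklable objects all need workarounds.

```python
import pickle

def make_task(prefix: str):
    return lambda name: f"{prefix}{name}"  # closure over prefix

task = make_task("pkg-")
print(task("requests"))  # fine in-process: a thread could run this directly

try:
    pickle.dumps(task)   # multiprocessing would need this to ship work to a worker
except Exception as exc:
    print("can't ship to another process:", exc)
```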
Yet, there's a pip maintainer who actively sabotages free-threading work. Nice.
For anyone who hasn't used uv, I feel like 10x faster is an understatement. For cases where packages are already downloaded it's basically instant for any use case I have run into.
As I recall, CPython has also been getting speed-ups lately, which ought to make up for the minor single-threaded performance loss introduced by free threading. With that in mind, the recent changes seem like an overall win to me.
Any webserver that wants to cache and reuse content cares about shared state, but usually has to outsource that to a shared in-memory database because the language can't support it.
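As a hedged sketch of what that could look like once threads really run in parallel (the class and names are invented for illustration): the cache becomes an ordinary in-process object shared by worker threads, instead of a round-trip to Redis or memcached between worker processes.

```python
import threading
from typing import Callable

class InProcessCache:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: dict[str, bytes] = {}

    def get_or_compute(self, key: str, compute: Callable[[], bytes]) -> bytes:
        with self._lock:
            if key in self._data:
                return self._data[key]
        value = compute()  # do the expensive work outside the lock
        with self._lock:
            return self._data.setdefault(key, value)

cache = InProcessCache()
page = cache.get_or_compute("/home", lambda: b"<html>rendered</html>")
```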
Your understanding is correct. You can use all the cores, but it's much slower per thread, and existing libraries may need to be reworked. I tried it with PyTorch, and it used 10x more CPU to do half the work. I expect these issues to improve; still great to see after 20 years of wishing for it.
It makes race conditions easier to hit, and that will require multi-threaded Python to be written with more care to achieve the same level of reliability.
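For example (again a minimal sketch): the usual fix for a shared counter is to make the read-modify-write explicit with a lock, or to restructure so threads don't share mutable state at all.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def bump(times: int) -> None:
    global counter
    for _ in range(times):
        with counter_lock:  # the increment is now atomic with respect to other threads
            counter += 1

threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # always 400000
```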