Add Virtual Threads to Python
14 points by a5rocks
Virtual threads, fibers, and all the similar versions always run into problems where tasks are blocked on other virtual threads making progress but because they aren’t actually backed by real threads get functionally deadlocked.
There are ways to make it harder for that to happen but that means now you have something that is even more complex than just locking.
The whole purpose of the async/await and actor style models is to make the myriad difficulties of threads as close to impossible as possible at compile time.
People who say “threads are easier” are either doing things that are extremely uncomplicated and could trivially be done as async/await or their code has errors in it.
Virtual threads, fibers, and all the similar versions always run into problems where tasks are blocked on other virtual threads making progress but because they aren’t actually backed by real threads get functionally deadlocked.
Yes, virtual threads can deadlock each other, but OS threads can as well. A deadlock is a deadlock. I don’t see how OS threads are making them less likely.
Actual threads block on conflicting locks; virtual threads are blocked by one virtual thread waiting on another virtual thread to be scheduled on the waiting thread.
GCD on macOS also attempted this, and hit the same problem: multiple scheduled tasks but all blocked on completion of a task that is not being run due to the thread pool being saturated. Nowadays if you ask GCD to fire off a pile of tasks it actually just spawns threads.
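To make the saturation scenario concrete, here is a minimal sketch in Python using the standard library's `concurrent.futures.ThreadPoolExecutor` as a stand-in for a fixed-size pool. It is not GCD and not a virtual-thread runtime; the function name `run_nested` and the 0.5 second timeout are assumptions made for illustration, and the timeout exists only so the example terminates instead of hanging.

```python
import concurrent.futures as cf


def run_nested(pool_size: int) -> str:
    """Submit a task that waits on a second task from the same pool."""
    with cf.ThreadPoolExecutor(max_workers=pool_size) as pool:

        def inner() -> str:
            return "inner done"

        def outer() -> str:
            # outer occupies a worker while waiting for inner to be scheduled.
            fut = pool.submit(inner)
            try:
                # Without the timeout this would wait forever when no
                # worker is free to run inner.
                return fut.result(timeout=0.5)
            except cf.TimeoutError:
                return "starved"

        return pool.submit(outer).result()


print(run_nested(pool_size=1))  # the only worker is busy running outer
print(run_nested(pool_size=2))  # a second worker is free to run inner
```

With one worker, `outer` holds the only thread while waiting for `inner`, which can never start until `outer` gives up. With two workers the same code completes normally, which is why this class of bug tends to appear only under load.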
The “advantage” of virtual threads is extremely limited, and the hazards are even worse than full threads.
Unless you have hundreds, if not thousands of threads, with well defined and sensibly controlled behavior you will hit more problems using virtual threads than just using threads.
That’s not usable as a general language tool.
If I recall correctly the original Java thread model was “virtual threads” but it turns out having that as the default model of “threads” breaks for any code expecting/requiring real concurrency.
Meanwhile async/await or the actor model does away with large swathes of threading-related code failures. The new syntax is not particularly heavyweight - though it can be frustrating in the cases where you do have actual “my plan is to pin every core to 100% for the next ten minutes” (which isn’t something that should involve python) - but for the majority of use cases these models are far better, and don’t have the virtual thread/threadpool models of pseudo threads. (not using “pseudo” in a derogatory sense, just a generalization of “thread that is not backed by an actual OS thread”)
I don’t think the GCD example is relevant here: GCD, as far as I know, does not use virtual threads. As you said, each task is allocated to an OS thread taken from the thread pool, and the task will use 100% of the OS thread until completion. Virtual threads, such as Go’s goroutines, work differently. If a goroutine is blocked waiting on a resource blocked by another goroutine, then the OS thread will run the other goroutines that can make progress.
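The behavior described here (the carrier thread moves on to runnable work when one task blocks on another) can be sketched in Python with `asyncio` tasks. This is only an analogy: asyncio tasks are cooperatively scheduled and are not goroutines, and the names `waiter`, `producer`, and `ready` are made up for the example.

```python
import asyncio
import threading


async def main() -> list[str]:
    log: list[str] = []
    thread_ids: set[int] = set()
    ready = asyncio.Event()

    async def waiter() -> None:
        thread_ids.add(threading.get_ident())
        log.append("waiter: blocked")
        # Suspends this task only; the OS thread is free to run others.
        await ready.wait()
        log.append("waiter: resumed")

    async def producer() -> None:
        thread_ids.add(threading.get_ident())
        log.append("producer: running")
        ready.set()

    await asyncio.gather(waiter(), producer())
    # Both tasks ran on the same OS thread.
    assert len(thread_ids) == 1
    return log


log = asyncio.run(main())
print(log)
```

The waiting task is parked rather than pinning the thread, so the task it depends on still gets to run. In the fixed-size pool case, by contrast, a blocked task keeps its OS thread occupied.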
Unless you have hundreds, if not thousands of threads, with well defined and sensibly controlled behavior you will hit more problems using virtual threads than just using threads.
People regularly run programs with millions of concurrent goroutines.
If I recall correctly the original Java thread model was “virtual threads” but it turns out having that as the default model of “threads” breaks for any code expecting/requiring real concurrency.
Virtual threads with cooperative scheduling work well for concurrency on I/O-bound tasks (for example gevent in Python). But when we need true parallelism on CPU-bound tasks, then we need virtual threads with preemptive scheduling, such as Go’s goroutines and Erlang’s processes.
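A rough way to see the limit of cooperative scheduling is to run a CPU-bound loop next to a periodic task and measure how long the periodic task is starved. The sketch below uses `asyncio` from the standard library rather than gevent, on the assumption that the cooperative behavior is comparable for this purpose; `ticker`, `busy`, and the durations are illustrative choices.

```python
import asyncio
import time


async def ticker(stamps: list[float], stop: asyncio.Event) -> None:
    # Records a timestamp roughly every 10 ms, yielding in between.
    while not stop.is_set():
        stamps.append(time.monotonic())
        await asyncio.sleep(0.01)


async def busy(seconds: float) -> None:
    # CPU-bound loop with no await: it never yields to the scheduler.
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass


async def main() -> float:
    stamps: list[float] = []
    stop = asyncio.Event()
    task = asyncio.create_task(ticker(stamps, stop))
    await asyncio.sleep(0.05)  # let the ticker run normally first
    await busy(0.2)            # the ticker cannot run during this call
    await asyncio.sleep(0.05)  # let the ticker recover
    stop.set()
    await task
    # Largest gap between consecutive ticks, in seconds.
    return max(b - a for a, b in zip(stamps, stamps[1:]))


max_gap = asyncio.run(main())
print(f"largest gap between ticks: {max_gap:.3f}s")
```

The largest gap comes out at least as long as the busy loop, even though the ticker asked to run every 10 ms. A preemptive scheduler would be able to interrupt the loop; a cooperative one has to wait for it to yield.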
Java has virtual threads. Virtual threads are a better way of doing concurrency than Python’s async and await. We should add virtual threads to Python.
I disagree pretty hard right at the beginning motivation.
Care to elaborate?
Hand-waving that it’s the “better way” without elaboration is specious. Many people more informed than I am objected in the thread. I think the OP of that thread just doesn’t like “function coloring” and the need for an async runtime, but then suggests that virtual threads will just magically detect break points, which feels like fantastical thinking.
pip install gevent
in all seriousness though I would like to see this come to fruition in the stdlib