fast-servers: an interesting pattern
13 points by lorddimwit
I’m worried I’ve misunderstood something, because this doesn’t make sense to me.
As I understand it, the idea is to implement the IO state machine with one thread per state, and to implement state transitions (e.g. from reading the request to writing the response) by passing the fd between threads. Each thread is pinned to a CPU.
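A rough sketch of how I read the pattern (my own Python stand-in, not code from the article; the "fds" are just ints, the pinning is Linux-only and best-effort):

```python
# Hypothetical sketch: one pinned thread per protocol state, with state
# transitions done by handing the connection (here an int standing in for
# an fd) to the next state's thread through a queue. All names are mine.
import os
import queue
import threading

to_writer = queue.Queue()   # "read request" -> "write response" hand-off
done = queue.Queue()

def pin_to_cpu(cpu):
    # Linux-only; silently a no-op elsewhere or if the CPU doesn't exist.
    if hasattr(os, "sched_setaffinity"):
        try:
            os.sched_setaffinity(0, {cpu})
        except OSError:
            pass

def read_state(cpu):
    pin_to_cpu(cpu)
    for fd in range(3):       # pretend we accepted three connections
        # ... read the request from fd here ...
        to_writer.put(fd)     # state transition: pass the fd to the next CPU
    to_writer.put(None)       # sentinel: no more connections

def write_state(cpu):
    pin_to_cpu(cpu)
    while (fd := to_writer.get()) is not None:
        # ... write the response to fd here ...
        done.put(fd)

t1 = threading.Thread(target=read_state, args=(0,))
t2 = threading.Thread(target=write_state, args=(1,))
t1.start(); t2.start(); t1.join(); t2.join()
print(sorted(done.queue))   # each "fd" crossed from CPU 0's thread to CPU 1's
```

Note that every connection crosses a queue (and a CPU boundary) once per state transition, which is exactly what the objections below are about.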
I can see a number of problems:
What if the number of states in the state machine doesn’t match the number of cores? I guess if you have a simple state machine and lots of cores you can run multiple copies of a state’s thread. But what if you have a complicated or dynamic state machine?
What if there’s an imbalance between states? Reading a request, for example, usually takes less work than writing a response. Won’t some cores be overloaded while others sit starved?
It’s unfriendly to CPU caches: the CPU that has a request fresh in its cache throws that locality away by handing the connection to another CPU, which then has to re-load the request into its own cache.
The kernel and network cards are designed to be able to map network connections to CPUs efficiently, with hardware assistance. This thread hand-off design wrecks any kernel+hardware socket/CPU affinity.
For a really high-end example of the latter, see Drew Gallatin’s 2021 presentation on the Netflix Open Connect Appliance in which he talks about balancing socket affinity to avoid oversaturating the server’s internal interconnect bandwidth.
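To make the contrast concrete, here is a minimal sketch (mine, Linux-assumed) of the kernel-friendly alternative: with SO_REUSEPORT, each pinned worker owns its own listening socket on the same port, and the kernel steers each incoming connection to one of them, so a connection can stay on one CPU for its whole lifetime instead of being handed around:

```python
# One SO_REUSEPORT listener per worker/CPU: the kernel hashes each incoming
# connection to one listener, keeping it on that worker's CPU.
# Linux-specific demo; address and worker count are arbitrary.
import socket

listeners = []
port = 0
for cpu in range(2):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))    # port 0 first; then reuse the chosen port
    port = s.getsockname()[1]
    s.listen()
    listeners.append(s)

ports = [s.getsockname()[1] for s in listeners]
print(ports)                       # both listeners share the same port
for s in listeners:
    s.close()
```

The fd-hand-off design forfeits exactly this kind of kernel/NIC steering, since the connection migrates CPUs on every state transition.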
It seems to me that this design is over-fitted to servers with very simple request/response state machines, and that it prioritizes simplicity of the IO dispatch loop over whole-system performance.
It’s been ten years since the first submission, and it got no comments then. Lobsters has grown a lot since, so this might be a good opportunity to get some now.
Is there any server implementing this pattern, that we know of?
Nothing public to my knowledge. I've played around with it some, and it's relatively pleasant to use.