Why Queues Don’t Fix Overload (And What To Do Instead)
5 points by polywolf
If the system is at capacity, it must reject new work immediately. It must look the sender in the eye and say, “I am full. Go away.” The sender must be told instantly so it can make a policy decision: should I drop this request, retry it later, or show the user a degraded experience? This is not a failure of the system. This is the system successfully defending itself with proper feedback loops.
"Retry it later" is a queue. If you're lucky, it's your sender's queue; but, sometimes, you're not lucky.
One classic post on this subject is Queues Don't Fix Overload.
Unbounded queues are a common primitive. The Erlang ecosystem uses them, many work queuing systems use them, and a lot of systems form implicit queues.
The classic response is for each layer of the system to start rejecting new work once it's overloaded. This can then propagate backwards through multiple system layers. Ideally, it will slow down (or at least reject) the original producers.
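To make that concrete, here's a minimal sketch of one layer that rejects instead of buffering without limit. The names `BoundedLayer`, `Overloaded`, and `handle_request` are made up for illustration, and a real system would also need timeouts and a consumer draining the queue.

```python
import queue


class Overloaded(Exception):
    """Raised when a layer is at capacity and refuses new work."""


class BoundedLayer:
    """Hypothetical layer that holds at most `capacity` pending items."""

    def __init__(self, capacity):
        if capacity < 1:
            # queue.Queue treats maxsize <= 0 as unbounded, which is the
            # very thing this sketch is trying to avoid.
            raise ValueError("capacity must be at least 1")
        self._pending = queue.Queue(maxsize=capacity)

    def submit(self, item):
        """Accept the item, or reject it immediately if the layer is full."""
        try:
            self._pending.put_nowait(item)
        except queue.Full:
            raise Overloaded("layer is full") from None

    def take(self):
        """Remove and return the oldest pending item."""
        return self._pending.get_nowait()


def handle_request(item, downstream):
    """An upstream layer: it reports the rejection to its own caller
    rather than queueing the item somewhere else and hoping."""
    try:
        downstream.submit(item)
        return "accepted"
    except Overloaded:
        return "rejected"


layer = BoundedLayer(capacity=2)
results = [handle_request(n, layer) for n in range(3)]
```

With a capacity of 2, the third request comes back as rejected, and it is the caller that decides whether to drop it, retry later, or degrade.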
Does anyone use queues in such a naïve way? A queue is a tool that lets you deduplicate tasks, or repopulate the cache when your data changes rather than when users arrive to find a dead or stale cache.
The way to serve data under pressure is to serve it from cache. And the queue is an invaluable helper here.
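A rough sketch of what I mean by deduplication, assuming refresh tasks are identified by a cache key. `DedupQueue`, `refresh_all`, and the `load` callback are hypothetical names, not any particular library's API.

```python
from collections import OrderedDict


class DedupQueue:
    """Hypothetical FIFO queue that holds each key at most once."""

    def __init__(self):
        # Keys only; OrderedDict preserves insertion order.
        self._pending = OrderedDict()

    def push(self, key):
        """Enqueue key; return False if it is already waiting."""
        if key in self._pending:
            return False
        self._pending[key] = None
        return True

    def pop(self):
        """Remove and return the oldest waiting key."""
        key, _ = self._pending.popitem(last=False)
        return key

    def __len__(self):
        return len(self._pending)


def refresh_all(tasks, cache, load):
    """Drain the queue, recomputing each changed key exactly once."""
    while len(tasks) > 0:
        key = tasks.pop()
        cache[key] = load(key)


tasks = DedupQueue()
for changed_key in ["user:1", "user:2", "user:1", "user:1"]:
    tasks.push(changed_key)

loads = []
cache = {}


def load(key):
    # Stand-in for the expensive lookup; records how often it runs.
    loads.append(key)
    return key.upper()


refresh_all(tasks, cache, load)
```

Four change notifications turn into two loads, and readers keep hitting the cache the whole time.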