Retry Loop Retry

28 points by sinclairtarget


joshka

Something I recall learning a while back is that the best number of retries is often 1, which makes this not a retry loop but an if statement. This came from people who knew exponential backoff with jitter and token bucket approaches really well and could explain how these affect extremely large scale systems.
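To make the "if statement" point concrete, here is a minimal sketch of what a single retry can look like. The name `call_with_one_retry` is made up for illustration, and catching every `Exception` is a simplifying assumption; real code would probably retry only on errors it knows to be transient.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_one_retry(operation: Callable[[], T]) -> T:
    """Run operation; if it raises, try exactly one more time."""
    try:
        return operation()
    except Exception:
        # One second attempt, expressed as a branch rather than a loop.
        # If this attempt also fails, the error propagates to the caller.
        return operation()
```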

Why? Because failing a second time is highly correlated with failing a third time in many distributed systems. A good mental model for this is that errors are generally either:

You shouldn’t just trust me on this. Instead, go look at the logs of your own systems where you have retries. Hopefully you’re logging errors with a retry count, so you can compare the number of errors on the first retry with the number on the second retry.
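If you want to run that comparison, something like the following might be a starting point. It assumes each error line carries a field of the form `retry=N`, which is a made-up format; adjust the pattern to whatever your logs actually contain. The function name `errors_by_retry` is likewise just for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed (hypothetical) log field: "retry=<number>" somewhere in the line.
RETRY_FIELD = re.compile(r"\bretry=(\d+)\b")


def errors_by_retry(lines: Iterable[str]) -> Counter:
    """Count error log lines per retry number, skipping lines without the field."""
    counts: Counter = Counter()
    for line in lines:
        match = RETRY_FIELD.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Feed it the error lines from a log file, for example `errors_by_retry(open("service.log"))` where the file name is a placeholder, and compare `counts[1]` with `counts[2]`. If the two numbers are close, most requests that failed their first retry also failed their second.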