vm.overcommit_memory=2 is always the right setting

45 points by FedericoSchonborn


david_chisnall

I have rarely read an article that I disagreed so strongly with.

Windows doesn’t overcommit. It has a policy of never making promises it can’t keep. And this is how I ended up with a machine with 128 GiB of RAM, 60 GiB free, and memory allocation failing.
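The Linux equivalent of that strict policy is the article's vm.overcommit_memory=2 setting, under which the commit limit is roughly swap plus vm.overcommit_ratio percent of RAM. Here is a minimal sketch of how that bites; the 60 GiB request is purely illustrative:

```c
/* Minimal sketch: under strict commit accounting (vm.overcommit_memory=2 on
 * Linux, or Windows' commit charge), a large allocation can fail even though
 * plenty of physical RAM is free, because the commit limit has been reached.
 * The 60 GiB figure is illustrative. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t request = 60ull * 1024 * 1024 * 1024; /* 60 GiB of address space */
    void *p = malloc(request);
    if (p == NULL)
    {
        /* With overcommit disabled, this branch is taken as soon as the
         * commit charge would exceed swap + overcommit_ratio% of RAM,
         * regardless of how much memory is actually free right now. */
        perror("malloc");
        return EXIT_FAILURE;
    }
    puts("allocation succeeded: the kernel has promised to back all 60 GiB");
    free(p);
    return EXIT_SUCCESS;
}
```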

Failing locally is nice for debugging but is absolutely not how you design resilient systems. You build resilient systems by ensuring that you can handle failure, and that’s much easier at a higher level. Do you handle malloc failure gracefully? At every call site? In every library that you call? If those libraries allocate memory, every single API that may allocate memory now has to be able to report failure. Do you gracefully handle that?
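Here is a rough sketch of what that propagation burden looks like in C; the names (string_list, string_list_append) are invented for illustration:

```c
/* Sketch of what "propagate allocation failure from every API" means in
 * practice. Any function that allocates, directly or indirectly, needs a
 * signature that can report failure, and so does everything above it. */
#include <stdlib.h>
#include <string.h>

struct string_list
{
    char  **items;
    size_t  count;
    size_t  capacity;
};

/* Because this helper allocates, it must be able to fail, and every caller,
 * transitively, must check and forward that failure. */
int string_list_append(struct string_list *list, const char *str)
{
    if (list->count == list->capacity)
    {
        size_t new_capacity = list->capacity ? list->capacity * 2 : 8;
        char **grown = realloc(list->items, new_capacity * sizeof *grown);
        if (grown == NULL)
        {
            return -1; /* caller must check and propagate */
        }
        list->items = grown;
        list->capacity = new_capacity;
    }
    char *copy = strdup(str);
    if (copy == NULL)
    {
        return -1; /* and again here */
    }
    list->items[list->count++] = copy;
    return 0;
}
```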

Kernels and embedded systems have to handle allocation failure, and it is the biggest reason why embedded development is hard. Handling allocation failure is really hard because you can’t allocate memory to clean up. Often, the only thing that complex stateful code can do in case of allocation failure is exit the program. Indeed, a load of classic UNIX software called malloc via a macro that called exit on allocation failure, because handling the failure in the general case is too hard.
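That pattern is usually spelled xmalloc. A minimal sketch of it, written here as a function rather than a macro:

```c
/* The classic xmalloc pattern: wrap malloc so that allocation failure
 * terminates the program instead of being handled at every call site.
 * Many classic UNIX and GNU tools ship a variant of this. */
#include <stdio.h>
#include <stdlib.h>

static void *xmalloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
    {
        fprintf(stderr, "out of memory (requested %zu bytes)\n", size);
        exit(EXIT_FAILURE);
    }
    return p;
}
```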

Without overcommit, you massively increase the probability that malloc will fail. And resilient system design is all about probabilities because even the underlying hardware may fail. And you can’t handle those errors locally. The Erlang philosophy is the one that lets you build systems with ludicrously high uptime: allow local failure, recover at the system level.

You have two choices:

Option 1: Invest a huge amount of engineering effort in making sure that every call to an allocation function gracefully handles failure and propagates this failure to the caller. Run in a mode where this kind of failure is likely. Waste a load of RAM.

Option 2: Ensure that your programs persist any important data regularly so that they don’t lose data on crash. Do best-effort soft limiting to further reduce the likelihood of failure. Build process monitoring to restart processes that exhaust memory and crash.

For software running on platforms with memory measured in GiBs, I will choose option 2 every time because we have a huge body of experimental evidence that it leads to more reliable systems.
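To make option 2 concrete, here is a rough sketch of the process-monitoring piece. Real supervisors (systemd, runit, an Erlang supervision tree) add backoff, rate limits and logging, so treat this as the shape of the idea rather than something you would deploy:

```c
/* Minimal supervisor sketch: run a worker command and restart it whenever it
 * dies abnormally, e.g. killed by the OOM killer or aborting on allocation
 * failure. A clean exit (status 0) stops the loop. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2)
    {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return EXIT_FAILURE;
    }
    for (;;)
    {
        pid_t child = fork();
        if (child < 0)
        {
            perror("fork");
            return EXIT_FAILURE;
        }
        if (child == 0)
        {
            execvp(argv[1], &argv[1]);
            _exit(127); /* exec failed */
        }
        int status;
        waitpid(child, &status, 0);
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        {
            break; /* clean exit: stop supervising */
        }
        fprintf(stderr, "worker died, restarting...\n");
        sleep(1); /* crude restart backoff */
    }
    return EXIT_SUCCESS;
}
```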

Note: I maintain a platform that chooses option 1 because option 2 is unavailable on hardware where RAM is measured in tens to hundreds of KiBs. We put a lot of effort into API design to propagate allocation failures up and pass heap quotas down. We can make pretty good use of the available memory. But I absolutely would not build software for computers with three or more orders of magnitude more memory like this.
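Purely for illustration, the pass-quotas-down, propagate-failures-up pattern might look something like the sketch below. The names (heap_quota, quota_alloc, parse_message) are invented and are not the API of any particular platform:

```c
/* Hypothetical sketch: callers pass a heap quota down, callees draw against
 * it and report failure up. A real design would also credit the quota back
 * when memory is freed; that bookkeeping is elided here. */
#include <stddef.h>
#include <stdlib.h>

struct heap_quota
{
    size_t remaining; /* bytes this subsystem may still allocate */
};

/* Allocation draws against the caller-supplied quota and can fail. */
static void *quota_alloc(struct heap_quota *quota, size_t size)
{
    if (size > quota->remaining)
    {
        return NULL; /* quota exhausted: fail locally, caller decides */
    }
    void *p = malloc(size);
    if (p != NULL)
    {
        quota->remaining -= size;
    }
    return p;
}

/* Every API that may allocate takes a quota and returns an error code. */
int parse_message(struct heap_quota *quota, const char *input, char **out)
{
    size_t needed = 256; /* illustrative working-buffer size */
    char *buffer = quota_alloc(quota, needed);
    if (buffer == NULL)
    {
        return -1; /* propagate allocation failure to the caller */
    }
    /* ... parsing elided ... */
    (void)input;
    *out = buffer;
    return 0;
}
```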