jemalloc Postmortem
157 points by je
The orange site post has some nice discussions from a former Facebook Jemalloc team engineer https://news.ycombinator.com/item?id=44265520
Kind of nuts that he worked on Jemalloc for over a decade while having personal preference for garbage collection. I’m surprised he doesn’t have more regret.
I don’t really see the conflict? I personally prefer garbage collection for almost everything, but if someone were to pay me to write C, working on a memory allocator is probably one of the more fun jobs. (not that I’d be good at many parts of it, but it would be fun)
And I still use manual allocation on occasion, when I write C or C++
Performance is also a bit like security – it is arbitrarily deep, it allows you to be arbitrarily creative. Having the fleet-wide data mentioned in the article helps a lot
And allocators will also be around forever, especially now that there are safe languages with manual allocation like Rust.
This. It’s not like they’re mutually exclusive or unrelated technologies. Why should he regret making something excellent?
(I have to throw a bit of shade here and say I wouldn’t work for Meta, but I’m talking more about the general principle … and fleet-wide data is fun, but it’s also associated with architectures I don’t like)
A lot of us end up working on lower level things than we want to, in order to bring the higher level things we dream about to life. 🤷‍♂️
Any decade now, I’ll stop working on ISA design and start building the operating systems I wanted to enable the end-user programming systems I want to be able to build the user interfaces that I want.
I’ve idly entertained such thoughts over the years. I mean, how long could it take to implement everything from logic gates on up in ternary logic (for that theoretical 10% efficiency gain - maybe), including memory and IO. And then write an OS and all associated tools in a programming language that doesn’t exist outside of my head yet, and then the entirety of userspace apps needed for it, plus all the conversion back to binary logic, because there’s no way the rest of the world would be compatible with it.
Even if I had the focus for all that, I’ll still need someone to invent some immortality treatment…
And some of us end up working on higher level things than we want to, in order to pay the bills. Life is a cruel mistress…
Yes, I had the same thought when I wrote this. Careers often get steered by the need to pay the bills, not what we want to be doing. There are a lot of people in our industry slinging JavaScript who’d rather be doing just about anything else.
It’s not really conflicting, especially since his allocator work started with developing an allocator to back a GC. Implementing a GC is implementing memory management, and even if you prefer working in a language with a GC, you’re still using an allocator.
And a naive mark-and-sweep GC like Matz’s original CRuby can just use a standard allocator. Certainly good enough for some applications, especially if the allocator is as good as jemalloc.
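To make the point above concrete, here is a minimal sketch of a mark-and-sweep collector that leans entirely on the standard allocator (`calloc`/`free`). It is not how CRuby does it (CRuby's original GC scanned the C stack conservatively). The names `Obj`, `Heap`, `gc_alloc` and `gc_collect` are made up for illustration, and the sketch assumes the caller passes roots explicitly and that each object holds at most two references.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object layout: a small header links every allocation into
   one list, and each object can point at up to two other objects. */
typedef struct Obj {
    struct Obj *next;    /* intrusive list of all allocated objects */
    struct Obj *ref[2];  /* outgoing references, NULL when unused */
    int marked;          /* set during the mark phase */
} Obj;

typedef struct {
    Obj *all;     /* head of the all-objects list */
    size_t live;  /* number of objects currently allocated */
} Heap;

/* Allocation simply defers to the standard allocator. */
Obj *gc_alloc(Heap *h) {
    assert(h != NULL);
    Obj *o = calloc(1, sizeof *o);
    if (o == NULL) return NULL;
    o->next = h->all;
    h->all = o;
    h->live++;
    return o;
}

/* Mark phase: flag everything reachable from o. Recursive for brevity;
   a real collector would use an explicit worklist for deep graphs. */
static void gc_mark(Obj *o) {
    if (o == NULL || o->marked) return;
    o->marked = 1;
    gc_mark(o->ref[0]);
    gc_mark(o->ref[1]);
}

/* Mark from the given roots, then sweep: free every unmarked object and
   clear the mark on survivors. Returns the number of objects freed. */
size_t gc_collect(Heap *h, Obj **roots, size_t nroots) {
    for (size_t i = 0; i < nroots; i++) gc_mark(roots[i]);

    size_t freed = 0;
    Obj **link = &h->all;
    while (*link != NULL) {
        Obj *o = *link;
        if (o->marked) {
            o->marked = 0;      /* reset for the next collection */
            link = &o->next;
        } else {
            *link = o->next;    /* unlink, then hand back to free() */
            free(o);
            h->live--;
            freed++;
        }
    }
    return freed;
}
```

All placement, size-class and fragmentation decisions stay inside `calloc` and `free`, so swapping in a better allocator improves this kind of collector without touching the GC code.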
Working on garbage collectors also requires working on an unsafe heap, only they’re also less generally useful as they’re restricted to languages that support that specific model of GC, or constrained allocation environments[1].
So a general purpose allocator is more widely useful - e.g. it can exist as a standalone library - and can also be more easily updated and changed over time, as the interface is much smaller and inherently exposes fewer internal implementation details than a GC allocator.
Basically: standalone GC allocators are usually insufficiently general to really make it as separate long lived libraries, so basic selection means if someone has spent a decade working on a widely adopted allocation library it will not be a GC allocator - even if it’s used in runtimes for GC supported languages.
[1] errr by constrained I mean “constrained API”, not constrained available allocation space. Essentially “all allocations go through a small set of controlled interfaces”
@je really interesting read. Will you keep using jemalloc in future endeavours or would you consider/recommend something else with more open community involvement or regular maintenance?
I’m just generally curious if you’ve looked at other allocators or have thoughts about them. I don’t do much manual memory management programming but I remember reading about WebKit’s use of libpas, which was quite interesting.
That’s an interesting question! For the time being jemalloc is still my obvious preferred choice, but within two to three years I won’t be surprised if other options become better overall choices. There are two major factors pulling in opposite directions. On the one hand, jemalloc remains unusual in that it minimizes combinatorial memory layout complexity, which is critical to minimizing awful corner case behavior. (I hope it doesn’t strain credulity too far for me to claim that jemalloc is conceptually simple.) On the other hand, the optimization landscape for modern computing is ever shifting. In fact, I prefer to think of constantly undulating ocean waves. Without ongoing attention, jemalloc will stay in one place rather than riding wave tops. And over a longer period of time, jemalloc will slowly sink below the surface as APIs change. ;-)
On the other hand, the optimization landscape for modern computing is ever shifting. In fact, I prefer to think of constantly undulating ocean waves.
What changes do you foresee that will affect jemalloc’s usefulness?
jemalloc remains unusual in that it minimizes combinatorial memory layout complexity, which is critical to minimizing awful corner case behavior.
If you don’t mind, could you elaborate a bit on this memory layout simplicity? I’ve read your 2006 paper but I doubt it describes jemalloc very accurately nowadays :)
Back in 2010, I was tasked with putting an HTML5 compliant web browser on a WinCE 5.0 tablet. The iPad didn’t exist back then, and this tablet had 32 MB (yes, megabytes) of memory. I managed to compile Qt for WinCE 5.0 thanks to a lot of outside contributors, and we had QtWebKit available.
Only problem? The DLLs themselves were ~28 MB, and we had around 4 MB for the system and any application we could write. Whatever we tried, the internal browser would crash. Then I saw someone got jemalloc to work on WinCE. Ported that to our system, overrode CRT malloc/new/free at build time and voila, we could render google.com on that poor “tablet”.
I had a lot of fun and learnt a lot doing this. Thanks for jemalloc! It’s one of the greatest libraries I’ve used in my professional life.