Default musl allocator considered harmful (to performance)
38 points by runxiyu
The author lost me when they benchmarked on a 48-core server. These days, I’m interested in scaling down, with more small and fast software running on more individuals’ computers.
I’m generally reluctant to say that systemic problems are individual skill issues. So instead, I think Rust should do more to support and encourage alternative allocation strategies that don’t suffer from the general-purpose allocator bottleneck. I guess that means a combination of arenas and simply requiring fewer allocations.
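To make the arena idea concrete, here's a minimal bump-arena sketch in plain stable Rust (`Arena` and `alloc_bytes` are illustrative names, not the unstable allocator_api): one upfront allocation, cheap bump sub-allocations, and everything freed at once when the arena drops.

```rust
// Minimal bump arena: one upfront allocation, many cheap
// sub-allocations, all freed together when the arena drops.
struct Arena {
    buf: Vec<u8>, // single backing allocation
    used: usize,  // bump offset into `buf`
}

impl Arena {
    fn with_capacity(cap: usize) -> Self {
        Arena { buf: vec![0u8; cap], used: 0 }
    }

    // Hand out the next chunk; the global allocator is never
    // touched here (panics if the arena is exhausted).
    fn alloc_bytes(&mut self, n: usize) -> &mut [u8] {
        let start = self.used;
        self.used += n;
        &mut self.buf[start..self.used]
    }
}

fn main() {
    let mut arena = Arena::with_capacity(1024);
    let chunk = arena.alloc_bytes(16);
    chunk[0] = 42;
    println!("first byte of chunk: {}", chunk[0]);
}
```

A real arena would handle alignment and typed allocation, but the cost model is the point: per-allocation work is a bounds check and an add.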
The issue for the Rust allocator_api is almost ten years old and it’s still unstable. If I cared, I would just go for Zig.
I agree it’s sad this area receives so little attention in Rust, but the current allocator_api impl has issues. For example, using it with arenas or bump allocators like OP suggested is not optimal (as Box carries around a useless arena pointer).
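The overhead is easy to see from sizes: a plain `Box<T>` is one pointer wide, while a box that also carries an allocator handle (as `Box<T, &Arena>` would under the unstable allocator_api) is two. A rough stable-Rust illustration, modeling the handle as an extra reference:

```rust
use std::mem::size_of;

fn main() {
    // A plain Box<T> is a single pointer.
    println!("Box<u64>:          {} bytes", size_of::<Box<u64>>());
    // A box that also drags along an allocator handle is two
    // pointers wide; we model it as a (data ptr, arena ptr) pair.
    println!("(Box<u64>, &Arena): {} bytes", size_of::<(Box<u64>, &u64)>());
}
```

When the arena is a process-wide singleton anyway, that second word is pure waste, which is part of the complaint above.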
My point is that, given that the issue has been sitting there for ten years, allocator_api is unlikely to ever stabilize, and therefore, while anyone who glances at the docs of, e.g., Vec will go “oh look, it’s there, you can pass a custom allocator,” no one will actually adopt it since the feature is unstable. I think Rust should give up and embrace having a global allocator, instead of dragging its feet for decades.
I think Rust should give up and embrace having a global allocator, instead of dragging its feet for decades.
Nope, I use that at work. The issue just needs a champion to drive stabilization.
I’m generally reluctant to say that systemic problems are individual skill issues.
Me too, but I’m confused why that’s relevant here - surely using a better default allocator is a systemic fix?
edit: oops, sorry, I hadn’t read down to the Zig part of the original article yet. My bad!
Interesting, I could try this new allocator on my toy server.
Turns out it really depends on your workload
https://github.com/rust-lang/rust-analyzer/issues/1441#issuecomment-509506279
jemalloc v mimalloc depends on workloads.
musl v anyone else does not, musl malloc is slow when single threaded and horrendous when multithreading is involved.
Of course that’s unless by “workload” you mean not invoking the allocator at all.
Of course that’s unless by “workload” you mean not invoking the allocator at all.
Or you just spend so little time allocating and deallocating memory compared to everything else that it’s negligible.
Which is quite common for well-written high-performance code; you allocate large blocks rather than small objects and you reuse memory where you can, so actual calls to the allocator become infrequent.
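As a sketch of that pattern (`process_chunk` is a made-up stand-in for real per-chunk work), reusing one scratch buffer across a loop keeps allocator calls out of the hot path once its capacity suffices:

```rust
// Reuse one scratch buffer across iterations instead of
// allocating a fresh Vec per chunk.
fn process_chunk(chunk: &[u32], scratch: &mut Vec<u32>) -> u32 {
    scratch.clear(); // keeps capacity; frees nothing
    scratch.extend(chunk.iter().map(|x| x * 2));
    scratch.iter().sum()
}

fn main() {
    let data: Vec<u32> = (0..1000).collect();
    let mut scratch = Vec::with_capacity(100); // one upfront allocation
    let mut total = 0u32;
    for chunk in data.chunks(100) {
        // No allocator calls inside the loop once capacity suffices.
        total += process_chunk(chunk, &mut scratch);
    }
    println!("total = {total}");
}
```

With this shape of code, which malloc sits underneath barely matters, which is exactly the workload-dependence being discussed.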
So it’s true that whether you’ll see any benefit from switching from musl’s allocator to jemalloc or mimalloc or anything else is heavily workload-dependent in practice, too.
I ran into performance issues using musl in the Alpine-based Docker image of Nosey Parker, a secrets detector written in Rust designed for offensive security engagements. Nosey Parker runs in parallel by default, and when running with a musl-based build, throughput was an order of magnitude reduced. Switching to mimalloc fixed that.
I recommend swapping the allocator even if musl is not a compilation target today or if the program is single threaded. This is something you simply don’t want to forget.
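For reference, the usual way to do that swap in a Rust program is the mimalloc crate’s `#[global_allocator]` hook (crate version is an assumption; check its docs):

```rust
// Cargo.toml: mimalloc = "0.1"
use mimalloc::MiMalloc;

// Route every Box/Vec/String/HashMap allocation through mimalloc
// instead of the libc allocator (musl's, in an Alpine build).
#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;

fn main() {
    let v: Vec<u64> = (0..1_000).collect();
    println!("{}", v.len());
}
```

It’s a two-line, whole-program switch, which is why “just set it and forget it” is tempting, though see the objections below about doing this without measurements.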
Or you could just use the default system allocator until you have metrics that warrant a custom allocator.
If you don’t have a demonstrable allocator performance problem, there is no reason to replace your allocator: these high-performance allocators generally achieve their performance at the cost of higher memory usage, and at the same time can interfere with allocations in the rest of the OS.
Preemptively dropping security and increasing memory usage without supporting evidence is a bizarre thing to blindly recommend.
I’d be careful with blanket use of mimalloc; it caused me no end of crashes, for instance: