You don't need free lists
34 points by gingerBill
It's trivial to make a hierarchical bitmap concurrent if you have a single writer (set bits top-down, clear bits bottom-up). For multiple writers, a lock-free stack is much simpler (I recommend avoiding Pop() in favor of PopAll() if possible so you don't need to deal with ABA).
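A minimal sketch of the PopAll() idea: if consumers detach the whole list in one atomic exchange instead of popping nodes one at a time, no node is ever re-examined mid-pop, so the classic ABA hazard never arises. The names (`push`, `pop_all`) are illustrative, not from any particular allocator.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct node { struct node *next; } node;

static _Atomic(node *) free_list = NULL;

/* Push a freed node; safe with any number of concurrent pushers. */
void push(node *n) {
    node *head = atomic_load_explicit(&free_list, memory_order_relaxed);
    do {
        n->next = head;
    } while (!atomic_compare_exchange_weak_explicit(
        &free_list, &head, n,
        memory_order_release, memory_order_relaxed));
}

/* Detach the entire list with one exchange. Since we never compare a
 * previously observed head against the current one, ABA can't bite. */
node *pop_all(void) {
    return atomic_exchange_explicit(&free_list, NULL, memory_order_acquire);
}
```

The consumer then walks the returned private list at leisure, with no further synchronization.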
It’s O(1) when log N is a constant. (I’m tired of this type of abuse of big-O notation. It’s okay to have higher big-O if you have an idea of what N will be.)
Bitmaps live in a fragile sweet spot: where N is large enough that a simple linear search for a sentinel value is too slow, but N is small enough that scanning the bitmap isn't too slow. A linked stack scales much better than a bitmap, but I prefer a bitmap when I can afford it for the reason the author cited: it makes lowest-address-first allocation trivial.
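To illustrate why the bitmap makes lowest-address-first allocation trivial: finding the first free slot is one find-first-zero (count-trailing-zeros on the inverted word) per 64 slots. Sizes and names below are made up for the sketch.

```c
#include <stdint.h>

enum { NWORDS = 4 };              /* 4 * 64 = 256 slots, example size */
static uint64_t used[NWORDS];     /* bit set => slot allocated */

/* Return the lowest free slot index, or -1 if full. */
int alloc_slot(void) {
    for (int w = 0; w < NWORDS; w++) {
        uint64_t free_bits = ~used[w];
        if (free_bits) {
            int bit = __builtin_ctzll(free_bits); /* lowest zero bit */
            used[w] |= (uint64_t)1 << bit;
            return w * 64 + bit;
        }
    }
    return -1;
}

void free_slot(int i) {
    used[i / 64] &= ~((uint64_t)1 << (i % 64));
}
```

A freelist, by contrast, hands back slots in whatever order they were freed, so it can't give you this address-ordered behavior without extra work.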
It’s not abuse of notation, it’s just nonsensical. Strictly speaking, the mathematical statement “O(log N) = O(1) for constant N” is true. But I’m just trying to be funny here, I fully agree with your sentiment.
One problem with free lists is that they’re extremely susceptible to buffer overflow attacks - it’s a large part of why they’re not used in new allocators. Essentially, you don’t want the metadata that defines the heap to live in the same place as the objects in the heap.
The cache benefits of object reuse are something I’ve never been too sure about. You obviously get an immediate win during post-alloc initialization, but if you end up with sparse allocation of related objects (as the post says: functionally random allocation), longer-term performance seems likely to suffer. On high-end CPUs the load predictors and such may be able to compensate for that, though.
@marginalia also pointed out the risk of false sharing, which I think is a good way to cause problems, but I think the bigger problem is cross-core conflict rather than local cache inefficiency. Basically, if you’re writing to an object on one core and it shares a cache line with unrelated objects, you can start forcing cross-core synchronization - default TSO on x86 hits this, and all atomic ops have this problem (e.g. a core doesn’t need to synchronize with other cores if no other cores have a view of the relevant bytes).
There are a bunch of hardened allocators that still use freelists. They do things like randomizing the initial freelist population from a slab, making the freelist FIFO rather than LIFO, and authenticating freelist pointers (using lots of different techniques, some more secure than others).
The locality issues of thread-cached freelists are why mimalloc and some other modern allocators don't use them (they use thread-owned heaps instead and never reuse memory allocated from another thread's heap).
For memory-constrained devices, I often just use a sentinel value, usually 0 or -1, that I search the array for to find an empty slot. The lists are also small, though, so a bitmap wouldn’t have much more overhead. It might make sense for larger structures, but I’m unsure it does for arrays of small non-zero handles/fds. It would also mean freeing is more than just one atomic assignment.
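A minimal sketch of that sentinel scheme, assuming 0 marks an empty slot (names and sizes are illustrative): allocation is a linear scan, and freeing really is a single store.

```c
#include <stdint.h>

enum { NSLOTS = 32, EMPTY = 0 };  /* illustrative size and sentinel */
static int32_t slots[NSLOTS];     /* zero-initialized => all empty */

/* Linear scan for the first empty slot; fine when NSLOTS is small. */
int claim_slot(int32_t handle) {
    for (int i = 0; i < NSLOTS; i++) {
        if (slots[i] == EMPTY) {
            slots[i] = handle;
            return i;
        }
    }
    return -1;                    /* full */
}

void release_slot(int i) {
    slots[i] = EMPTY;             /* freeing is just one assignment */
}
```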
> significantly better slot allocation strategy with improved memory locality
There's some unstated assumption here I'm not following. Wouldn't it be an undesirable property if different allocations were to share cache lines, because of false sharing and so on?
If spatially contiguous allocations are only ever used on the same thread, then false sharing isn't an issue. OTOH, if spatially contiguous allocations are freed and later handed out to different threads, then false sharing can be a big issue--the main reason why modern allocators like mimalloc don't reuse memory allocated by a different thread.