A Survey of Dynamic Array Structures
32 points by gingerBill
32 points by gingerBill
Good post. A couple of comments:
"Runtime fixed-size array" may sound unpractical, but actually, I find in a lot of cases, one doesn't need to change a dynamic array once it is populated. I like that Rust provides the .into_boxed_slice() function for Vec.
"realloc-style double-and-copy array" indeed works together poorly with arena. In one project, I tried to combine it with arena. I used a temporary arena allocator to populate the dynamic array, then copied the result to a permanent arena once I knew its size. This is pretty cumbersome and not ideal.
The tree way is commonly used in functional languages for persistent data structures (see relaxed radix balanced trees)
This is the best summary article on dynamic arrays I’ve seen that isn’t a survey paper. My personal favorites in practice are the “arena-backed array” and the “exponential array”. The only notable omission I can find is “space-optimal resizable arrays”, described informally here:
http://www.drdobbs.com/fast-and-small-resizable-arrays/184404698
and in the paper "Resizable Arrays in Optimal Time and Space":
https://cs.uwaterloo.ca/~imunro/cs840/ResizableArrays.pdf
(The key idea is that we can approximate sqrt(N) for both directory length and largest segment length, using only powers of 2, if we alternate doubling directory length and doubling segment length. This saturates the Omega(sqrt(N)) worst-case lower bound for wasted space in dynamic arrays proved in the paper above.)
I had never heard the word before, so here's a definition from somewhere else on the internet[0]:
An arena is just a large, contiguous piece of memory that you allocate once and then use to manage memory manually by handing out parts of that memory.
I got through about a quarter of the article assuming it meant all the memory available to a program.
gingerBill had a good blog post about it. Other names commonly used for the same concept include "linear allocator," "bump allocator," and "region-based allocator."
Confusingly, I've also occasionally seen people using the word "arena" to refer to different things, like memory pools
Speaking of trees, there are also finger trees, allowing the navigation to be logarithmic either in total size, or in offset with respect to a saved position.
Of course, for the purposes of the original author, this doesn't change «pretty good at everthing, great at nothing», and even more complex…
The tree approach reminds me of ropes, with which you can also implement dynamic arrays: ropes are for strings - array of chars, and it is only natural that one can use rope-like structures for arrays of anything. Unlike the binary tree in the article (storing two elements per node), ropes store varied numbers of items per nodes and should have lower metadata/bookkeeping costs (amortized). I believe we can also impose an exponential allocation pattern to provide "amortized O(1) growth". Rust has an any-rope crate, and I'm inclined to think it's already doable with that crate.
BTW, I think the exponential array approach can benefit from the small vector optimization, which can be an interesting exploration.
Meta: this website has an interesting design: without JavaScript I can only see the images and code snippets, but not the plain text. (I'm used to it being vice versa).
Ooh, I've never heard of the "exponential array" one before. I'll have to find an excuse to play with it someday.
Great post, so here's a minor nitpick: "Trees: not great at anything", I would say "fast append/concat", and maybe give an honorary mention to ropes.
A ziptree/treap can also be made history independent too, but that's probably not that important.