Applying "Programming Without Pointers" to an mbox indexer using Zig
24 points by deevus
A note to all people writing their own mbox code: the format is not that simple. You have to know in advance what kind of mbox subformat you are dealing with (mboxo, mboxrd, mboxcl, mboxcl2). Otherwise you have to use heuristics on a best effort basis.
All formats except for mboxcl2 escape lines in the message body starting with "From " as ">From ". Some of the formats rely on scanning for the next "From " line to find the next message; some don't (they use a Content-Length header). Sometimes the mbox file is a mix of those conventions. Fun!
A good primer is here: https://www.loc.gov/preservation/digital/formats/fdd/fdd000383.shtml
Author here. I'll certainly have to dig deeper. To be fair (on me), I had a specific mbox format that I wanted to parse, so my tool looks for newlines followed by 'From '.
The article opens by claiming PWP (programming without pointers) is about avoiding allocations, but you can use pointers without heap allocating memory. You can use pointers into stack allocations or you can preallocate an array once and use pointers into the array.
Also, heap allocations aren’t terrible in all GC languages. Many Java GCs make allocations very cheap—almost the same cost as a stack allocation. Or at least that’s my understanding anyway!
Not trying to nerd snipe here—I think this stuff is interesting, and I agree with the author’s enthusiasm about avoiding unnecessary allocations.
The more things change, the more they stay the same — or so I expect someone older and wiser than I could say.
As Andrew Kelley mentions at the end of the referenced talk, this style of programming brings garbage collection back into the picture, and it of course must be implemented manually and more or less ad-hoc. Choices appear; do you add mark bits into the data or allocate space for them when you start sweeping, do you compact your data arrays or employ free lists to reuse memory, do you use singular arrays for storage or chunked storage to avoid large reallocs? As the number of objects created and discarded grows, the demands on the GC grow and it becomes less adequate at dealing with the task at hand. At some point it starts to bug you that you can't just deallocate data when you know it's become unnecessary. We are back at the beginning.
And of course allocators are just a form of garbage collection internally as well. It's all just a circle.
I liked and upvoted your comment and I also have a response:
> Choices appear; do you add mark bits into the data or allocate space for them when you start sweeping, do you compact your data arrays or employ free lists to reuse memory, do you use singular arrays for storage or chunked storage to avoid large reallocs?
I have an answer to these questions! Garbage collection is infrequently needed in many use cases including the OP, my personal use cases for this programming style, and as another example the way git internal storage works. When I say "infrequently needed" I mean across thousands of invocations of the application process, maybe you run garbage collection once. How often do you run git gc, or see git automatically trigger gc?
Under these conditions, a handy set of tradeoffs appears as an answer to your questions. Of course, nothing is free; there are always pros and cons. But the trick is that by taking into account the specific characteristics of a given problem, there will be a sweet spot for those tradeoffs.
Concretely:
This optimizes for the hot path: not running GC.
A nice way to think about this is whether GC is in the "main loop" of the application. For example, if you might need to run GC during a game frame update, or during a web server request handler (or pause serving all requests during GC), then you probably want to optimize for GC performance instead.
I like your way of phrasing it though, it is all just a circle. Once you dig into this stuff you can think of manual memory management as a degenerate GC, and vice versa.
That has indeed been my response as well, at least for now. In Nova JS engine I use singular arrays or SoAs for storage, realloc to grow, never shrink (at least at present), and GC performs compaction with mark-bit memory allocated for tracing on demand. This is of course frankly insane but well... We'll see what it comes out to: at least Nova is not ridiculously slow, only perhaps 10x slower than an optimised interpreter :D The idea is that GC throughput is being optimised by using less memory and making the GC sweep process itself embarrassingly parallel.
At my day job, an occasionally mutated data flow graph is similarly stored in arrays, and when the graph needs to be reordered it is done in-place by copying data around in the arrays: it takes time, but it's not a common operation, so its performance isn't very critical. But other parts around the graph, like per-node metadata that is only conditionally present, are individually allocated (i.e. they're objects, since this is in JavaScript) for convenience, since their performance is mostly irrelevant.
This is true and insightful, but also there are plenty of cases where preallocation without garbage collection is perfectly sufficient and yields a performance boost over GC. Maybe we should say that generalizing preallocation to every scenario becomes GC, but there are plenty of scenarios where preallocation alone is sufficient and desirable? Also, the same program can benefit from many different memory management schemes (including different kinds of GC) in different places.
Oh absolutely, I did not intend to imply that PWP is bad, or anything of the sort: I am a data-oriented design zealot if anything! If I had anything insightful to put across, it is simply that one size does not fit all, and as the complexity of a program grows you will indeed need different memory management schemes, as you well put it. In many places preallocation is sufficient and should be reached for, as it is much faster and simpler. In other places pure preallocation isn't sufficient, but a simple GC on top is so trivial that it's still much better than manual memory management. And in some cases you will need to dip into either a GC language (perhaps through embedding a scripting language VM) or manual memory management through individual allocations.