Super-flat ASTs
54 points by asb
54 points by asb
Nice post! I am vaguely aware of the flat AST idea from the Cornell blog post and from various talks of Carbon and Zig. But storing children contiguously is something I've never thought about. Definitely something to explore in my future language projects.
This is close to how we typically store a BVH (Bounding Volume Hierarchies, a kind of binary search tree to accelerate spatial queries) node in graphics:
struct BVHNode {
AABB aabb; // bound of the node
// either refers to the index of the first child or the first primitive, assume they are contiguous
u32 first_child_or_primitive = 0;
// If primitive count is 0, we have an inner node
u32 primitive_count = 0;
};
I think I first seen this layout in How to build a BVH – Part 1: Basics, but PBRT also uses something similar
For a good example of this sort of pattern in the real world, take a look at the Zig compiler source code. I'm sure others might do it but Zig definitely does. I have a now very outdated series on some of the Zig internals: https://mitchellh.com/zig/parser And Andrew's old DoD talk is very good and relevant to this: https://vimeo.com/649009599
More generally, I believe its fair to call this a form of handle-based designs: https://en.wikipedia.org/wiki/Handle_(computing) Which are EXTREMELY useful for a variety of reasons and imo woefully underused above the lowest system level.
Lovely work, storing children contiguously is indeed a nice touch!
If the author's in the mood for a follow-up I'd love benchmarks for traversals/transformations as well as just construction. ...Ooh, the source is on github, I might have just nerd-sniped myself...
Flat trees are interesting, but if the tree needs to support incremental changes is that still possible? It seems like they only way to change a bit of the tree then it's to make a whole new tree.
I’m considering working on something like this for Typst. It would be a hybrid-flat CST that allows marking ranges as being replaced and containing a pointer to a new allocation. If we don’t use the incrementality it’s pure win, and if we do it would only cost some extra memory and an extra indirection that’s no worse than before. Kind of like a rope or other string editing data format, but with internal structure.
My one worry would be holding a lot of unused memory over a long editing session, but Typst is already pretty memory hungry so it would require large edits to make a difference. I’m also sure there would be ways to consolidate, such as if a new allocation is smaller than the section it’s replacing.
I guess I'm saying that you shouldn't look at arena allocation as a pure win in the context of syntax trees. It has definite costs, not least of which is that the arena itself needs to be mutable. That means you can't give out direct references to the nodes: everything must pass through a layer of indirection first.
By contrast a tree of heap nodes can be deeply immutable, and with direct references trees can be easily composed in the functional style with the GC handling cleanup. And that deep immutability brings with it a whole raft of other great benefits, like free undo history that you can have just by holding onto several of the past tree roots.
It's so weird that I was thinking about this problem just a few days ago and independently came up with the idea of "flat ASTs" for a parsing problem I had.
I was thinking implementing it might be overkill (vs regular pointer based tree) but this post validated to me it's a good idea so thanks for that!
On the other hand, I'm not sure if I'd have ever come up with "super-flat": the complexity jump is massive compared to the other optimizations which all felt relatively intuitive.
fn fibonacci(n) {
if n < 2 { n }
else { fib(n-1) + fib(n-2) }
}
let result = fibonacci(30);
Am I missing something or is there a fib() builtin?
Probably a typo. That kind of things happen a lot when we write snippets without running. My last blog post also had some of those.