Against Query Based Compilers
22 points by LesleyLai
Reading the recent discussion about teaching compilers, especially ~pervognsen’s observations about laziness, I thought about how hard it is to parallelize a lazy-evaluation runtime system, and how the plan for parallelizing the Go port of the TypeScript compiler relies on hoping that compiler threads don’t waste too much time repeating work. It’s an evidence-based hope: they know multiple cores will actually compile the TypeScript compiler faster. But it made me wonder what a compiler would look like if it were designed from scratch for parallel compilation.
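To make the "threads may repeat work, and that's fine" bet concrete, here's a minimal sketch of that style of concurrent memoization. The `OnceMap` name and API are hypothetical illustrations, not the actual design of the Go TypeScript compiler; the point is only that duplicated computation is wasted but harmless, because one result wins deterministically.

```python
import threading

class OnceMap:
    """Memo table where concurrent requests for the same key may race.

    Two threads asking for the same uncomputed key may both run the
    computation; whichever finishes first has its result kept, and the
    duplicate effort is wasted but harmless. This is the bet a
    parallel compiler can make when results are deterministic.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._done = {}

    def get_or_compute(self, key, compute):
        # Fast path: someone already finished this key.
        with self._lock:
            if key in self._done:
                return self._done[key]
        # Compute OUTSIDE the lock, so other keys (and even this key,
        # redundantly) can proceed in parallel.
        result = compute(key)
        with self._lock:
            # First writer wins; a racing duplicate returns the same value.
            return self._done.setdefault(key, result)
```

The alternative is holding a per-key lock during `compute`, which guarantees no repeated work but serializes threads that happen to need the same key.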
Claiming that incremental out-of-order hashing and encryption are impossible (when they empirically exist, and avalanching can simply be added with constant-time finalization) weakens the post very early on. The thesis is independent of that claim, so it's a shame the author gave it such a prominent place in the post.
I don't know of any cryptographic hash functions that are parallelizable in any way that can meaningfully improve runtime. Universal hash functions you can turn into a MAC, sure.
Same for encryption: you can't meaningfully parallelize a Feistel network, even though you can then wrap it in good ol' counter mode.
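Counter mode is worth illustrating, since it's exactly the trick that makes a serial block primitive parallelizable: each block of keystream depends only on its counter, not on the previous ciphertext block. Below is a toy sketch where a hash stands in for the block cipher call; this is not a real cipher, just a demonstration of the data-dependency structure.

```python
import hashlib

def keystream_block(key: bytes, counter: int) -> bytes:
    # Toy PRF: hash of key || counter stands in for one block cipher call.
    return hashlib.sha256(key + counter.to_bytes(8, "big")).digest()

def ctr_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Each 32-byte block depends only on its counter value, so blocks can
    # be computed in parallel and out of order -- unlike a chained mode
    # (e.g. CBC), where block i needs ciphertext block i-1 first.
    out = bytearray()
    for i in range(0, len(plaintext), 32):
        block = plaintext[i:i + 32]
        ks = keystream_block(key, i // 32)
        out.extend(b ^ k for b, k in zip(block, ks))
    return bytes(out)
```

Because encryption is just XOR with a position-addressable keystream, decryption is the same function, and any block can be recomputed independently.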
It was a minor point, and while you're strictly right, I think that's only because the statement wasn't qualified enough. A minor flaw; it didn't ruin it for me.
Plus, it's a good and important opinion piece (i.e., I strongly agree w/ the thesis :)
> I don't know of any cryptographic hash functions that are parallelizable in any way that can meaningfully improve runtime. Universal hash functions you can turn into a MAC, sure.
BLAKE3 or any other collision-resistant hash function based on a block compression tree. This also lets you incrementally update hashes for fixed-length substring differentials in O(log(original) + delta) time by memoizing the compression tree.
Right, that's why I qualified with "meaningfully", though sure, it was a poor qualifier, because BLAKE3 is definitely solid. Obviously Merkle trees get you down from linear to log(n), but in practice those aren't so different, and it's certainly not parallelism that scales to the number of processors you can distribute the data to.
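The tree-hashing idea from the exchange above can be sketched quite compactly. This is not BLAKE3's actual construction (it uses its own compression function, chunk chaining values, and domain-separation flags); it's a toy Merkle-style tree over SHA-256 where the `CHUNK` size and `leaf`/`node` tags are made up for illustration. Leaves depend only on their own chunk, so they can be hashed in parallel, and changing one chunk only invalidates the log-depth path above it.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1024  # arbitrary chunk size for this sketch

def leaf_hash(chunk: bytes) -> bytes:
    # Domain-separate leaves from internal nodes to prevent tree collisions.
    return hashlib.sha256(b"leaf" + chunk).digest()

def parent_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"node" + left + right).digest()

def tree_hash(data: bytes) -> bytes:
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)] or [b""]
    # Leaves are independent, so they can be hashed concurrently.
    # (In CPython a real speedup needs a GIL-releasing hash on large
    # buffers or a process pool; the structure is what matters here.)
    with ThreadPoolExecutor() as pool:
        level = list(pool.map(leaf_hash, chunks))
    # Combine pairwise up the tree; each level is also parallelizable.
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the odd node (simplification)
        level = [parent_hash(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

Memoizing the internal nodes is what gives the incremental-update property mentioned above: re-hash the changed leaf, then recompute only its ancestors.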
are there meant to be edges in the graphs? i don't see any @matklad
I see edges (black arrows between nodes) in all three SVGs, in both Chrome 116 and Firefox 115 on macOS. Do you have custom styles that force a black background on the page, perhaps?
I have the same problem (Firefox on KDE with dark mode). I think the issue is that the SVG element styles use 'light' and 'dark' colors whereas the page style isn't adaptive, so if you're viewing it in dark mode you're getting white edges which are effectively invisible against the page's white background. I'm not a web person, though.
that seems to be it, yes - when the browser is in dark mode the edges are white but the page itself is still white
@roryokane ironically i think forcing a black background would "fix" this
I’m not sure I understand what alternative to queries the author is describing. Is this praise of the good old linear architecture? Sure, it’s easier to implement and may be faster per se, but then how do you handle incremental changes to the input for LSP-like functionality?
This seems to be more about language than compiler design. I feel like a more appropriate title would have been something like "Make your language able to compile files independently from each other".
I'm not sure what this blog post is saying, exactly. It seems to be saying: the dependency graph will be different in different languages and for different queries (obviously), and ..., so we should do good old pipelines but try to parallelize them.
That "..." part seems to be missing. (or maybe I missed it?)