The Performance Revolution in JavaScript Tooling
10 points by ayo
Same thing is happening in Python (Ruff/uv). A high-level language that's also performant, runs on all platforms, and is easy to build and upload to all package managers has really revolutionized so many ecosystems.
JS will come roaring back into this arena with a vengeance because the idea that JS is "just too slow" is not at all supported by reality.
The whole JS tooling community settled on patterns with exceptionally poor perf: mutable trees in which every node is type-polymorphic. You couldn't get any further from what makes for fast JS code.
But Anders knows all that just as well as I do, and you heard what he said: JS and Rust were ruled out for the same pithy reason: it would be too hard to clean up the highly cyclic data structures. In other words, "why even try the best option, because it is difficult." Since when am I supposed to respect that?
No offense, but I’ve been hearing people make claims like this since the mid-90s; back then it was “in a few years Java will run faster than C!”
ASTs are kind of intrinsic to a lot of parsing / transpiling / optimization work. They’re type-polymorphic. You can make them immutable, but at the cost of a lot of copying, i.e., creating more garbage.
I don’t see why a tool for operating on language X needs to be written in language X. Use the best tool for the job.
While I do agree that there is no set-in-stone rule that tools for X must be written in X, I do think the dogfooding aspects are very important. As such, I do think that a reaction to native tools will eventually come; whether or not that reaction will have a meaningful impact is of course another thing entirely. (Actually, musing on this: I could imagine the reaction being tools written in JS and running in Porffor or some other AOT-compiling JS engine, thus achieving the holy grail of native speed with in-language [but non-idiomatic] development.)
Regarding the performance of ASTs specifically, and of writing them in JavaScript (and other languages): while ASTs are intrinsic to this work, that doesn't mean the most idiomatic way of writing one is the best possible one. At work I'm currently refactoring a data-flow graph from an object-based format into plain data in TypedArrays; the graph is effectively just a step away from an AST, and I've already done the work of refactoring an intermediate storage format of the plain AST into the TypedArray format. Creating this intermediate storage format isn't on a hot path, so I don't have performance numbers to report for it, but in memory usage I achieved a 10x reduction. Hopefully in a week or two I'll have performance numbers to report, and I expect them to be staggering, since the current object-based format's performance is done dirty by using a single hash map to access nodes by name instead of direct node object references.
Anyway, the TypedArray based format is very much inspired by the Zig compiler's "encoding approach" (see Andrew Kelley's talk "Practical data-oriented design"), meaning that a single node can be as small as 4 bytes in extreme cases. Perhaps JS tooling written with such closely guarded memory usage and layout can indeed come to rival native tools?
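To give a concrete flavor of what such an encoding can look like, here is a minimal struct-of-arrays sketch in TypeScript; the node kinds, field names, and sizes are made up for illustration and aren't the actual format from the project described above:

    // Hypothetical struct-of-arrays AST store: a "node" is just an index into
    // parallel TypedArrays, not a heap-allocated object with polymorphic fields.
    class AstStore {
      kind = new Uint8Array(1024);   // node kind tag (e.g. 0 = literal, 1 = identifier)
      data = new Uint32Array(1024);  // kind-specific payload (e.g. packed child indices)
      start = new Uint32Array(1024); // source span start
      end = new Uint32Array(1024);   // source span end
      count = 0;

      add(kind: number, data: number, start: number, end: number): number {
        const i = this.count++;      // growth/resizing elided in this sketch
        this.kind[i] = kind;
        this.data[i] = data;
        this.start[i] = start;
        this.end[i] = end;
        return i;                    // the "node" handle is just this index
      }
    }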
Stuffing data structures into byte-arrays is a viable path, and I’ve seen it done in Go also, but IMO it's going against the grain of the language — it’s basically resorting to writing assembly code in JS/Go/etc. You’re doing your own memory allocation, your own structure marshaling… To me that's a smell that you’re using the wrong language.
You can do this kind of thing much more cleanly in C or C++ because those languages let you mess with pointers and hide the details in an abstraction. So in C++ it’s easy to write an arena allocator that works this way but acts from the outside just like the normal new.
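To make that concrete, here is roughly what the manual allocation and marshaling end up looking like in TypeScript; this is a made-up minimal bump arena over an ArrayBuffer, just for illustration:

    // A minimal bump arena over one ArrayBuffer: allocation hands out byte offsets,
    // and the caller has to know the field layout and do its own reads and writes.
    class Arena {
      private view: DataView;
      private top = 0;

      constructor(byteLength: number) {
        this.view = new DataView(new ArrayBuffer(byteLength));
      }

      // Allocate `size` bytes, 4-byte aligned; returns an offset, not an object.
      alloc(size: number): number {
        const offset = (this.top + 3) & ~3;
        this.top = offset + size;
        if (this.top > this.view.byteLength) throw new RangeError("arena exhausted");
        return offset;
      }

      writeU32(offset: number, value: number) { this.view.setUint32(offset, value, true); }
      readU32(offset: number): number { return this.view.getUint32(offset, true); }
    }

A C++ arena can hide the same bookkeeping behind placement new and hand back typed pointers; the JS version leaks it into every call site.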
I definitely agree: this is the kind of code that makes coworkers weep and your future self curse you. I wouldn't quite put it at the level of assembly as it's closer to using arena allocation, but I assume you mean that more figuratively and you're absolutely right.
But... I do think there is still some value in doing this despite it going against the grain. For TypeScript specifically, I assume that TS originally probably didn't have support for almost-nominal typing in the form of branded primitive types, but those were found to be useful and saw heavy use in the TS compiler itself. Perhaps an alpha version of TS also didn't have definable Array value types but instead made all Arrays equivalent to any[]? TypedArrays in TS today are similar to this hypothetical pre-alpha Array type, using number instead of any for the value type. If TS did support defining the value types (and perhaps even key types!) of TypedArrays, then you'd quickly find that strongly-typed numeric data becomes possible, and (if key types become supported) strongly-typed numeric pointers also become fairly reasonable. Effectively, what is or isn't going against the grain of the language can change.
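Branded primitives already get you partway there today; here's a small sketch (all names invented) of strongly-typed numeric "pointers" into TypedArrays:

    // Brands exist only at the type level; at runtime these are plain numbers.
    type NodeIndex = number & { readonly __brand: "NodeIndex" };
    type StringIndex = number & { readonly __brand: "StringIndex" };

    const kinds = new Uint8Array(1024);
    const names = new Uint32Array(1024); // conceptually holds StringIndex values,
                                         // but the TypedArray can only say `number`
    let count = 0;

    function makeNode(kind: number, name: StringIndex): NodeIndex {
      const i = count++ as NodeIndex;
      kinds[i] = kind;
      names[i] = name;
      return i;
    }

    const s = 7 as StringIndex;
    const n = makeNode(1, s);
    // makeNode(1, 42); // error: a bare number is not a StringIndex
    // makeNode(1, n);  // error: a NodeIndex is not a StringIndex

If TypedArrays could declare their value (and key) types, the comment in that sketch would stop being a comment and become a compiler-checked guarantee.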
So, if enough projects pick up TypedArrays as a means of doing their own memory allocation, then the ecosystem will adapt to support that feature and it will get strong-enough support to be realistically used. For TS TypedArrays specifically, I think we're moving in this direction as we keep seeing extensions of them year after year. Resizable ArrayBuffers did not use to be a thing but are now present in all browsers, and read-only ArrayBuffers are probably one of the next things to appear. JavaScript is not standing still, which means that TypeScript cannot stand still either.
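For instance, resizable ArrayBuffers (now in current browsers) let a TypedArray-backed store grow in place; the sizes below are arbitrary:

    // A resizable ArrayBuffer grows in place, so a TypedArray-backed node store
    // doesn't have to allocate-and-copy when it fills up.
    const buffer = new ArrayBuffer(1024, { maxByteLength: 1 << 20 });
    const nodes = new Uint32Array(buffer); // length-tracking: follows the buffer's size

    console.log(buffer.resizable, nodes.length); // true 256
    buffer.resize(4096);
    console.log(nodes.length);                   // 1024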
I don't think it needs to be all that much copying. In a well-made immutable structure an edit only requires you to replace the nodes on the path from the root to the edit point. I could easily have a tree of hundreds of thousands of nodes and only be building 10 or 20 new ones on each edit.
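A minimal sketch of that path copying, using plain objects rather than TypedArrays purely to show the sharing:

    interface Node<T> {
      readonly value: T;
      readonly left: Node<T> | null;
      readonly right: Node<T> | null;
    }

    // Rebuild only the nodes along the given root-to-target path ('L' | 'R');
    // everything off the path is shared with the old tree, so an edit allocates
    // O(depth) new nodes no matter how large the tree is.
    function updateAt<T>(node: Node<T>, path: ("L" | "R")[], value: T): Node<T> {
      if (path.length === 0) return { value, left: node.left, right: node.right };
      const [head, ...rest] = path;
      return head === "L"
        ? { value: node.value, left: updateAt(node.left!, rest, value), right: node.right }
        : { value: node.value, left: node.left, right: updateAt(node.right!, rest, value) };
    }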
They also don't need to be type-polymorphic. I agree that every implementation so far has chosen type-polymorphism for its nodes, but it's just a common pattern, not an unbreakable rule.
I think the question is not "Can you write more performant code in JavaScript?" but rather "Is it more convenient to write code with performance constraints in JavaScript, Go, or Rust?"
Reminds me of "Speed Without Wizardry", the (now old-lore) source map parser episode: Mozilla rewrote their source map parsing from JS to Rust compiled to WASM and got 6x gains; mraleph took it as a challenge and got the JS version to match the rewrite through algorithmic improvements and low-level runtime profiling; and Mozilla then folded the algorithmic improvements into their rewrite for an additional 3x gain (measured against the original, not against the previous version of the rewrite).
I do think you're correct in that a reaction will come; JavaScript is "too big to fail" in a sense, eventually someone somewhere will create a new project or breathe new life into an old one, generating enough buzz to make waves. Whether that will change the currently inevitable-seeming march of native tools replacing in-language ones remains to be seen.
I'm being a bit coy. The reaction is going to come from me. I'm five years of full-time work into building the software that demonstrates, in the most practical possible sense, my complaint with what they've done (by showing how much better it could have been).
Is that work open source?
All of it. MIT. GitHub org bablr-lang, until we have our own platform for code hosting at least.