My Thoughts on Bun's Rust Rewrite
31 points by jiacai2050
Before we discuss "Rewrite Bun in Rust", there's something that needs to be said, because no one is saying it: Bun stands where it does today because of Zig.
Jarred (Bun's creator) is saying this. Quote:
Yes, Zig made Bun possible. Bun went from me coding in a smelly room in Oakland for a year solo to a team & among the most widely used tools in the JavaScript ecosystem, and a lot of that is thanks to Zig.
and
I don’t want to generalize that way - zig is a great language & we owe a lot of Bun’s success to Zig, and other projects like Ghostty and Tigerbeetle don’t have the stability challenges that Bun has had
and other projects like Ghostty and Tigerbeetle don’t have the stability challenges that Bun has had
It's worth asking why; a database typically has significantly larger stability challenges than a language runtime.
Yeah, that's a really weird statement to me. Is he perhaps referring to how Bun needs to retain compatibility with NodeJS? I don't know that that would contribute so much to the kinds of stability issues that Bun has been having, though.
I should have quoted the other part of that tweet as well:
Mixing manual memory management with GC is hard. Drop/destructors and compile-time tools for memory safety make it easier to avoid many classes of bugs Bun runs into
I'm not claiming this is the reason (and note that Jarred doesn't say this is the only reason either). But it is certainly true that interacting with a GC from a non-GC language is tricky.
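The tweet's point about Drop/destructors is concrete: in Rust, cleanup runs on every exit path automatically, including early error returns, which removes the leak-on-error-path class of bugs by construction. A minimal sketch (illustrative only, not Bun's actual code; the `Buffer` type and `parse` function are made up):

```rust
// Illustrative sketch: Drop runs on every exit path, including early
// error returns, so this resource can't leak on an error path.
struct Buffer {
    data: Vec<u8>,
}

impl Drop for Buffer {
    fn drop(&mut self) {
        // In a manual-memory language, this cleanup must be written (and
        // remembered) on every error path; here the compiler inserts it.
        println!("freeing {} bytes", self.data.len());
    }
}

fn parse(input: &[u8]) -> Result<usize, String> {
    let buf = Buffer { data: input.to_vec() };
    if input.is_empty() {
        // Early return: `buf` is still dropped before we leave.
        return Err("empty input".to_string());
    }
    Ok(buf.data.len())
}

fn main() {
    assert!(parse(b"").is_err());
    assert_eq!(parse(b"abc").unwrap(), 3);
}
```

In Zig the equivalent relies on the author writing `defer`/`errdefer` correctly at every allocation site, which is exactly where the "leaks on error paths" bugs come from.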
This article reeks of being LLM-written.
The author's primary language seems to be Chinese (example: https://liujiacai.net/ai/litellm-poisoning-analysis/) so that feeling might be an artifact of machine translation.
@jiacai2050 Do you have the Chinese original? I'd be happy to provide a human translation of your work.
I see this sometimes and it's sad. It seems like people are using these tools in good faith with the belief that it's "just" machine translation. It's not. LLM-based "machine translation" seems to have roughly the same effect as feeding an LLM English text and telling it to "improve" it. The result is a sloppy mess.
The really really sad part is that most people who know English but have a different primary language, write English perfectly fine. But they're the people who are the easiest to trick into believing these language models are "improving" their prose when they're not.
Regardless, I think you have a duty to disclose when you've fed your blog post through a language model, whether the intended outcome was machine translation or "improving" it. When used for machine translation by people who don't know English, it's especially important to provide the source text, because no machine translation is going to be perfect. Disclosing the non-English source text allows for people who know both the source language and English to come in and correct people when something has been made unclear thanks to translation.
LLM-based "machine translation" seems to have roughly the same effect as feeding an LLM English text and telling it to "improve" it.
At least if you're using a general purpose LLM? I wonder if similar things happen to Google Translate; they use LLMs under the hood at least some of the time, but I presume they would train theirs specifically to bias toward literal translations.
Zig's target users are: systems programmers who know what they're doing and are willing to pay the price for ultimate control.
This looks a lot like C or C++ over-confidence, except now in the context of a language that post-dates Rust and Swift. :-/
Edit:
Let's return to Jarred's stated reasons for migration: the Zig codebase had too many use-after-free bugs, double-frees, and memory leaks on error paths.
I guess this is a data point about whether leaning hard enough into having an LLM search for memory-safety bugs is enough to make a memory-unsafe language practically safe: an LLM company didn't choose that route here.
Code written by Claude, reviewed by Claude. This closed loop isn't logically impossible, but it means: no human being has actually read this codebase in its entirety.
But this is obviously not true? If you have a human-written codebase in Zig and mechanically translate it to Rust then anyone who understands the original code will understand the new code. It's not like translating to APL, these are both procedural languages in the C syntactic tradition.
There's an open question of what happens from now on, and there's an argument that if the LLMs are allowed to run unsupervised then the code will drift further and further away from human sensibility, but at the current point in time the codebase is still as comprehensible as any million-line JavaScript-runtime-everything-in-one-binary can be.
The more fundamental issue is: AI translates code via local semantic equivalence – it ensures each function behaves identically to the original in isolation, but it doesn't understand the global invariants between functions – those design constraints that aren't written into tests and live only in the original author's head.
This is a weird sentence because it's obviously wrong in a pro-LLM way but also obviously wrong in an anti-LLM way.
LLMs aren't deterministic enough to guarantee that each function behaves identically to the original, for that you need a tool like c2rust. An LLM can translate code that looks like the original, but there's nothing stopping an LLM from translating ((abc & 45) << 3) == 360 into ((abc & 45) << 30) == 360 -- you're reliant on probabilistic detection by the compiler, test suite, and maybe LLM-driven code review.
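To make the transcription-slip point concrete, here's a sketch (the function names, `abc`, and the constants are invented for illustration): both versions compile cleanly, and only a test or a reviewer notices they disagree.

```rust
// A hypothetical one-character transcription slip: `<< 3` vs `<< 30`.
// Both functions compile; the compiler has no reason to object.
fn original(abc: i64) -> bool {
    ((abc & 45) << 3) == 360
}

fn mistranslated(abc: i64) -> bool {
    ((abc & 45) << 30) == 360
}

fn main() {
    // 45 & 45 == 45, and 45 << 3 == 360, so the original is true here...
    assert!(original(45));
    // ...but the mistranslation silently disagrees.
    assert!(!mistranslated(45));
}
```

This is why "all tests pass" carries the whole weight: the type checker accepts both, so the probabilistic safety net is the test suite, not the translator.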
But if you do have a code translator that can guarantee each translated function is identical to the original (like c2rust) while preserving comments and structure (like an LLM), then what you've got is a perfect translator and you could port million-line codebases automatically. You can think of a compiler as a special case of this, Clang isn't bug-free but it's close enough for people to trust. If LLMs could translate Zig (or C++) to Rust with the reliability of Clang then Chrome would be pure Rust by the end of the month.
Also, encoding invariants between functions is the entire point of a type system. One of the stated reasons to rewrite Bun in Rust is that the type system is more capable of expressing complex invariants. It's not like Anthropic compiled the whole thing to assembly and then burned the source code.
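As a small sketch of what "encoding an invariant between functions in the type system" means (the names here are invented for illustration, not from Bun): a raw string can only become a `ValidatedPath` by going through `validate`, so downstream functions can't be handed an unchecked value by mistake.

```rust
// The invariant "this path was checked" lives in the type, not in a
// comment or in the original author's head.
struct ValidatedPath(String);

fn validate(raw: &str) -> Result<ValidatedPath, String> {
    if raw.contains("..") {
        return Err("path traversal rejected".to_string());
    }
    Ok(ValidatedPath(raw.to_string()))
}

// Accepting only `ValidatedPath` makes skipping validation a type error.
fn open_file(path: &ValidatedPath) -> String {
    format!("opening {}", path.0)
}

fn main() {
    let p = validate("src/main.rs").unwrap();
    assert_eq!(open_file(&p), "opening src/main.rs");
    assert!(validate("../etc/passwd").is_err());
}
```

The cross-function constraint is enforced by the compiler on every call site, which is the kind of global invariant the article claims lives "only in the original author's head".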
If you have a human-written codebase in Zig and mechanically translate it to Rust then anyone who understands the original code will understand the new code.
In ReleaseFast your whole sentence would be optimized out :^)
Bun has been vibecoded long before the Rust port. AI was embraced pretty fast and since the Anthropic acquisition almost every single commit has been authored by a bot. Jarred has also tweeted about adding plenty of features by letting the AI write it over the weekend.
But this is obviously not true? If you have a human-written codebase in Zig and mechanically translate it to Rust then anyone who understands the original code will understand the new code. It's not like translating to APL, these are both procedural languages in the C syntactic tradition.
Aside from the point Kristoff made about Bun not being a human-written codebase in a long time, I think you're assuming that a Claude port will be a mechanical translation when that doesn't seem to be the case. HTML Parsers in Portland analyses a handful of ports of a Python HTML parser and shows that one of the core algorithms is implemented very differently in each one. That is, despite a mechanical port being possible in each case, that didn't actually happen.
In fairness, that was from the start of this year, and the process was done differently, but the handful of snippets of the Bun codebase that I've seen post-port look like they suffer from similar issues.
But this is obviously not true? If you have a human-written codebase in Zig and mechanically translate it to Rust then anyone who understands the original code will understand the new code.
Sure.
LLMs aren't deterministic enough to guarantee that each function behaves identically to the original
Yes. This implies that ‘no human being has actually read this codebase in its entirety’.
There's a fundamental principle in software engineering: code you don't understand should not run in production. Not because it necessarily has bugs, but because when it does bug out, you won't know where to start looking. This principle isn't conservatism – it's the baseline of maintainability.
I've been thinking about this concept a lot lately. I'm pretty far into my career and have deliberately never moved into management. My programming history has mostly been working my way lower and lower down the stack because I like having a deep understanding of the actual program itself. Yes, shipping features and making users' lives better is hugely important too. But one of my greatest internal rewards in programming is feeling like I am learning true facts about a real system.
I think for people like me, it's axiomatic that, yeah, a human should understand every line in a program. But...
My manager is, according to the org chart, responsible for a program I maintain. Because he is an excellent manager and doesn't micro-manage me, he certainly doesn't understand every line of code in the software his team ships. In fact, I don't know if he's really read any of it. So he is apparently in violation of this principle.
I'm trying to figure out if there's a meaningful difference between:
1. A manager who ships code he has never read, because the humans on his team have read and understood it.
2. Code written and reviewed entirely by bots, which no human has read at all.
I'm comfortable with 1, but I feel sketched out by 2. Is the difference just that with 1 there is a human who is accountable for that code? Someone I can blame? Is that a sufficient justification for the strong moral feelings I have around this? I don't know yet.
I worry that my strong feelings here are mostly personal aesthetic preferences and not actually necessary for good engineering. Everyone is entitled to their own preferences, of course, but if that's all it is, then I should probably expect that this aspect of the job will become less pleasurable to me. I'm basically being forced into management, except my reports are bots.
I think for people like me, it's axiomatic that, yeah, a human should understand every line in a program. But...
I've worked on a lot of (very) legacy codebases. In these cases, nobody that wrote the code is anywhere to be found. So it was an expected skill to be able to pick something up and "re-understand" it.
Then there are cases where I've git-blamed something and gone "woah, that was me" (sometimes in a good way, sometimes bad).
As another triangulation point: there are plenty of libraries I use where I'd struggle to jump in and understand them. And I don't necessarily feel that's an issue.
Equally there are other areas (like CSS), where I probably defer to AI nine times out of ten... and am pretty happy to do so.
All that as fodder, I don't disagree with you directionally :) There is some kind of "line", but I'm not sure where it is. To be honest, I don't think it's a problem for you (not to assume too much!). You'll probably be able to understand the code for some time yet (even if it's straight up annoying and time consuming to do so). I think it's a systemic problem for those coming in and learning the craft.
One example on "all tests pass": https://github.com/oven-sh/bun/pull/30412/changes/68a34bf8ed550ed69f4a0c18cff5ca9bd41d36ef
The change in that commit is not present in the final diff, as you can easily confirm for yourself. Given that several other people have linked to the same commit in other forums, it seems like you saw that link somewhere, it confirmed your expectations, and you didn't check that it was actually relevant. I think you should hold yourself to a higher standard.
For better or for worse, that commit is a human-authored commit that partially reverts an earlier human-authored commit.
First commit: "await process exit / JSON-parse-retry instead of fixed sleeps"
Revert: "test: revert proc.exited change in spawn.test.ts, keep isDebug iteration count"
Zig enabled a small team to rapidly prototype a high-performance JS runtime without a GC, without a heavy runtime.
All of the other major runtimes (Node and Deno) do the same thing though? They use languages without a GC (C++ and Rust) to wrap a JS engine. (I'm not sure if V8 or JSC actually ship a GC themselves, but that's beside the point.)
The whole bun repository feels dystopian to me. Bots talking to each other and making absolutely insane PRs. For example https://github.com/oven-sh/bun/issues/30766
The speed at which this was considered "done" is somewhat shocking, but I don't see it being some in-motion train wreck the way this article tries to spin it, without hard evidence.
"When a bizarre concurrency bug appears six months from now, when some boundary condition triggers anomalous behavior under a specific load, the engineer debugging the problem will face a system that no one has ever truly understood." - pure conjecture, no facts have been brought to the table. A language port is the sweet spot for LLMs, and plenty of humans understood the base code. Why would a simple language port suddenly and irreconcilably convolute diagnostics?
I interpreted that paragraph as suggesting that the port will have ported latent bugs (either not yet discovered, or mere foot-guns), and when one surfaces, there won't be the same institutional knowledge to diagnose it.
I could believe that could happen, but I don't think it's a train wreck. Lots of code ends up being debugged by people other than the original authors.
It's funny to me that Bun was the cause of the recent Claude leak, according to some people: https://www.reddit.com/r/programming/comments/1s8t8hp/a_bug_in_bun_may_have_been_the_root_cause_of_the/ .
And now we are all taking this claim of the "Rust rewrite" at face value, as if the engineering practices in that project were so excellent that they are continuing a long-standing tradition of good engineering practice, and there was thought and care in this "merge to master". I can also merge code to master without reviewing it, but that doesn't mean the code will continue to work after I've done that.
The proof like always will be in the pudding. :)