Bun's Rust rewrite has been merged
85 points by tuananh
I’m sad for multiple reasons:
What a shame.
The Rust rewrite apparently (though I haven’t confirmed) leans on a bunch of unsafe and unidiomatic stuff.
I'm in the middle of converting a few large C and C++ codebases to Rust, and this is always the initial state of the code. If the existing codebase already followed Rust principles then you could in theory skip the "unsafe everywhere" stage, but where do you find a non-Rust codebase that has single-mutable-borrow semantics, or properly annotates the nullability of every single pointer?
I've not done such a large conversion before, so genuine question: in my mind you'd make it so the bulk of unsafe is FFI'ing out to the as-yet-unconverted code, to allow you to do it piecemeal and without changing every variable at once, but are you saying there's a necessary intermediate step of transliterating to C-flavoured-unsafe?
I've got two approaches I've found work best depending on code complexity: line-by-line and function-by-function. Both of them involve a lot of unsafe.
Using some code from public-domain pikchr file lemon.c as an example:
int acttab_action_size(acttab *p){
  int n = p->nAction;
  while( n>0 && p->aAction[n-1].lookahead<0 ){ n--; }
  return n;
}
I would prep a Rust function like this:
// int acttab_action_size(acttab *p){
pub unsafe fn acttab_action_size(p: Option<NonNull<acttab>>) -> c_int {
    // int n = p->nAction;
    // while( n>0 && p->aAction[n-1].lookahead<0 ){ n--; }
    // return n;
    todo!()
}
A line-by-line conversion turns that into (approximate, this is hand-typed in the Lobsters comment field, not compiled):
// int acttab_action_size(acttab *p){
pub unsafe fn acttab_action_size(p: NonNull<acttab>) -> c_int {
    let p = p.as_ptr();
    // int n = p->nAction;
    let mut n: c_int = unsafe { (*p).nAction };
    // while( n>0 && p->aAction[n-1].lookahead<0 ){ n--; }
    while n > 0 && unsafe { (*((*p).aAction.add((n.wrapping_sub(1)) as usize))).lookahead } < 0 {
        n = n.wrapping_sub(1);
    }
    // return n;
    n
}
The goal is to have very small units of translation from the C/C++ (or if things are really bad, from the assembly) and then you just verify that each line does exactly what the comment above it says.
This is close to what c2rust produces, except c2rust operates on the compiler AST, so it is both more accurate (every pointer is a *mut, every loop is transcribed from a CFG) and overly accurate (macros decompose instead of being treated as pseudo-functions, types like FILE* get transcribed exactly).
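For comparison, c2rust-style output for the same function looks roughly like this (hand-approximated from memory of its conventions, not actual c2rust output):

pub unsafe extern "C" fn acttab_action_size(mut p: *mut acttab) -> libc::c_int {
    let mut n: libc::c_int = (*p).nAction;
    while n > 0 && (*(*p).aAction.offset((n - 1) as isize)).lookahead < 0 {
        n -= 1;
    }
    return n;
}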
LLMs don't do this style naturally but they can be coaxed into doing it if you use the right prompt and maybe an example of the expected style. Failure modes are generally mild and easy to detect (like unindenting the entire function). The result is just a straight-up better c2rust.
Then once you've got the whole program like this, you can start refactoring it from bad Rust to kinda-ok Rust, then add tests, then you keep refactoring until it is good Rust.
A whole-function conversion would start by adding a shim, so you can ensure your function is being called and you haven't missed any func pointers or macros or other sneaky business:
/* in C */
int acttab_action_size(acttab *p);

int acttab_action_size_OLD(acttab *p){
  int n = p->nAction;
  while( n>0 && p->aAction[n-1].lookahead<0 ){ n--; }
  return n;
}
// in Rust
unsafe extern "C" {
    fn acttab_action_size_OLD(p: NonNull<acttab>) -> c_int;
}

#[unsafe(no_mangle)]
pub unsafe extern "C" fn acttab_action_size(p: Option<NonNull<acttab>>) -> c_int {
    unsafe { acttab_action_size_OLD(p.unwrap()) }
    // int n = p->nAction;
    // while( n>0 && p->aAction[n-1].lookahead<0 ){ n--; }
    // return n;
}
Then you just translate the whole thing in one go (or have an LLM do it) and delete the C version:
impl acttab {
    pub fn action_size(&self) -> c_int {
        // you get the point, it's just normal Rust
    }
}
Saves time if the functions are simple. LLMs are great at this style; if there's nothing wild going on in the original, you can point even a low-powered local model at a file full of todo!() stubs and it'll plow through the whole thing automatically.
Either way you will have a period of time where the code is C/C++/whatever written in Rust syntax, with raw pointers and multiple mut to the same value and all sorts of things that make rustc very unhappy. The trick is compiling in debug mode will mostly let you get away with anything in terms of pointers, and you'll also get to find places where the original code relied on arithmetic overflow.
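To make the overflow point concrete, here's a minimal sketch (my example, not code from any real conversion) of the kind of reliance a debug build surfaces:

fn distance(a: u32, b: u32) -> u32 {
    // In C, `a - b` silently wraps when b > a. Transliterated as plain
    // subtraction, a Rust debug build panics right here with "attempt to
    // subtract with overflow", flagging the hidden dependence; a release
    // build would wrap silently, matching the C behavior.
    a - b
    // Once found, you make the reliance explicit with a.wrapping_sub(b).
}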
Oh okay, I think the whole-function version looks something like how I'd imagined the process. But couldn't you still do away with the unsafe shim wrapper if you essentially start from main() and work your way down? That's the bit I don't quite get: there still seems to be a final step of making it look like real Rust, after all the work of making Rust that looks like C from the C that looks like C that you could use from Rust.
(I suppose there's the functions right at the bottom of the call tree that could really do with conversion as a priority, but I'd personally rewrite those from scratch and make sure they work as expected with a test that literally calls both versions and compares them.)
Here's an experience-based talk on converting a C codebase to Rust: https://youtu.be/H0AUP2OgppE?is=VRC977IshsC4I6T4
They recommended starting at the leaves and working upward, producing more or less idiomatic Rust at the leaves with unsafe FFI wrappers for C to call into, and gradually replacing the unsafe FFI layers with safe APIs as you move up the call stack.
I saw a really interesting talk from Lambda Days about doing this in Haskell to rewrite a Rails application, too: https://www.youtube.com/watch?v=ip0UFQbiOkQ . It's quite a broadly applicable pattern, rather like a reverse "strangler fig" migration.
It depends on the shape of the original program, and where its own unsafe semantics are. Remember that the unsafe in Rust is merely documenting pre-existing behavior in the original.
If you do a bulk migration from unsafe C to unsafe Rust then the codebase is easier to work with and more amenable to analysis than a mixture of unsafe C and safe Rust.
I see! I'm starting to get it. Perhaps still one of those things I'd have to feel the pain of attempting otherwise to appreciate, but that's a me thing. Thanks! (Also thanks for taking the time to write that longer reply earlier.)
Also, it's not like they have their own JS engine: they're still using JavaScriptCore, and unsafe in bindings to C is basically inevitable. I've seen an unsafe count comparison with Deno, which I think missed that Deno has their V8 bindings in a separate crate and repository, while bun is more of a monorepo.
There is probably a lot of unnecessary, and possibly unsound, unsafe in the new codebase, but I imagine most regressions will not be related to that.
Having done a bit of work in the rusty_v8 repository (and some paid work for Deno in general, so you know I'm biased), it's actually remarkable how little unsafe there is despite the bindings. Of course all the FFI requires unsafe but after that layer the Deno folks have gone to great lengths to get the unsafe out of the public APIs that they rely on.
I'm curious to see how the project will evolve now that there's a million completely AI-generated lines of code and nobody understands any part of the project anymore. This is the kind of disastrous loss of expertise you typically only see when all core developers leave a project, done completely voluntarily.
Maybe it'll be fine, but it seems risky.
I'm looking at the source code. It is a 1:1 port of the Zig code, and both actually exist side by side. If the maintainers know what the Zig code does, they already know what the Rust equivalent is doing... which is whatever the Zig counterpart did.
Well that's the hope. Nobody knows how true it is. But you're probably right that it's true enough to be a useful starting point.
On the one hand, there are thousands of unsafe regions. On the other, Zig doesn't provide nearly as many memory safety features, so you can also see it as going from almost every function being unsafe to just a few thousand. So that part isn't that bad.
But will the maintainers take proper responsibility for understanding and working on the code that was mostly written by a LLM? I don't think so.
But will the maintainers take proper responsibility
But will the maintainers take proper responsibility for trusting gcc/clang etc.? We will get there eventually; fewer folks insist on writing ASM to avoid C in 2026 than in 1996.
You trust that if gcc/clang/<compiler of choice> generates incorrect assembly from your code, you can file a bug report and it'll be fixed, or you can track it down and fix it yourself. Anthropic/OpenAI/Google don't want those bug reports; they don't care, because they don't care whether their tools produce incorrect code at all. I can't go looking through an LLM's code to figure out why it produced incorrect code, because that's not how they work: they're a black box designed to be used in a non-deterministic fashion, with no way for the average dev to debug them.
Someone could correct me on this, but I'd wager most of the use cases for ASM in 1996 were performance related, not because the developer didn't trust the compiler to be correct. And there's no guarantee that LLMs will get any better; they could continue to produce incorrect code at the same rate, or worse, forever. Considering how LLMs work, I wouldn't count on them getting that rate down to zero, or on them failing in the same ways humans do.
There's pretty good reason to trust Clang and GCC. The lack of knowledge of why they are trustworthy doesn't make a good objection.
There are tools and tests that are engineering feats on the level of the compilers themselves. The people who work on GCC and Clang have a sense of responsibility for their work. And there are companies in regulated industries, such as Toyota, who want Clang to be certified as safety critical and therefore have a regulatory duty to work on stamping out miscompilation.
I spent most of my senior year in college on a project to find bugs in the ARM backend of LLVM. We ran (and still run) huge fuzzing campaigns generating random programs whose compilation is checked by Alive2. We found some bugs, sure, many in programs that were nearly impossible to generate anyway (due to the use of undef, for example, or casting 54-bit integers to floats).
However, the fact that we were generating billions of mutant programs and found fewer than 30 bugs is astonishing. GCC has a similar project currently ongoing, finding similar bugs. These tools are seriously hardened. Not bug free, but some of the closest humanity has got.
The output of an LLM can't even get through a sentence, let alone a billion, before it hallucinates or bullshits me.
These bugs got fixed within a day and with attention to detail, and correct citations of the spec by engineers who knew what they were doing. LLM vendors haven't been able to solve hallucinations for how many years now?
One week from a guy whose only developer experience is web frontends doesn't scream "formal methods, fuzzing, and careful Rust spec reading" that would lead me to trust the output of his LLMs
I'm reminded, though, of a previous post that argued that Zig is safer than unsafe Rust.
Yeah, the spectrum is likely C - some zig - unsafe rust - rest of zig - default rust. That doesn't really change the main argument though.
I'm looking at the source code. It is a 1:1 port of the Zig code, and both actually exist side by side. If the maintainers know what the Zig code does, they already know what the Rust equivalent is doing... which is whatever the Zig counterpart did.
I feel the move from unknown-unsafes to known-unsafes is not really a bad thing.
it's not just that there's loads of unsafe code, the unsafe code is also incorrectly and unsoundly wrapped in safe code, causing UB:
https://github.com/oven-sh/bun/issues/30719#issuecomment-4453771886
adding a lifetime parameter to that struct and tying it to the &[u8] in the constructor is something you'd learn on the first day of Unsafe Rust 101
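For the unfamiliar, a minimal sketch of that kind of fix (hypothetical struct and field names, not the actual Bun code from the linked issue):

use std::marker::PhantomData;

pub struct Decoder<'a> {
    ptr: *const u8,
    len: usize,
    // Zero-sized marker that makes the borrow checker treat Decoder as if
    // it held the &'a [u8] it was constructed from, so the buffer cannot
    // be dropped or mutated while any Decoder into it is alive.
    _borrow: PhantomData<&'a [u8]>,
}

impl<'a> Decoder<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        Decoder { ptr: bytes.as_ptr(), len: bytes.len(), _borrow: PhantomData }
    }

    pub fn peek(&self) -> Option<u8> {
        // Sound only because the lifetime keeps the buffer alive.
        (self.len > 0).then(|| unsafe { *self.ptr })
    }
}

Without the lifetime parameter, safe code could drop the buffer and keep using the Decoder, which is exactly the use-after-free UB described in the issue.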
Given how much Zig has been bashed recently by the Bun team for being memory unsafe it’s probably good riddance.
yeah i find this news rather saddening for about the same reasons
just the second bullet point alone is enough to make me never want to touch bun again, which is a shame bc in the past i've had some very good experiences with it
[edit: was missing a newline my bad]
Meaning that TigerBeetle is not particularly well written?
I deleted my first comment cause yeah that’s kind of what I meant but I haven’t really looked at the source for either, I just remember Andrew Kelley calling one of them not great but can’t remember which :)
Apart from the hand-written async state machines, which are an unfortunate necessity, TigerBeetle is the highest quality codebase I have ever worked on. There are still plenty of things I would change (e.g. the intrusive data structures are a source of bugs and reading difficulty, and I don't think they are necessary for performance), but it's the only company I have worked at that really prioritized code quality and developer experience.
Makes sense! Yeah, that's why I deleted my first comment (not that it was actually saying TigerBeetle was bad either). I just remembered hearing some big Zig project wasn't idiomatic and couldn't remember if it was Bun or not, so I'd said I might have been confusing it.
To provide a different perspective on the correctness debate, Zig has two main projects that people know about:
TigerBeetle, "despite" being written in Zig, seems to be doing exceptionally well:
But ok, people will say that TB doesn't count because they do funky nasa stand-on-one-leg-and-do-a-backflip development that normal development teams don't do (although it's not like they couldn't, like for stuff that matters, but anyway). Fine.
Let's look at Ghostty then. Also written in Zig, no "ghost-style" that I know of, and Mitchell has been vocal about his usage of AI to develop the project (another hint that the project is run normally and not with the discipline of a Mongolian monastery). As far as I can tell as a user, Ghostty is fast and works well. Never crashed on me once. Now, Mitchell is known to be a good engineer, but I flat out reject any argument that claims that this only works "because it's him". Ghostty is an open source project, code comes from a variety of contributors, and most importantly, Mitchell is just a guy that puts effort in what he does, the same way that you can put effort in what you do.
For contrast, Bun got to the place it was in because of deliberate choices that often had very little to do with the language.
To bring just one example up, Bun had (has?) wrong asserts that would crash the executable in ReleaseSafe and potentially cause UB[1] in ReleaseFast. Since Bun shipped ReleaseFast executables (already a debatable choice for a node replacement), the result would be non-actionable bug reports that showed weird behavior and non-deterministic crashes.
The 'solution' to this problem was:
make std.debug.assert a no-op (allow_asserts is set to false for ReleaseFast).
This has the positive effect of removing potential UB caused by wrong asserts. Unfortunately, it does not help at all with the fact that your code is written with wrong assumptions in mind, with invariants and pre/post conditions that don't hold, meaning that you will still cause issues for your users, and that bug reports will still suck.
I find it a fairly objective statement that releasing ReleaseFast executables and then toggling off asserts instead of fixing them shows that Bun has deliberately chosen a cavalier approach to software correctness.
In my opinion software correctness is the outcome of a process, not of any technological choice (those might support or hinder your process, but you gotta have one), although I do hope that the switch to Rust will give Jarred and Anthropic an opportunity to rethink the way they approach correctness.
[1] For clarity for those who are not familiar with how this works, in ReleaseFast, if an assert guarantees that a number is positive, for example, then the compiler will be able to remove any subsequent if (num < 0) {...} for example, removing a useless runtime operation. Unfortunately, if the assert is wrong, then the executable will be missing a piece of logic, resulting in wrong behavior.
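In Rust terms, the same hazard looks like this minimal sketch, using std::hint::assert_unchecked (my example, not Bun's code):

use std::hint::assert_unchecked;

fn clamp_to_zero(n: i32) -> i32 {
    // Promise the optimizer that n is non-negative. If the promise is
    // wrong, the compiler may delete the branch below entirely, and the
    // function silently returns a negative number: the executable ends up
    // "missing a piece of logic", exactly as described above.
    unsafe { assert_unchecked(n >= 0) };
    if n < 0 { 0 } else { n }
}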
This seems to frame the problem as "Jarred doesn't know Zig well enough, that's why there was a problem". But then the story changes to that of Rust being a language that by construction is better at yielding correct programs than Zig even if the programmer isn't that skilled?
You misunderstood my post, turning asserts off in release builds has nothing to do with Zig, and all to do with the decision-making of the company and their development process.
It's a telling example that shows that correctness was never a priority for Bun.
Case in point, the same exact logic is also present in the Rust code:
https://github.com/oven-sh/bun/blob/175f62ab1574fe47df5ab5e6ffb3be878b607e4c/src/bun.rs#L1515-L1519
https://github.com/oven-sh/bun/blob/175f62ab1574fe47df5ab5e6ffb3be878b607e4c/src/bun_core/env.rs#L48
Presumably the Rust code has the same exact falsifiable asserts that get turned off via that flag.
I think "disabling asserts in prod" is a pretty common technique, yeah?
I do feel like if you've lost confidence in the asserts being "right", and that leads to code elision you would not expect... then at the very least disabling that optimization sounds like the right move. I don't know how possible that is, though.
Yeah, Zig turning asserts into assumptions in ReleaseFast is one of the most absurd things I have ever seen. Disabling that behavior to get back to "disable asserts in prod" is extremely reasonable.
It's a legitimate programming pattern. They basically use assert like Rust's assert_unchecked or LLVM's __builtin_assume. You better be sure about your assertions though.
The problem is that the same statement is assert!() in debug mode and unsafe { assert_unchecked() } in release mode, which is probably not ever what you want.
Either asserts are for debugging and simply get removed in release mode (similar to checks for arithmetic overflow), or they're part of the program and are allowed to affect control flow (like C's guarantee against NULL deref).
Turning off assertions in release mode is not what I would do given Bun's relatively low performance requirements, but turning them off is much better than Zig's behavior of silently transforming them into compiler-visible assumptions.
I would absolutely want my assert_unchecked to panic in debug. In fact, I would absolutely never want that thing not to check in debug in case the assertion is wrong!
Agreed, the question is mostly about what happens in release mode. Rust has assert!(), debug_assert!(), and std::hint::assert_unchecked() so you can be fine-grained about your goal; C has assert() so you can verify things look good in tests and then leak your TLS keys in prod (per tradition); Zig has ... discovered an even more dangerous option than C.
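For reference, a quick sketch of how those flavors differ (my summary, assuming current standard-library behavior):

fn get(v: &[u8], i: usize) -> u8 {
    assert!(i < v.len());       // checked in every build; panics if false
    debug_assert!(i < v.len()); // checked in debug builds, compiled out in release
    // Never checked at runtime; the optimizer may treat it as a fact, and
    // it is UB if false (the closest analogue of Zig's ReleaseFast asserts):
    // unsafe { std::hint::assert_unchecked(i < v.len()) };
    v[i]
}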
I guess I would want two flavors of this:
Can think of a lot of cases for the first one; the second one... seems a bit tougher, but I can see it being used for certain container types.
Disabling asserts in prod is perfectly fine in general (maybe, I would actually put that into question as well), but Bun had/has plenty of known, wrong assertions. By completely turning them off (the if(!allow_assert) return branch in fn assert which gets evaluated at comptime) you are both:
You gotta fix your asserts, there's no way around it. Then once your asserts are fine you can choose if you want to keep them crashy (ReleaseSafe), leverage them for optimizations (ReleaseFast, more dangerous), or do something else.
For a node replacement that is meant to be exposed to the internet it might also be a good idea to make ReleaseSafe builds, not ReleaseFast. ReleaseFast makes more sense for stuff like games.
Things seem to have changed since this comment 9 days ago.
I work on Bun and this is my branch. This whole thread is an overreaction. 302 comments about code that does not work. We haven't committed to rewriting. There's a very high chance all this code gets thrown out completely.
I'm curious to see what a working version of this looks like, what it feels like, how it performs, and if/how hard it'd be to get it to pass Bun's test suite and be maintainable. I'd like to be able to compare a viable Rust version and a Zig version side by side.
I came here looking for this.
How one goes from so many big, open questions about a PR to merging it a few days later is mysterious to me. I think those of us who doubted the professed lack of "commitment" to the work, and the "very high chance" of the work not being merged, will be forgiven for having had doubts about the author's intentions.
The open questions are not resolved by time. They're resolved by somebody finding the answer. Once you have the code, answering the questions about performance, passing the test suite, and the "how it looks/feels" is pretty easy.
You're just miscalibrated on the time needed for that "have the code" prerequisite these days.
There was a message from the author, I believe 3-4 days after that, where they said they had gotten more than 90% of tests passing IIRC and were very positively surprised.
This is really sad to see. Not because of the language, I do not care, but because of how careless this is.
There used to be some pride in crafting, even in our profession. Now it's just... this. More code, more fast, who cares about maintainability?
If I was using Bun, I would be scrambling to figure out where to move.
What makes you think Jarred isn't proud of this and doesn't care about maintainability?
I am sure he is proud of this.
+1,009,257 -4,024 says enough about maintainability, however.
If the Rust code is a direct port of the Zig code that he wrote himself, based on porting rules that he provided, I don't think it's safe to assume that it's not maintainable for him going forward.
Large parts of the Bun codebase in Zig were also AI assisted ports from other codebases, like the lexer/parser taken from esbuild for example.
The amount of eyes this story is getting is probably the real win here for Bun. A runtime that already focused on memory safety or stability wouldn't get the clout of Claude rewriting $thing in Rust, and a runtime that didn't need a 1M-line diff is of course not going to be blazing fast /s.
I'm not even sure what technical value things hold now, thanks to vibecoding and slop.
I do know that I'm forming a strong detachment from these types of projects; there is zero sense of coherence, of technical value over existing projects, or of community.
Completely agree. This feels to me like another well crafted media stunt to garner exactly this kind of attention. It's just like the "we vibe coded a C compiler" project that totally works and can compile Linux. Just never mind that they had to reuse and have access to a very well crafted test suite and needed direct access to GCC.
I really fail to see what this buys them other than publicity. Like, how has Bun actually changed as a result of this rewrite? If anything, all I've learned here is that network effects are less important now than common wisdom says. Want to gain safety by rewriting something from C++ to Haskell? Go for it. Want to generate a bunch of missing libraries for your favorite tech stack? Have at it. The old reason of "it has a library for everything under the sun" is a lot less meaningful when anyone can do these kinds of rewrites.
This feels to me like another well crafted media stunt to garner exactly this kind of attention.
Just for clarity I don't think the PR started as a marketing stunt, but it sure went viral.
The old reason of "it has a library for everything under the sun" is a lot less meaningful when anyone can do these kinds of rewrites.
Yeah, this! I guess somebody could spin this into "look what previously could not be done in such a timeframe! This is agentic engineering" but for me it reads as vibecoding. I don't care for such software and would rather not spend my time on it, OSS isn't about immediate value for me.
I don't think you see the sentiment. Most people have a very negative view of this everywhere I see it discussed.
I think the people that see this as a negative are the most likely to be vocal about it online (like me, ha), but even if negative, it's still attention. People are interacting with a project they might previously not have, and I'll take a gamble and say that a silent part wants to watch how this plays out so they can see how far LLMs can be pushed.
Another shame is that Github is failing to load the comments on this PR. What kind of product can't load 1000 comments on a page?
Not a user of bun in production but if I was I'd be pretty nervous. Passing the test suite is one thing, but who knows how many things this is going to silently break? Guess it depends how comprehensive their tests are and what they're willing to commit to.
I don't think you become a user of bun in production if this kind of thing makes you nervous. It's always been a fairly YOLO project.
Probably better to link to the actual pull request, no?
Also, look at that diff count: +1,009,257 -4,024
For the "remove zig" pr that just got closed:
+22 -639,678
So this is more of a part 1 of 2 situation. Still, impressive diffs.
Has got to be some kind of record for the ratio between PR description length and lines modified.
In the US, does that mean that Bun’s source code is now public domain?
I was wondering about this too. Curiously, while the repo claims that Bun's code is MIT licensed, I couldn't find any explicit copyright assertion anywhere, even looking at it before the rewrite. (There are some copyright claims, but they are on third party code which has been vendored into the codebase.)
Nor is the MIT license, which starts with a copyright notice, reproduced anywhere in the repository. How can you assign a license to code if you don't have copyright on it?
It's probably still too early to call it a success, but wow, really impressive stuff. Especially the speed of execution.
And also absolutely "a mess" of a workflow. Almost 7 thousand commits in less than 6 days. The GitHub UI can't even list them. Looking forward to the blog post to learn which review strategy they used and what gave them enough confidence to merge.
Looking forward to the blog post to learn which review strategy they used
I'm leaning towards "none" and they only checked the test suite then rammed it in.
Looking forward to the blog post to learn which review strategy they used and what gave them enough confidence to merge.
Given that this is about Bun… what review?
From the PR,
and most importantly, we now have compiler-assisted tools for catching & preventing memory bugs, which have costed the team an enormous amount of development & debugging time over the years.
At the risk of getting too speculative, the lead up to this point is something I don't fully understand.
According to Wikipedia, the initial release of Bun was in September of 2021, with the first stable 1.0 release being 2 years later in September of 2023. Then, we have a Rust rewrite roughly 2.5 years after that.
Is there any indication of the Bun team publicly mentioning the struggle to work through memory-related bugs prior to a few weeks ago when the rewrite was known to be a possibility? To the point where Zig was perhaps thought about as the wrong choice? I'd like to understand why they chose Zig in the first place, and how that was weighed against an increasing difficulty finding and resolving memory-related bugs.
From one angle, it kind of looks like there was a larger pressure to rewrite into a language that many have agreed is just "better" for LLM generation (presence in training data, borrow checking enforced by compiler), because that's the way they want to (and are likely being asked to) develop.
I don't doubt they struggled with memory-related bugs in the Zig implementation, just like I don't doubt the value of the borrow checker. I'm just hoping for more clarity around the decision-making timeline and if Rust was considered as an option at any point in the last 5 years. Was it something that was wanted for a long time and finally seen as a reasonable thing to do with the advent of LLM improvements?
(Of course, no explanation is required on their part. They can do whatever they want with their software! Just curious about how this is being presented and playing out.)
The entire point of memory safety and other guarantees from Rust feels like a post hoc justification for me. (Edit: especially considering the tremendous volume of unsafe in the merged PR)
Anthropic was likely embarrassed by depending on a project (Zig) that had a strong policy of rejecting LLM generated changes. The fact that Zig held strong on this policy despite Anthropic providing a PR to significantly improve compilation times likely felt like a slap in the face to them
From one angle, it kind of looks like there was a larger pressure to rewrite into a language that many have agreed is just "better" for LLM generation (presence in training data, borrow checking enforced by compiler), because that's the way they want to (and are likely being asked to) develop.
I'm only a spectator here, and I have no insider knowledge; I don't use Zig, only barely use bun, and only (so far) use Rust for experiments, not for code I rely on in production or need to maintain.
With that disclaimer, this looks to me a lot like the kind of experiment I might expect to happen when a team that likes LLM programming tools suddenly finds themselves with an unlimited budget for such tools.
Is there any indication of the Bun team publicly mentioning the struggle to work through memory-related bugs prior to a few weeks ago when the rewrite was known to be a possibility?
Bun was notorious for memory-safety bugs. The standard comparison here: of Bun's 16k issues on GitHub, more than 2,500 mention a segfault, compared to just over 400 of Node's 20k issues.
I don't know if they talked about the struggle publicly but it was obviously a problem, and has been for a long while.
Node, of course, is also primarily written in a memory-unsafe language (namely C++). I can't say why the difference is so large, although Node does make use of C++'s affordances for memory safety (RAII, shared_ptr, etc).
Might I also dare suggest that Node has a more robust development process? I haven't looked that much into either Node's or Bun's, but it's hard to imagine a more haphazard process than one where some guy can get Claude to spit out a +1,000,000/-600,000 diff which rewrites everything into another language and have it merged in a matter of days. Their recent stunt of "We used Claude to parallelize the Zig type checker and it got 4x faster", where the result turned out to make type checking nondeterministic, also doesn't inspire a ton of confidence.
I develop a lot of code in memory unsafe languages, and my experience is that you can absolutely manage it, but you need to be meticulous. You need to consider contracts between different parts of the system. You need to carefully consider things like, "does this function return a value which I own and am responsible for freeing, or does it return a value which I borrow and can only use for some specified amount of time until it becomes invalid?". Good C APIs carefully document the ownership semantics of every pointer passed to or returned from every function. And of course, in C++, stuffing things into RAII wrappers or unique/shared pointers helps a ton.
Haphazard code, where people just slap code together until it "works", tends to be rife with edge case segfaults caused by ownership confusion.
Sort of entirely unrelated, but by golly it seems to be hard to find thorough documentation of ownership semantics in C API libraries! Or rather, the one main time I remember is when I was making Deno bindings for libclang. For some reason libclang absolutely will not clearly document any of their ownership semantics basically anywhere! Maybe it's because it's just a C API exposed from a C++ library, and hence not very important? But it was annoying nonetheless! :)
I agree, it's terrible how uncommon it is! The best example I've seen is from GStreamer, where most pointer return values and many pointer parameters are tagged with either [transfer: full] or [transfer: none], and nullable pointers are tagged [nullable]. It's not perfectly consistent but it's miles better than most other libraries.
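As an illustration of why those annotations matter, here's a minimal Rust sketch (hypothetical Frame/Pipeline types, not GStreamer's actual API) of the same distinction expressed in types instead of comments:

struct Frame { data: Vec<u8> }

struct Pipeline { current: Frame }

impl Pipeline {
    // [transfer: full]: the caller receives ownership, and dropping the
    // returned Frame frees it exactly once.
    fn take_frame(&mut self) -> Frame {
        std::mem::replace(&mut self.current, Frame { data: Vec::new() })
    }

    // [transfer: none]: the caller borrows, and the borrow checker enforces
    // "valid only while the pipeline is alive and not mutated".
    fn peek_frame(&self) -> &Frame {
        &self.current
    }
}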
Google code is some of the worst. I've accidentally introduced memory leaks into WebRTC code because ownership was transferred by raw pointer in C++. I naïvely assumed that, because it's C++, ownership would be transferred via unique_ptr or shared_ptr, or in the very rare cases that it'd be transferred by raw pointer for historical reasons, that would be documented at least in some comment in a header file. But no, no mention in any documentation or comments, just a raw pointer passed from WebRTC code to user code as a function parameter which will leak memory unless your user code frees it.
i know bun wasn't ever really at the "production-ready" stage even before this (and comments elsewhere in this thread and on the value-to-shareholders-enthusiast site demonstrate various rather amusing examples of why), but i thought this was a joke when the pr was created and i'm still not convinced it's serious now.
the idea that someone, in three weeks, produced, reviewed, and tested a one million line diff is preposterous, and miri (rust's official undefined behaviour checker tool 3000) seems to be screaming at the most trivially identifiable things. it's a pretty simple tool to hold, and claude as claimed should have had no issues using it, so was it absentee development practices, forgotten, ignored, or simply unknown to the llm's jockey? is there an actual rust developer (literally any, just one is enough for this question), who actually knows rust, working on the bun team?
someone did the math so that i don't have to:
architector4@AGOGUS:/tmp$ git clone --depth=1 'https://github.com/oven-sh/bun'
Cloning into 'bun'...
…
architector4@AGOGUS:/tmp$ cd bun/
architector4@AGOGUS:/tmp/bun$ find -type f -name '*.rs' -exec grep unsafe {} \; | grep -v '//' | wc -l
13255

Thirteen thousand two hundred and fifty-five lines, excluding comments, with the word "unsafe" in them across this rewrite's Rust code files.
Sure, there's some appearances of C API interop going on, and perhaps there's some really ultra performance sensitive things here and there that warrant use of unsafe. But something tells me that for a proper safe Rust rewrite without such egregious soundness bugs littered like candy, this codebase would need to get ship of Theseus'd a second time over.
i guess this might have been some sort of super weird reverse marketing stunt by anthropic, given the sheer level of "huh?" to the original pr, but they've managed to turn the opinion dial back down from "it's a bit of a yolo project but it's alright for fun"—a position at which it had only really recently arrived—to "tinker project", so i'm not quite sure that worked as planned. nobody with money on the line is going to use this when completely-solved problems like constructing an owned string from a str reference causes ub because they don't know how lifetimes work.
we're all adults here so playing pretend about capabilities when you were effectively just a proxy harness for machine-generated content isn't particularly valuable long-term unless you're specifically vying for a job with the title "vice president" or "senior manager" in it. would you hire this to work on your product or service? would you accept a million-line (or, for unfairness, let's reduce it by an order of magnitude to 100,000 lines) diff that has been demonstrably barely tested? i'd watch a documentary that explains the sort of organisational culture that would make one proud of this sort of thing, because i truly genuinely don't understand ("am i out of touch?"), because i personally would be ill with stress if i submitted something even 1/100 of this, with tests. i updated a package for nixpkgs the other day and checked four separate times in different ways that my changes—a total of four lines—were correct. i don't know if that's just over-perfectionism.
while reading through the carnage i did catch this message from claude in another pr from a couple hours ago, that made me laugh (emphasis mine):
I didn't find correctness issues, but this touches 45 files including core syscall dispatch, spawn, and the crash handler, with a few non-mechanical bits (raw-syscall statx/memfd_create shims, hardcoded POSIX_SPAWN_SETSID) — and the Android build legs are still red in CI — so worth a human pass.
the commit was merged four minutes later.
i look forward to watching bun's progress over the next six months. it will either be an incredible redemption arc for the team, a revert to the zig version with a very quiet "we'll try it again in a year" sorta beat, or a solid double down where bun's development process largely pivots to "fix all the things we had once figured out, but then broke, now in a new language that nobody on the team knows, without using that language's most prominent paradigms" instead of fixing genuine discrepancies in the runtime for about 12 months.
but hey, the tokens were free right? that cost nobody, anywhere, anything, so no harm no foul... right?
Interesting messaging with this whole saga. It's just an experiment, but also it's been shipped in short order? Like, it's their prerogative, but why is the posting about it so cagey?
It's an interesting experiment, but it seems like you'd want to take a more careful approach and have both versions exist in parallel for a while. So early adopters can use the rust version and more conservative adopters can stick with what they already have that already works.
Without any commentary or context from the authors, I don’t see the point in sharing this story. All we will get is speculation.
Even with commentary I'm not sure of the point. Only time will tell if this is an improvement or not. I'll check in in a year or two.
Jarred has been promising a blog post about this for a week or two.
Excited to run it through Claude and get a bullet point summary I can read over coffee.
I am not heavily versed in any of JavaScript, Rust, or Zig, though I have dabbled in and read snippets of each.
JavaScript is not a "safe" language. Depending on how you want to wield the definition of "safe", [Visual]Basic may be the only other thing I can think of in that same realm of "we'll see if it runs the way we expect or not".
Am I the only one that feels a massive sense of irony here? That Anthropic will have a "safe" VM that executes code that is anything but safe, to run the tools that interact with their LLM. And that their LLM, which is essentially a giant probability net purposefully seeded with "noise", will write the less-than-safe code to interact with itself on the otherwise "safe" VM.
JavaScript isn't being discussed at all here and is largely irrelevant. All of the JavaScript bits are done in javascriptcore which bun wraps (like node wraps V8).
Depends what meaning of "safe" you mean. The ECMAScript spec doesn't include undefined behaviour, not even for data races on SharedArrayBuffers. Every operation has defined behaviour. Some of them (like {} + {}) are defined as having behaviour which is not very good, but they still are defined.
ECMAScript mostly tries to make the language semantics deterministic, such as the iteration order for Map and Set objects being deterministic and WeakMap being defined so that you can't iterate the contents of a WeakMap so that you can't observe whether or not a particular object has been garbage collected yet.
In my experience it's fairly easy to stick to a subset of ECMAScript in which you don't routinely run into heinous problems caused by the language itself. Use TypeScript (only for checking, not for compilation), turn on its strict mode, lean on the type system at least a little bit and primarily use async/await for concurrency (wrap any callback based API in a Promise as soon as you can), use Map<string, Foo> rather than Record<string, Foo>, refrain from abusing getFoo() as any or getFoo()! syntax. In my experience code written this way tends to throw null pointer exceptions much less often than Java or C# code in the wild because TypeScript doesn't have pervasive nullability and it tends to have an easier time with domain modelling because TypeScript does have the ability to encode sum types.
The way this is going, the Bun code is also turning into a giant probability net. Instead of training the weights of a neural network against expected outputs, we're training lines of code against human feedback. In both cases, the artefact is an unreviewed black box. Someday Bun will start hallucinating the existence of JS promises, and we'll be told that it's an unfortunate but reasonable aspect of how modern software works.
I admit having exaggerated the previous paragraph, for dramatic effect. But the parallels between that vibe-coded Bun and an actual LLM are scary.