Why doesn’t Rust care more about compiler performance?
59 points by ajdecon
The article touches on this very briefly, but IMO a lot of the blame for Rust’s reputation for slow compilation rests on the shoulders of Cargo and crates.io.
In a typical C/C++ project the build system works by taking each source file (*.c, *.cc) and executing a compiler process to convert that source file into object code. This is inherently scalable to very large numbers of CPU cores – a big C++ project will happily run hundreds of gcc processes at once. The only real scalability limit is RAM, because you can shard the compilation but you can’t shard ld.
In Rust the unit of compilation isn’t source files, it’s libraries (“crates”) – you run one rustc per library, it spits out metadata and object code, and if you change one source file then you have to recompile the entire library. This implies that the natural translation from C++ to Rust would have each Rust project consist of dozens or hundreds of small libraries, and indeed if you structure a Rust project that way then the compilation is as fast as C, or faster.
However, the Rust project’s official build system (Cargo) and official package registry (crates.io) are in practice incompatible with this approach:
- Each Cargo.toml describes a set of source files to compile together; if you want to have your big project split into hundreds of compilation units then you’ll need a hundred Cargo.toml files. Unlike C/C++ build systems there’s generally no wildcarding – you can’t say “generate a virtual Cargo.toml for each *.rs in this directory” – so the practical upper limit is a structure like that of Cranelift.
- Compilation caching (--disk-cache) isn’t of much use: the cache hit rate for Cargo is too low.
- Each Cargo.toml corresponds to a separate crates.io package, which in principle has its own version, independent dependency sets, maintainers, and so on. The burden of maintaining a multi-library package on crates.io is overwhelmingly larger than the C/C++ equivalent.
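For concreteness, the many-small-crates layout is at least expressible with a Cargo workspace – the pain is that every compilation unit needs its own hand-written manifest. A minimal sketch, with hypothetical crate names (both manifests shown in one block for brevity; on disk each lives in its own directory):

```toml
# Top-level Cargo.toml: one workspace, many tiny member crates,
# each of which is a separate compilation unit for rustc.
[workspace]
members = ["crates/lexer", "crates/parser", "crates/ir"]
resolver = "2"

# crates/parser/Cargo.toml: each member still needs its own manifest
# and its own explicit dependency list – this is the per-crate
# busywork that the lack of wildcarding imposes.
[package]
name = "parser"
version = "0.1.0"
edition = "2021"

[dependencies]
lexer = { path = "../lexer" }
```

Nothing generates or maintains these manifests for you, which is why projects rarely go past a few dozen of them.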
I think if the Bazel/Buck/Pants approach was more visible to Rust maintainers, and they got to experience what it’s like to build a truly large project in ~10s and run the full test suite in ~30s, then they would take a step back and look at Cargo and ask themselves “what the hell are we doing here?”.
If it helps, I was in a session at the Rust all-hands that was all about that.
Note that Rust’s maintainership is very diverse, so while many maintainers don’t care, those who do have an appropriate meeting spot.
I found cargo to be effectively dysfunctional far too early in my project (think maybe 10–15 kLoC of Rust today): as I swapped branches, I was constantly recompiling dependencies that weren’t changing.
Moving to Bazel ended those issues, at the cost of a little bit of constant time slowdown that can probably be shrunk. And I can reliably build fastish on remote executors too.
I have a surprising amount of antipathy for the internals of cargo, despite a great fondness for the interface. A replacement that hewed to the interface (cargo.toml) but cached and rebuilt correctly would be materially better.
In lieu of that, I get better at Bazel.
Is there pain using Bazel with rust libs that aren’t in your source tree? For Python and JS I really wanted to “just” point my existing dependency lists to Bazel but it was either “do some ‘busywork’ to get to the same spot (also rework your CI/etc or maintain two dep lists)” or “just vendor stuff in” (But I could also have just been holding the tool wrong)
Bazel takes some work to get going. It’s known to be a … clumsy… tool.
I also have a big belief in vendoring to maintain correct and reliable builds.
That said. The crate universe Bazel tool makes 3p crates just fine- if I manage them with cargo.
If I was trying to hand-craft each crate’s Bazel config… whew. Not at my scale. Maybe at 100x our scale, where we could have a Rust Bazel team.
*nod* I’m not willing to poke at Bazel yet… but I *do* have projects where I use Just to wrap Cargo for build control so I can set environment variables to manually split my Cargo calls across multiple target directories to keep Cargo from clobbering its caches.
(I don’t like the wasted space, but my time is more valuable than a few extra gigabytes consumed on my 2TB NVMe boot/home ZFS pool.)
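The target-directory split doesn’t need anything exotic – it’s just environment-variable plumbing around Cargo. A hypothetical Justfile sketch (recipe and directory names made up):

```make
# Justfile: give each build flavor its own target dir so one flavor's
# rebuild can't clobber another's incremental cache.
# CARGO_TARGET_DIR overrides Cargo's default ./target location.

build:
    CARGO_TARGET_DIR=target/dev cargo build

release:
    CARGO_TARGET_DIR=target/rel cargo build --release

lint:
    CARGO_TARGET_DIR=target/clippy cargo clippy
```

The cost is exactly the disk-space duplication mentioned above: each directory holds its own full set of compiled dependencies.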
In my experience, the developer experience of working in projects split into many libraries is tedious. Many logical changes/features require versioning and publishing (sometimes multiple) new library versions and cascading version numbers through dependencies. Since the library is likely providing an internal-only API, many changes are semver major changes which leads to an awkward publish-library-integrate-API-change workflow that breaks up PRs that would be clearer as atomic changes. You need to manage developer-only/test-only library versioning (since sometimes publishing a library is necessary to test it .. and you obviously don’t want to publish-for-production untested/WIP library changes). Then you need some way to promote or republish or retag your final dev version as a production version and make sure that tag/version is selected by dependencies.
However, I have zero experience with bazel/buck/pants. Is the grass greener over there? What’s that model look like?
Enabling Cargo features causes rebuilding, but that’s a good thing. Build systems that don’t track features give you ABI breakage and need a make clean.
Features are enabled declaratively, so they’re known for the leaf dependencies right from the start of the build. Features are additive, so they get unified and the leaf libraries are shared between all their users.
There’s no problem with features. The real problem is with build inputs that Cargo is unable to track accurately, and that’s mainly system dependencies.
The main issue I run into with features is this: I have two crates in the same workspace, foo and bar. foo transitively depends on baz with a feature enabled. bar transitively depends on it, but doesn’t depend on that feature. Now cargo test winds up recompiling baz every time I switch between foo and bar. Ultimately these are both implementation details of the workspace’s “main” crate, so I’d be fine with saying “enable all features needed by any crate in this workspace”, but there’s no easy way to do that.
It doesn’t rebuild baz every time. It builds it once for each set of features it has, and keeps multiple copies of baz cached. The transitively set feature may still change the behavior of baz and other crates that depend on it, even if you didn’t directly ask for it yourself.
If you tell Cargo to build just one crate, not the whole workspace, it will build the crate with the crate’s own minimal feature set. Without this behavior other workspace crates ended up with features they couldn’t disable, and this kept breaking no-std libraries that didn’t want any std feature enabled anywhere.
Cargo plans to add a setting for it, but in the meantime you can hack around it by making your workspace crates depend on a common crate that enables all these problematic features (tool to hack this).
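That common-crate hack looks roughly like this (crate and feature names hypothetical): one do-nothing crate pins the feature set, and every workspace member depends on it, so the unified features no longer differ between per-crate builds:

```toml
# workspace-features/Cargo.toml: a crate with no code of its own,
# whose only job is to enable the problematic features everywhere.
[package]
name = "workspace-features"
version = "0.1.0"
edition = "2021"

[dependencies]
# Hypothetical: always enable the feature that foo needs, so building
# bar alone no longer produces a second, differently-featured baz.
baz = { version = "1", features = ["simd"] }

# Then in each member crate's Cargo.toml:
# [dependencies]
# workspace-features = { path = "../workspace-features" }
```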
So if I have say 10 .rs files in one project, rustc has to compile them sequentially?
No, they are all read and compiled concurrently, as if you had used nested mod{}s instead of separate files.
No, not sequentially, but “together”. You can have cyclic references between all items in a crate, so it’s not possible (in general) to topologically sort them and then compile them separately and in parallel like many other languages can do.
In Rust, macro expansion is serial, and name resolution needs a complete view of the crate (due to wildcard imports), but these are relatively minor. The “other languages” have an even tougher time optimizing a preprocessor with textual inclusion.
The slow part of compilation is split into parallel codegen units. Items that would prevent splitting are duplicated across units if necessary.
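Those codegen units are tunable per profile in Cargo.toml; the trade-off is build parallelism against cross-unit optimization. The values below are the documented defaults:

```toml
# Cargo.toml profile settings controlling codegen-unit splitting.
[profile.dev]
codegen-units = 256   # default for dev: maximize parallel codegen, fast builds

[profile.release]
codegen-units = 1     # one unit: slow serial codegen, best cross-function optimization
                      # (the release default is 16, a middle ground)
```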
AFAIK, Rust uses a query-based compilation model internally, so when compiling a crate, even with cyclic references, much of the crate can be compiled in parallel. Certain phases might have query dependencies that impede parallelization at that stage of compilation, but the point is that the compiler will attempt to maximize the amount of work done in parallel when it is possible to do so. It also allows you as the crate author to avoid patterns that introduce a lot of cyclic references, and thereby benefit from more parallelization – at least in theory. I’m not sure what the delta between theory and practice is at the moment, but it’s not quite so bad as you describe.
Obviously it isn’t fully parallel, but I’m not sure that’s really the case in most other languages either (even if they do better than Rust in this regard); there tends to be at least some amount of information/metadata you need about dependencies on other modules/libraries/whatever.
You can have cyclic references between C files and still compile them separately, and my relatively uninformed feeling is that one could in principle do this for Rust, with some difficulty.
Cyclic references within a crate work. Crate-level (inter-dependency) cycles do not.
And cyclical references in C only work if you forward-declare separate prototypes. Rust doesn’t have prototypes (in the literal sense) and doesn’t care in which order items appear.
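That order-independence is easy to see in a tiny sketch: two mutually recursive functions in separate modules compile with no forward declarations anywhere, which is exactly what forces rustc to see the whole crate at once.

```rust
// No prototypes anywhere: each function calls the other before
// (textually) the other has been defined. rustc resolves names
// across the entire crate, so definition order is irrelevant.
mod a {
    pub fn is_even(n: u32) -> bool {
        if n == 0 { true } else { super::b::is_odd(n - 1) }
    }
}

mod b {
    pub fn is_odd(n: u32) -> bool {
        if n == 0 { false } else { super::a::is_even(n - 1) }
    }
}

fn main() {
    assert!(a::is_even(10));
    assert!(b::is_odd(7));
    println!("ok");
}
```

The C equivalent would need a header with prototypes for both functions before either definition.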
While I’d love for Rust build times to be faster, I don’t find it to be that much of a practical problem!
Folks frequently point to C++ as being faster to build, but that’s not been my experience. I’ve worked on several big C++ projects over the years (millions to tens of millions of lines of code), and build times there have been worse than any Rust project I’ve worked on. I’m talking multi-hour parallel release builds.
The Rust project I work most frequently on is Nosey Parker, a detector of hard-coded secrets. It is smaller than the big C++ projects I’ve worked on, but pulls in around 500 crates and a from-scratch release build takes just a couple minutes on a recent laptop. And most of that time is spent building vectorscan-rs-sys, which is a Rust-wrapped C++ codebase.
Again, I’d love for faster build times, but people saying C++ builds faster just doesn’t match my experience.
I’m talking multi-hour parallel release builds
Computers were a lot slower when I worked on a C++ project that took that long to build. These days a complete clean build of LLVM takes around ten minutes on my laptop.
I suspect the problem is incremental compilation and mismatched expectations for what triggers a recompile. If I modify a .cpp file in LLVM, a rebuild is normally under ten seconds. If I modify a .h file, the incremental build will rebuild between six and three thousand files, but I have a fairly good intuition from the location of the header where that will be.
I would expect that someone moving between Rust and C++ (in either direction) would lack that intuition. A Rust person would probably be shocked at a change to a comment requiring half of the project being recompiled. A C++ person would be surprised at some of the analysis that triggers recompiles in Rust. Once you have worked in either language for a while, I’d expect you’d build good intuitions but until then you probably complain a lot that a thing that you thought was a small change became a huge one.
I’ve never worked on a large Rust project, but IME, C++ build time depends a huge amount on the project structure, the build tools used, and how well the people setting up the build knew those build tools. I’ve seen small projects take a ridiculous amount of time, and full builds so fast I couldn’t believe they were full builds.
I suspect it’s different with Rust where everybody uses the same tool, but it’s hard to generalize about C++.
Again, I’d love for faster build times, but people saying C++ builds faster just doesn’t match my experience.
I agree, maybe for the folks who are working at google or meta with their build infrastructure and tooling but for those of us working with janky ass makefiles… just getting stuff to build is an accomplishment, much less having it build quickly
It’s funny, just two days ago in an unrelated thread I posted a comment with the rule of thumb: “to get 2x performance you optimize, but to get 10x performance you make architectural changes”. And here is a post about optimizing the Rust compiler with a bunch of local optimizations that shows it getting… just about 2x faster over the last years.
As the post mentions, architectural improvements in a mature project are hard, and in a decentralized codebase might even be so hard as to be impossible. It is disappointing though. I’d much rather be waiting for the 10x than to watch people spend their effort scraping out another few percent via like twiddling compiler flags or whatever, even though I know in my heart the two types of effort aren’t fungible.
(In fairness, I guess there are some improvements underway more like what I’m describing, marked in the post as “northstar”.)
rustc has been refactored significantly over its lifetime. It had a rewrite of code analysis and generation from AST to IR. Replaced the borrow checker with a completely different architecture (with plans to do it again). Added a const-eval interpreter. Moved towards being query-based, and added parallel query execution. Added its own optimizer, and two other codegen backends.
It’s pretty amazing that it started out so long ago from a small project, went through a lot of experimentation, and they didn’t need to throw it out and start again. The codebase is fine.
I don’t know if there’s a 10x possible within the compiler, at least I haven’t seen any proposals.
There are significant gains possible from global build caching or shipping prebuilt libraries.
For debug builds linking can be a significant cost. So far Rust preferred to use the system linker for compatibility with the C ecosystem. There could be a 10x speedup there if rustc could bypass linkers entirely and hotpatch code.
There is also a semi-hard limit on the performance improvement that can be achieved within rustc itself, making 10x performance a pipe dream: a big chunk of the compile time is taken by LLVM. It’s not completely set in stone, since the time LLVM takes depends on what you feed it (for example, Rust used to emit very verbose LLVM IR, which was fine from a performance standpoint since all the fluff was optimized away, but it took an unnecessarily long time). So a much faster Rust compiler would require a new backend too. Cranelift exists; I don’t know if it fits the bill for a 10x compiler.
For incremental compiles this is not quite true: rustc already has a query system that can be (easily ab)used to detect which items were actually changed. Once you have that you could generate only the LLVM IR corresponding to the changed items and nothing more, and then rely on an incremental linker or binary patching to generate the resulting final binary. None of this necessitates architectural changes to rustc or cargo; it instead needs an incremental linker (I hope Wild can get there) or modified codegen that produces binaries that are easier to patch.
I wonder how feasible it would be for a small group of people to make a proof of concept for the architectural changes… one that maybe breaks features or produces worse output code, but shows how big a difference the architectural changes make. Could be a very useful way to gather support for a large refactor, both from the compiler devs and the general community.
… one that maybe breaks features
I think that’s the crux of it. The current architecture is somewhat a consequence of the language design itself (especially with regard to editions, features, and stability/backwards-compatibility in general). That stuff imposes a non-trivial cost IMO, and I suspect it would be hard to do major re-architecting of the compiler that produced significantly better results without making changes to some of those details as well. But that’s just my intuition, maybe there is lower hanging fruit than that, but I would think there is little of that left at this point.
Still, I think it would be an interesting effort to undertake, but I’m not surprised nobody is chomping at the bit to do it. I’d think most of the people interested and capable in doing it, would be inclined to just build a new “better Rust” language and toolchain, but who knows.
I’m using rust as a general purpose tool (think competes with python/ruby). And for my codebases it’s fine? I have bacon to run my tests on save and it’s basically indistinguishable from a ruby rspec loop. Or faster even as tests run in true parallel. Release compile time on cold cache takes about as long as a large, mature Rails app to boot.
Disk size bloat of the target directory across N different projects is a different story. It’s my second goto for cleanup after “docker prune” when I’m running low.
and it’s basically indistinguishable from a ruby rspec loop.
my experience with ruby is that it’s anything but fast compared to an optimized c++ build pipeline where I can go from code edit to app running in <1 s
How difficult has bacon been for you to get setup and working for your ideal “fast test feedback” workflow? Any configuration hassle or is it “install and done”?
I ask because I’m building a test runner to replace the myriad old ruby tools I’ve used forever (guard, parallel-specs, turbo-tests), and have been researching alternative tools for inspiration or things to avoid :).
Set it and forget it. It’s super easy. I tried things like guard and guard rspec in Ruby but it’s just too flexible and either it misses some modification and doesn’t run or is overly sensitive and cycles due to something silly like a log file being updated.
There are some file-update APIs that have stabilized since then. IIRC Rails has (or used to have) a file-watcher gem in it, so that should help.
Also a tip from bacon: by default it ignores all my slow integration tests and either I manually run them or just let CI do it. Document how to split tests like this with your tool or it will be unusable if it runs all tests all the time.
A “slow” compiler is a fine trade-off for a fast binary and the various Rust benefits. On Linux, the Mold linker could help improve the linking stage.
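Wiring Mold in is a one-file change, assuming Linux with clang on PATH (this is the commonly documented setup, not the only one):

```toml
# .cargo/config.toml -- route linking for this target through mold.
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```

Alternatively `mold -run cargo build` wraps a single invocation without any config changes.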
It depends. On my Ryzen 5 7600 (with a single-tower Noctua cooler that allows all cores to boost from 3.8GHz to 4.9GHz and stay there… both were a compromise between what I could afford and what was attainable at 65W TDP), the rebuild times aren’t terrible… but half the time taken for them is in Mold and the other half is when the build parallelism collapses down to rustc’s frontend pegging just one core to 5.2GHz to compile the crate that gets rebuilt the most.
That wouldn’t have been appreciably helped by spending twice as much for a CPU that packs 12 cores into 65W TDP instead of 6 and 0.1-0.2GHz higher max boost clock when it’s already spending single-digit seconds in Mold.