Why doesn’t Rust care more about compiler performance?
59 points by ajdecon
The article touches on this very briefly, but IMO a lot of the blame for Rust’s reputation for slow compilation rests on the shoulders of Cargo and crates.io.
In a typical C/C++ project the build system works by taking each source file (*.c, *.cc) and executing a compiler process to convert that source file into object code. This is inherently scalable to very large numbers of CPU cores – a big C++ project will happily run hundreds of gcc processes at once. The only real scalability limit is RAM, because you can shard the compilation but you can’t shard ld.
In Rust the unit of compilation isn’t source files, it’s libraries (“crates”) – you run one rustc per library, it spits out metadata and object code, and if you change one source file then you have to recompile the entire library. This implies that the natural translation from C++ to Rust would have each Rust project consist of dozens or hundreds of small libraries, and indeed if you structure a Rust project that way then the compilation is as fast as C, or faster.
However, the Rust project’s official build system (Cargo) and official package registry (crates.io) are in practice incompatible with this approach:
- Each Cargo.toml describes a set of source files to compile together; if you want to have your big project split into hundreds of compilation units then you’ll need a hundred Cargo.toml files. Unlike C/C++ build systems there’s generally no wildcarding – you can’t say “generate a virtual Cargo.toml for each *.rs in this directory” – so the practical upper limit is a structure like that of Cranelift.
- Compilation caching (--disk-cache) isn’t of much use: the cache hit rate for Cargo is too low.
- Each Cargo.toml corresponds to a separate crates.io package, which in principle has its own version, independent dependency sets, maintainers, and so on. The burden of maintaining a multi-library package on crates.io is overwhelmingly larger than the C/C++ equivalent.
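For concreteness, the many-small-crates layout is at least expressible with a Cargo workspace – the pain is that every compilation unit needs its own hand-written manifest. A minimal sketch, with hypothetical crate names (both manifests shown in one block for brevity; on disk each lives in its own directory):

```toml
# Top-level Cargo.toml: one workspace, many tiny member crates,
# each of which is a separate compilation unit for rustc.
[workspace]
members = ["crates/lexer", "crates/parser", "crates/ir"]
resolver = "2"

# crates/parser/Cargo.toml: each member still needs its own manifest
# and its own explicit dependency list – this is the per-crate
# busywork that the lack of wildcarding imposes.
[package]
name = "parser"
version = "0.1.0"
edition = "2021"

[dependencies]
lexer = { path = "../lexer" }
```

Nothing generates or maintains these manifests for you, which is why projects rarely go past a few dozen of them.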
I think if the Bazel/Buck/Pants approach was more visible to Rust maintainers, and they got to experience what it’s like to build a truly large project in ~10s and run the full test suite in ~30s, then they would take a step back and look at Cargo and ask themselves “what the hell are we doing here?”.
If it helps, I was in a session at the Rust all-hands that was all about that.
Note that Rust’s maintainership is very diverse, so while many maintainers don’t care, those who do have an appropriate meeting spot.
I found cargo to be effectively dysfunctional far too early in my project (think maybe 10–15 kLoC of Rust today): as I swapped branches, I was constantly recompiling dependencies that weren’t changing.
Moving to Bazel ended those issues, at the cost of a little bit of constant time slowdown that can probably be shrunk. And I can reliably build fastish on remote executors too.
I have a surprising amount of antipathy for the internals of cargo, despite a great fondness for the interface. A replacement that hewed to the interface (cargo.toml) but cached and rebuilt correctly would be materially better.
In lieu of that, I get better at Bazel.
Is there pain using Bazel with rust libs that aren’t in your source tree? For Python and JS I really wanted to “just” point my existing dependency lists to Bazel but it was either “do some ‘busywork’ to get to the same spot (also rework your CI/etc or maintain two dep lists)” or “just vendor stuff in” (But I could also have just been holding the tool wrong)
Bazel takes some work to get going. It’s known to be a … clumsy… tool.
I also have a big belief in vendoring to maintain correct and reliable builds.
That said. The crate universe Bazel tool makes 3p crates just fine- if I manage them with cargo.
If I was trying to hand-craft each crate’s Bazel config… whew. Not at my scale. Maybe at 100x our scale, where we could have a Rust Bazel team.
*nod* I’m not willing to poke at Bazel yet… but I *do* have projects where I use Just to wrap Cargo for build control so I can set environment variables to manually split my Cargo calls across multiple target directories to keep Cargo from clobbering its caches.
(I don’t like the wasted space, but my time is more valuable than a few extra gigabytes consumed on my 2TB NVMe boot/home ZFS pool.)
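The target-directory split doesn’t need anything exotic – it’s just environment-variable plumbing around Cargo. A hypothetical Justfile sketch (recipe and directory names made up):

```make
# Justfile: give each build flavor its own target dir so one flavor's
# rebuild can't clobber another's incremental cache.
# CARGO_TARGET_DIR overrides Cargo's default ./target location.

build:
    CARGO_TARGET_DIR=target/dev cargo build

release:
    CARGO_TARGET_DIR=target/rel cargo build --release

lint:
    CARGO_TARGET_DIR=target/clippy cargo clippy
```

The cost is exactly the disk-space duplication mentioned above: each directory holds its own full set of compiled dependencies.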
In my experience, the developer experience of working in projects split into many libraries is tedious. Many logical changes/features require versioning and publishing (sometimes multiple) new library versions and cascading version numbers through dependencies. Since the library is likely providing an internal-only API, many changes are semver major changes which leads to an awkward publish-library-integrate-API-change workflow that breaks up PRs that would be clearer as atomic changes. You need to manage developer-only/test-only library versioning (since sometimes publishing a library is necessary to test it .. and you obviously don’t want to publish-for-production untested/WIP library changes). Then you need some way to promote or republish or retag your final dev version as a production version and make sure that tag/version is selected by dependencies.
However, I have zero experience with bazel/buck/pants. Is the grass greener over there? What’s that model look like?
Enabling Cargo features causes rebuilding, but that’s a good thing. Build systems that don’t track features give you ABI breakage and need a make clean.
Features are enabled declaratively, so they’re known for the leaf dependencies right from the start of the build. Features are additive, so they get unified and the leaf libraries are shared between all their users.
There’s no problem with features. The real problem is with build inputs that Cargo is unable to track accurately, and that’s mainly system dependencies.
The main issue I run into with features is this: I have two crates in the same workspace, foo and bar. foo transitively depends on baz with a feature enabled. bar transitively depends on it, but doesn’t depend on that feature. Now cargo test winds up recompiling baz every time I switch between foo and bar. Ultimately these are both implementation details of the workspace’s “main” crate, so I’d be fine with saying “enable all features needed by any crate in this workspace”, but there’s no easy way to do that.
It doesn’t rebuild baz every time. It builds it once for each set of features it has, and keeps multiple copies of baz cached. The transitively set feature may still change the behavior of baz and other crates that depend on it, even if you didn’t directly ask for it yourself.
If you tell Cargo to build just one crate, not the whole workspace, it will build the crate with the crate’s own minimal feature set. Without this behavior other workspace crates ended up with features they couldn’t disable, and this kept breaking no-std libraries that didn’t want any std feature enabled anywhere.
Cargo plans to add a setting for it, but in the meantime you can hack around it by making your workspace crates depend on a common crate that enables all these problematic features (tool to hack this).
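That common-crate hack looks roughly like this (crate and feature names hypothetical): one do-nothing crate pins the feature set, and every workspace member depends on it, so the unified features no longer differ between per-crate builds:

```toml
# workspace-features/Cargo.toml: a crate with no code of its own,
# whose only job is to enable the problematic features everywhere.
[package]
name = "workspace-features"
version = "0.1.0"
edition = "2021"

[dependencies]
# Hypothetical: always enable the feature that foo needs, so building
# bar alone no longer produces a second, differently-featured baz.
baz = { version = "1", features = ["simd"] }

# Then in each member crate's Cargo.toml:
# [dependencies]
# workspace-features = { path = "../workspace-features" }
```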
So if I have say 10 .rs files in one project, rustc has to compile them sequentially?
No, they are all read and compiled concurrently, as if you had used nested mod{}s instead of separate files.
No, not sequentially, but “together”. You can have cyclic references between all items in a crate, so it’s not possible (in general) to topologically sort them and then compile them separately and in parallel like many other languages can do.
In Rust, macro expansion is serial, and name resolution needs a complete view of the crate (due to wildcard imports), but these are relatively minor. The “other languages” have an even tougher time optimizing a preprocessor with textual inclusion.
The slow part of compilation is split into parallel codegen units. Items that would prevent splitting are duplicated across units if necessary.
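Those codegen units are tunable per profile in Cargo.toml; the trade-off is build parallelism against cross-unit optimization. The values below are the documented defaults:

```toml
# Cargo.toml profile settings controlling codegen-unit splitting.
[profile.dev]
codegen-units = 256   # default for dev: maximize parallel codegen, fast builds

[profile.release]
codegen-units = 1     # one unit: slow serial codegen, best cross-function optimization
                      # (the release default is 16, a middle ground)
```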
AFAIK, Rust uses a query-based compilation model internally, so when compiling a crate, even with cyclic references, much of the crate can be compiled in parallel. Certain phases might have query dependencies that impede parallelization at that stage of compilation, but the point is that the compiler will attempt to maximize the amount of work done in parallel when it is possible to do so. It also allows you as the crate author to avoid patterns that introduce a lot of cyclic references, and thereby benefit from more parallelization – at least in theory. I’m not sure what the delta between theory and practice is at the moment, but it’s not quite so bad as you describe.
Obviously it isn’t fully parallel, but I’m not sure that’s really the case in most other languages either (even if they do better than Rust in this regard); there tends to be at least some amount of information/metadata you need about dependencies on other modules/libraries/whatever.
You can have cyclic references between C files and still compile them separately, and my relatively uninformed feeling is that one could in principle do this for Rust, with some difficulty.
Cyclic references within a crate work. Crate-level (inter-dependency) cycles do not.
And cyclical references in C only work if you forward-declare separate prototypes. Rust doesn’t have prototypes (in the literal sense) and doesn’t care in which order items appear.
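That order-independence is easy to see in a tiny sketch: two mutually recursive functions in separate modules compile with no forward declarations anywhere, which is exactly what forces rustc to see the whole crate at once.

```rust
// No prototypes anywhere: each function calls the other before
// (textually) the other has been defined. rustc resolves names
// across the entire crate, so definition order is irrelevant.
mod a {
    pub fn is_even(n: u32) -> bool {
        if n == 0 { true } else { super::b::is_odd(n - 1) }
    }
}

mod b {
    pub fn is_odd(n: u32) -> bool {
        if n == 0 { false } else { super::a::is_even(n - 1) }
    }
}

fn main() {
    assert!(a::is_even(10));
    assert!(b::is_odd(7));
    println!("ok");
}
```

The C equivalent would need a header with prototypes for both functions before either definition.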
While I’d love for Rust build times to be faster, I don’t find it to be that much of a practical problem!
Folks frequently point to C++ as being faster to build, but that’s not been my experience. I’ve worked on several big C++ projects over the years (millions to tens of millions of lines of code), and build times there have been worse than any Rust project I’ve worked on. I’m talking multi-hour parallel release builds.
The Rust project I work most frequently on is Nosey Parker, a detector of hard-coded secrets. It is smaller than the big C++ projects I’ve worked on, but pulls in around 500 crates and a from-scratch release build takes just a couple minutes on a recent laptop. And most of that time is spent building vectorscan-rs-sys, which is a Rust-wrapped C++ codebase.
Again, I’d love for faster build times, but people saying C++ builds faster just doesn’t match my experience.
I’m talking multi-hour parallel release builds
Computers were a lot slower when I worked on a C++ project that took that long to build. These days a complete clean build of LLVM takes around ten minutes on my laptop.
I suspect the problem is incremental compilation and mismatched expectations for what triggers a recompile. If I modify a .cpp file in LLVM, a rebuild is normally under ten seconds. If I modify a .h file, the incremental build will rebuild between six and three thousand files, but I have a fairly good intuition from the location of the header where that will be.
I would expect that someone moving between Rust and C++ (in either direction) would lack that intuition. A Rust person would probably be shocked at a change to a comment requiring half of the project being recompiled. A C++ person would be surprised at some of the analysis that triggers recompiles in Rust. Once you have worked in either language for a while, I’d expect you’d build good intuitions but until then you probably complain a lot that a thing that you thought was a small change became a huge one.
I’ve never worked on a large Rust project, but IME, C++ build time depends a huge amount on the project structure, the build tools used, and how well the people setting up the build knew those build tools. I’ve seen small projects take a ridiculous amount of time, and full builds so fast I couldn’t believe they were full builds.
I suspect it’s different with Rust where everybody uses the same tool, but it’s hard to generalize about C++.
Again, I’d love for faster build times, but people saying C++ builds faster just doesn’t match my experience.
I agree, maybe for the folks who are working at google or meta with their build infrastructure and tooling but for those of us working with janky ass makefiles… just getting stuff to build is an accomplishment, much less having it build quickly
It’s funny, just two days ago in an unrelated thread I posted a comment with the rule of thumb: “to get 2x performance you optimize, but to get 10x performance you make architectural changes”. And here is a post about optimizing the Rust compiler with a bunch of local optimizations that shows it getting… just about 2x faster over the last years.
As the post mentions, architectural improvements in a mature project are hard, and in a decentralized codebase might even be so hard as to be impossible. It is disappointing though. I’d much rather be waiting for the 10x than to watch people spend their effort scraping out another few percent via like twiddling compiler flags or whatever, even though I know in my heart the two types of effort aren’t fungible.
(In fairness, I guess there are some improvements underway more like what I’m describing, marked in the post as “northstar”.)
rustc has been refactored significantly over its lifetime. It had a rewrite of code analysis and generation from AST to IR. Replaced the borrow checker with a completely different architecture (with plans to do it again). Added a const-eval interpreter. Moved towards being query-based, and added parallel query execution. Added its own optimizer, and two other codegen backends.
It’s pretty amazing that it started out so long ago from a small project, went through a lot of experimentation, and they didn’t need to throw it out and start again. The codebase is fine.
I don’t know if there’s a 10x possible within the compiler, at least I haven’t seen any proposals.
There are significant gains possible from global build caching or shipping prebuilt libraries.
For debug builds linking can be a significant cost. So far Rust preferred to use the system linker for compatibility with the C ecosystem. There could be a 10x speedup there if rustc could bypass linkers entirely and hotpatch code.
There is also a semi-hard limit on the performance improvement that can be achieved within rustc itself, making 10x performance a pipe dream: a big chunk of the compile time is taken by LLVM. It’s not completely set in stone, since the time LLVM takes depends on what you feed it (for example, Rust used to emit very verbose LLVM IR, which was fine from a performance standpoint since all the fluff was optimized away, but it took an unnecessarily long time). So a much faster Rust compiler would require a new backend too. Cranelift exists; I don’t know if it fits the bill for a 10x compiler.
For incremental compiles this is not quite true: rustc already has a query system that can be (easily ab)used to detect which items were actually changed. Once you have that you could generate only the LLVM IR corresponding to the changed items and nothing more, and then rely on an incremental linker or binary patching to generate the resulting final binary. None of this necessitates architectural changes to rustc or cargo; it instead needs an incremental linker (I hope Wild can get there) or modified codegen that produces binaries that are easier to patch.
I wonder how feasible it would be for a small group of people to make a proof of concept for the architectural changes… one that maybe breaks features or produces worse output code, but shows how big a difference the architectural changes make. Could be a very useful way to gather support for a large refactor, both from the compiler devs and the general community.
… one that maybe breaks features
I think that’s the crux of it. The current architecture is somewhat a consequence of the language design itself (especially with regard to editions, features, and stability/backwards-compatibility in general). That stuff imposes a non-trivial cost IMO, and I suspect it would be hard to do major re-architecting of the compiler that produced significantly better results without making changes to some of those details as well. But that’s just my intuition, maybe there is lower hanging fruit than that, but I would think there is little of that left at this point.
Still, I think it would be an interesting effort to undertake, but I’m not surprised nobody is chomping at the bit to do it. I’d think most of the people interested and capable in doing it, would be inclined to just build a new “better Rust” language and toolchain, but who knows.
I’m using rust as a general purpose tool (think competes with python/ruby). And for my codebases it’s fine? I have bacon to run my tests on save and it’s basically indistinguishable from a ruby rspec loop. Or faster even as tests run in true parallel. Release compile time on cold cache takes about as long as a large, mature Rails app to boot.
Disk size bloat of the target directory across N different projects is a different story. It’s my second goto for cleanup after “docker prune” when I’m running low.
and it’s basically indistinguishable from a ruby rspec loop.
my experience with ruby is that it’s anything but fast compared to an optimized c++ build pipeline where I can go from code edit to app running in <1 s
How difficult has bacon been for you to get setup and working for your ideal “fast test feedback” workflow? Any configuration hassle or is it “install and done”?
I ask because I’m building a test runner to replace the myriad old ruby tools I’ve used forever (guard, parallel-specs, turbo-tests), and have been researching alternative tools for inspiration or things to avoid :).
Set it and forget it. It’s super easy. I tried things like guard and guard rspec in Ruby but it’s just too flexible and either it misses some modification and doesn’t run or is overly sensitive and cycles due to something silly like a log file being updated.
There are some file-update APIs that have stabilized since then. IIRC Rails has (or used to have) a file-watcher gem in it, so that should help.
Also a tip from bacon: by default it ignores all my slow integration tests and either I manually run them or just let CI do it. Document how to split tests like this with your tool or it will be unusable if it runs all tests all the time.
A “slow” compiler is a fine trade-off for a fast binary and the various Rust benefits. On Linux, the Mold linker could help improve the linking stage.
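Wiring Mold in is a one-file change, assuming Linux with clang on PATH (this is the commonly documented setup, not the only one):

```toml
# .cargo/config.toml -- route linking for this target through mold.
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```

Alternatively `mold -run cargo build` wraps a single invocation without any config changes.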
It depends. On my Ryzen 5 7600 (with a single-tower Noctua cooler that allows all cores to boost from 3.8GHz to 4.9GHz and stay there… both were a compromise between what I could afford and what was attainable at 65W TDP), the rebuild times aren’t terrible… but half the time taken for them is in Mold and the other half is when the build parallelism collapses down to rustc’s frontend pegging just one core to 5.2GHz to compile the crate that gets rebuilt the most.
That wouldn’t have been appreciably helped by spending twice as much for a CPU that packs 12 cores into 65W TDP instead of 6 and 0.1-0.2GHz higher max boost clock when it’s already spending single-digit seconds in Mold.