Making the case that Cargo features could be improved to alleviate Rust compile times
6 points by dryya
Cargo features are fundamentally incompatible with the DAG-style compilation graphs needed for efficient build caching, and are a big part of the reason why Cargo builds are so incredibly slow compared to other build systems (e.g. Bazel) despite using the same rustc. You just can't get fast builds if a dependent library can alter the compilation inputs of its dependencies, transitively.
And when you start investigating why Rust codebases use Cargo features in the first place, they often exist to work around the crates.io limitation of one library per package. Instead of a foo package containing foo and foo_alloc and foo_std libraries, you get a foo package with alloc and std features, which per above mess with the caching.
Cargo features are fundamentally incompatible with the DAG-style compilation graphs ... You just can't get fast builds if a dependent library can alter the compilation inputs of its dependencies, transitively
I don't understand this. How can features change DAG-ness of the dependencies or affect caching?
When you cache you cache a version of a package with a given set of features. The features don't change on their own, as long as you don't change them you can reuse the cache. How can features make caching more difficult or change the dependency DAG?
it's elaborated here https://lobste.rs/c/49b29s
I understand the C, C++, and Rust compilation models, and I just read the linked comment, but I still don't understand what's intrinsic to Cargo features that makes efficient compilation (and/or reusing/caching compilation artifacts) more difficult.
There's no difference between
In terms of the DAG shape between the dependent and dependency crates, or how a compilation artifact cache system would work.
The edge from a dependency to its dependee is a forward edge, providing compilation outputs.
Cargo features introduce a backward edge from the dependee to the dependency, providing the feature set.
This eliminates the "acyclic" property of a directed acyclic graph.
Thus Cargo can't compute the rustc flags for a library without knowing what its feature set will be.
Cargo features introduce a backward edge from the dependee to the dependency, providing the feature set.
I don't think this is a real edge, because it would imply that the set of features required from the dependency is part of the dependee's compilation output. But it's just metadata.
Compiling the object code for the dependency requires computing the rustc command line, including --cfg flags, which can only be computed by examining the features that the dependees are configured with.
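To make the "computing the rustc command line" step concrete, here is a minimal sketch of turning a resolved feature set into `--cfg` arguments. The `--cfg 'feature="..."'` form is what Cargo passes to rustc for each enabled feature; the function name `cfg_args` is a hypothetical helper, not Cargo's internal API.

```rust
use std::collections::BTreeSet;

/// Render the `--cfg` arguments for one crate from its resolved feature set.
/// `cfg_args` is a hypothetical helper for illustration only.
fn cfg_args(features: &BTreeSet<&str>) -> Vec<String> {
    let mut args = Vec::new();
    // BTreeSet iterates in sorted order, so the flag order is deterministic.
    for feature in features {
        args.push("--cfg".to_string());
        args.push(format!("feature=\"{}\"", feature));
    }
    args
}

fn main() {
    let none: BTreeSet<&str> = BTreeSet::new();
    let with_std: BTreeSet<&str> = ["std"].iter().copied().collect();
    println!("no features: {:?}", cfg_args(&none));
    println!("std enabled: {:?}", cfg_args(&with_std));
}
```

The point of the sketch is only that the argument list is a pure function of the feature set, so the feature set has to be known before rustc can be invoked for that crate.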
An example might help. If you have somebin/Cargo.toml that depends on somelib { features = [] } then cargo build will compile them both. If you edit somebin/Cargo.toml to contain somelib { features = ["std"] } then cargo build will re-compile somelib, even though nothing in the somelib source code changed.
Similarly, if somebin sets somelib { features = [] } and you add a dependency on otherlib that itself depends on somelib { features = ["std"] } then cargo build will re-build somelib.
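The two scenarios above can be sketched as a toy model of feature unification: the feature set a library is built with is the union of what every dependent requests. The `Request` struct and `unify` function are assumptions made up for this illustration; Cargo's real resolver is considerably more involved.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One dependency edge: `dependent` requests `features` of `dependency`.
/// Hypothetical type for illustration, not part of Cargo.
struct Request {
    dependent: &'static str,
    dependency: &'static str,
    features: &'static [&'static str],
}

/// Union the features requested for each dependency across all dependents.
fn unify(requests: &[Request]) -> BTreeMap<&'static str, BTreeSet<&'static str>> {
    let mut unified: BTreeMap<&'static str, BTreeSet<&'static str>> = BTreeMap::new();
    for req in requests {
        unified
            .entry(req.dependency)
            .or_default()
            .extend(req.features.iter().copied());
    }
    unified
}

fn main() {
    // somebin depends on somelib with no features.
    let mut requests = vec![Request { dependent: "somebin", dependency: "somelib", features: &[] }];
    let before = unify(&requests);

    // Adding otherlib, which wants somelib's "std" feature, changes somelib's inputs.
    requests.push(Request { dependent: "otherlib", dependency: "somelib", features: &["std"] });
    let after = unify(&requests);

    for r in &requests {
        println!("{} requests {} with {:?}", r.dependent, r.dependency, r.features);
    }
    println!("somelib features before: {:?}", before["somelib"]);
    println!("somelib features after:  {:?}", after["somelib"]);
    println!("somelib needs a rebuild: {}", before["somelib"] != after["somelib"]);
}
```

In this model somelib's source never changes, yet its build inputs do, because they are derived from the set of crates that depend on it.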
This is different from a DAG-based build system such as Bazel. The Cargo graph is cyclic (somebin depends on somelib.o which depends on somebin/Cargo.toml).
An example might help. If you have somebin/Cargo.toml that depends on somelib { features = [] } then cargo build will compile them both. If you edit somebin/Cargo.toml to contain somelib { features = ["std"] } then cargo build will re-compile somelib, even though nothing in the somelib source code changed.
You are modelling the metadata of a crate and its own sourcecode to be compiled as a single node. That's not necessary as nothing in the metadata depends on the source being compiled, and so there is not necessarily a cycle.
I think I agree that features (and specifically feature unification) are kind of at the root of why builds are slow but I don't think it changes the DAG-ness all too much.
Here's a Graphviz diagram of a simple library + binary build graph: https://tinyurl.com/5n99vbsd (via https://dreampuf.github.io/GraphvizOnline/)
digraph {
somelib_toml [label="somelib/Cargo.toml"]
somelib_rs [label="somelib/somelib.rs"]
somelib_o [label="somelib/somelib.o"]
somelib_features [shape=box, label="features"]
somelib_rustc [shape=diamond, label="rustc"]
somebin_toml [label="somebin/Cargo.toml"]
somebin_rs [label="somebin/somebin.rs"]
somebin_o [label="somebin/somebin.o"]
somebin_rustc [shape=diamond, label="rustc"]
somebin_features [shape=box, label="features"]
ld [shape=diamond]
subgraph cluster_lib {
label = "somelib"
somelib_toml -> somelib_features
somelib_rs -> somelib_rustc -> somelib_o
somelib_features -> somelib_rustc
}
subgraph cluster_bin {
label = "somebin"
somebin_toml -> somelib_features [color=red]
somebin_toml -> somebin_features
somebin_rs -> somebin_rustc -> somebin_o
somebin_features -> somebin_rustc
somelib_o -> ld [color=red]
somebin_o -> ld
ld -> somebin
}
}
The cycle is highlighted in the two red lines. There's a forward edge from somelib_o -> ld -> somebin, and a backward edge from somebin_toml to somelib_features.
In this graph there is no way to compute the contents of somelib.o without first computing the set of backward edges from consumers of somelib.o. If you remove somebin then the inputs to rustc -o somelib.o change, which would not be possible in a DAG-based build system.
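One way to check the diagram mechanically is to run a cycle test over the same edge list at two granularities. This is only a sketch under stated assumptions: the edges are copied from the Graphviz source above, `package_of` assigns each node to its cluster (a hypothetical helper), and the cycle test is Kahn's topological-sort algorithm. It seems to show where the two positions in this thread diverge: with one node per file or step the edge list has no cycle, while with one node per package it does.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Edges copied from the Graphviz diagram above.
const EDGES: &[(&str, &str)] = &[
    ("somelib_toml", "somelib_features"),
    ("somelib_rs", "somelib_rustc"),
    ("somelib_rustc", "somelib_o"),
    ("somelib_features", "somelib_rustc"),
    ("somebin_toml", "somelib_features"),
    ("somebin_toml", "somebin_features"),
    ("somebin_rs", "somebin_rustc"),
    ("somebin_rustc", "somebin_o"),
    ("somebin_features", "somebin_rustc"),
    ("somelib_o", "ld"),
    ("somebin_o", "ld"),
    ("ld", "somebin"),
];

/// Kahn's algorithm: a graph has a cycle iff a topological sort cannot visit every node.
fn has_cycle(edges: &[(&str, &str)]) -> bool {
    let mut indegree: BTreeMap<&str, usize> = BTreeMap::new();
    let mut successors: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(from, to) in edges {
        indegree.entry(from).or_insert(0);
        *indegree.entry(to).or_insert(0) += 1;
        successors.entry(from).or_default().push(to);
    }
    let mut ready: Vec<&str> = Vec::new();
    for (node, degree) in &indegree {
        if *degree == 0 {
            ready.push(*node);
        }
    }
    let mut visited = 0;
    while let Some(node) = ready.pop() {
        visited += 1;
        if let Some(nexts) = successors.get(node) {
            for &next in nexts {
                let degree = indegree.get_mut(next).unwrap();
                *degree -= 1;
                if *degree == 0 {
                    ready.push(next);
                }
            }
        }
    }
    visited != indegree.len()
}

/// Map a node to its cluster in the diagram (hypothetical helper).
fn package_of(node: &str) -> &'static str {
    if node.starts_with("somelib") { "somelib" } else { "somebin" }
}

/// Collapse every node into its package, keeping only cross-package edges.
fn collapse(edges: &[(&str, &str)]) -> Vec<(&'static str, &'static str)> {
    let mut seen = BTreeSet::new();
    for &(from, to) in edges {
        let (a, b) = (package_of(from), package_of(to));
        if a != b {
            seen.insert((a, b));
        }
    }
    seen.into_iter().collect()
}

fn main() {
    println!("cycle with one node per file/step: {}", has_cycle(EDGES));
    println!("cycle with one node per package:   {}", has_cycle(&collapse(EDGES)));
}
```

Whether the per-package or the per-file view is the right one for a build system is exactly the disagreement here; the sketch takes no side on that, it only shows that the answer depends on the node granularity chosen.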
The edge from a dependency to its dependee...
Did you mean from a depender to a dependee? In usual programming parlance, "dependency" and "dependee" are synonymous. (But it would also be acceptable to refer to the dependant relationship itself as a "dependency", I suppose.)
Why do you provide the list of available features to the dependee? If the flags are not right you'll get a build error anyway when building the dependency.
How is the list of features of a crate different from the set of crate versions available? Why are these two pieces of information not distributed/communicated to the tools the same way?
Why do you provide the list of available features to the dependee? If the flags are not right you'll get a build error anyway when building the dependency.
You have it backwards. Cargo allows a dependency's features to be configured in the dependee. The flow of feature flags is opposite the flow of compilation outputs.
somelib ---> somelib.o ---> somebin
   ^                           v
   |                           |
   \______ somelib features ___/
How is the list of features of a crate different from the set of crate versions available? Why are these two pieces of information not distributed/communicated to the tools the same way?
I'm not sure what this means. The set of crate versions available is dynamic and requires querying the registry (crates.io), so it's not part of the build graph. Compiling a Rust library requires its --cfg flags (if any) to be known, which means computing the feature set from dependees is part of the build graph.
Cargo features are fundamentally incompatible with the DAG-style compilation graphs needed for efficient build caching
Could you please elaborate on this?
I feel like this is not true. For instance, Gentoo Portage (and BSD ports) have a fairly complex system of flags and dependencies. It can resolve a build plan, and often that doesn't take much time. Once the plan is resolved, it doesn't seem hard to cache it. What's so incompatible about Cargo features?
Answered in a separate sub-thread, but the back-edge introduced by Cargo features means its build graphs aren't acyclic.
I'm not deeply familiar with BSD's ports and the last time I used Gentoo was ~20 years ago, but IIRC they don't try to cache at the same level of granularity. If two packages have vendored the same version of some dependency then that dep will be compiled twice (absent packager-written patches to use a shared dep package).