Making the case that Cargo features could be improved to alleviate Rust compile times

6 points by dryya

jmillikin

Cargo features are fundamentally incompatible with the DAG-style compilation graphs needed for efficient build caching, and are a big part of the reason why Cargo builds are so incredibly slow compared to other build systems (e.g. Bazel) despite using the same rustc. You just can't get fast builds if a dependent library can alter the compilation inputs of its dependencies, transitively.

And when you start investigating why Rust codebases use Cargo features in the first place, they often exist to work around the crates.io limitation of one library per package. Instead of a foo package containing foo and foo_alloc and foo_std libraries, you get a foo package with alloc and std features, which per above mess with the caching.
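
The first point, that a dependent can alter the compilation inputs of its dependencies, can be sketched roughly as follows. This is a minimal illustration, not Cargo's actual resolver: the function names and the dependents `app` and `parser` are hypothetical, and the only real detail assumed is that an enabled feature reaches rustc as `--cfg 'feature="name"'`.

```python
# Hypothetical sketch; none of these names come from Cargo itself.

def unify_features(requests):
    """Union the features that every dependent requests for each dependency.

    `requests` maps a dependent's name to {dependency: [features]}.
    """
    unified = {}
    for deps in requests.values():
        for dep, features in deps.items():
            unified.setdefault(dep, set()).update(features)
    return unified


def cfg_flags(features):
    """Render a feature set as --cfg arguments for rustc."""
    flags = []
    for name in sorted(features):
        flags += ["--cfg", f'feature="{name}"']
    return flags


# app alone asks for foo with no features.
before = unify_features({"app": {"foo": []}})

# Adding parser, which wants foo's "std" feature, changes foo's flags
# even though nothing in foo's source changed.
after = unify_features({
    "app": {"foo": [], "parser": []},
    "parser": {"foo": ["std"]},
})

print(cfg_flags(before["foo"]))  # []
print(cfg_flags(after["foo"]))   # ['--cfg', 'feature="std"']
```

Under these assumptions, the flags for `foo` cannot be computed until every dependent in the build has been examined.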

osa1

Cargo features are fundamentally incompatible with the DAG-style compilation graphs ... You just can't get fast builds if a dependent library can alter the compilation inputs of its dependencies, transitively

I don't understand this. How can features change DAG-ness of the dependencies or affect caching?

When you cache you cache a version of a package with a given set of features. The features don't change on their own, as long as you don't change them you can reuse the cache. How can features make caching more difficult or change the dependency DAG?
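
As a rough sketch of the caching model described here, a cache key could be derived from the crate name, version, and feature set. The `cache_key` function and the `foo` 1.2.3 values are hypothetical and are not how Cargo or any particular build system computes its keys.

```python
import hashlib


def cache_key(name, version, features):
    """Hypothetical cache key: crate name, version, and sorted feature set."""
    canonical = f"{name}-{version}:" + ",".join(sorted(set(features)))
    return hashlib.sha256(canonical.encode()).hexdigest()


same_a = cache_key("foo", "1.2.3", ["alloc", "std"])
same_b = cache_key("foo", "1.2.3", ["std", "alloc"])
other = cache_key("foo", "1.2.3", [])

print(same_a == same_b)  # True: same feature set, same artifact
print(same_a == other)   # False: different features, different artifact
```

In this model, an artifact is reusable whenever the same (name, version, features) triple comes up again.
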
- untitaker
  
  it's elaborated here https://lobste.rs/c/49b29s
  - osa1
    
    I understand the C, C++, and Rust compilation models, and I just read the linked comment, but I still don't understand what's intrinsic to Cargo features that makes efficient compilation (and/or reusing/caching compilation artifacts) more difficult.
    
    There's no difference between
    
    Depending on foo-1.2.3 (in a build system without features)
    
    Depending on foo-1.2.3 with features A, B, C (maybe the default features)
    
    Depending on foo-1.2.3 with features X, Y, Z (non-default features)
    
    In terms of the DAG shape between the dependent and dependency crates, or how a compilation artifact cache system would work.
    
    jmillikin
    
    The edge from a dependency to its dependee is a forward edge, providing compilation outputs.
    
    Cargo features introduce a backward edge from the dependee to the dependency, providing the feature set.
    
    This eliminates the "acyclic" property of a directed acyclic graph.
    
    Thus Cargo can't compute the rustc flags for a library without knowing what its feature set will be.
    
    untitaker
    
    Cargo features introduce a backward edge from the dependee to the dependency, providing the feature set.
    
    I don't think this is a real edge, because it would imply that the set of features required from the dependency is part of the dependee's compilation output. But it's just metadata.
    
    jmillikin
    
    Compiling the object code for the dependency requires computing the rustc command line, including --cfg flags, which can only be computed by examining the features that the dependees are configured with.
    
    An example might help. If you have somebin/Cargo.toml that depends on somelib { features = [] } then cargo build will compile them both. If you edit somebin/Cargo.toml to contain somelib { features = ["std"] } then cargo build will re-compile somelib, even though nothing in the somelib source code changed.
    
    Similarly, if somebin sets somelib { features = [] } and you add a dependency on otherlib that itself depends on somelib { features = ["std"] } then cargo build will re-build somelib.
    
    This is different from a DAG-based build system such as Bazel. The Cargo graph is cyclic (somebin depends on somelib.o which depends on somebin/Cargo.toml).
    
    untitaker
    
    An example might help. If you have somebin/Cargo.toml that depends on somelib { features = [] } then cargo build will compile them both. If you edit somebin/Cargo.toml to contain somelib { features = ["std"] } then cargo build will re-compile somelib, even though nothing in the somelib source code changed.
    
    You are modelling a crate's metadata and its source code as a single node. That's not necessary, as nothing in the metadata depends on the source being compiled, and so there is not necessarily a cycle.
    
    I think I agree that features (and specifically feature unification) are kind of at the root of why builds are slow but I don't think it changes the DAG-ness all too much.
    
    jmillikin
    
    Here's a Graphviz diagram of a simple library + binary build graph: https://tinyurl.com/5n99vbsd (via https://dreampuf.github.io/GraphvizOnline/)
    
    digraph {
      somelib_toml [label="somelib/Cargo.toml"]
      somelib_rs [label="somelib/somelib.rs"]
      somelib_o [label="somelib/somelib.o"]
      somelib_features [shape=box, label="features"]
      somelib_rustc [shape=diamond, label="rustc"]
      somebin_toml [label="somebin/Cargo.toml"]
      somebin_rs [label="somebin/somebin.rs"]
      somebin_o [label="somebin/somebin.o"]
      somebin_rustc [shape=diamond, label="rustc"]
      somebin_features [shape=box, label="features"]
      ld [shape=diamond]
      subgraph cluster_lib {
        label = "somelib"
        somelib_toml -> somelib_features
        somelib_rs -> somelib_rustc -> somelib_o
        somelib_features -> somelib_rustc
      }
      subgraph cluster_bin {
        label = "somebin"
        somebin_toml -> somelib_features [color=red]
        somebin_toml -> somebin_features
        somebin_rs -> somebin_rustc -> somebin_o
        somebin_features -> somebin_rustc
        somelib_o -> ld [color=red]
        somebin_o -> ld
        ld -> somebin
      }
    }
    
    The cycle is highlighted in the two red lines. There's a forward edge from somelib_o -> ld -> somebin, and a backward edge from somebin_toml to somelib_features.
    
    In this graph there is no way to compute the contents of somelib.o without first computing the set of backward edges from consumers of somelib.o. If you remove somebin then the inputs to rustc -o somelib.o change, which would not be possible in a DAG-based build system.
    
    majaha
    
    The edge from a dependency to its dependee...
    
    Did you mean from a depender to a dependee? In usual programming parlance, "dependency" and "dependee" are synonymous. (But it would also be acceptable to refer to the dependant relationship itself as a "dependency", I suppose.)
    
    osa1
    
    Why do you provide the list of available features to the dependee? If the flags are not right you'll get a build error anyway when building the dependency.
    
    How is the list of features of a crate different from the set of crate versions available? Why are these two pieces of information not distributed/communicated to the tools the same way?
    
    jmillikin
    
    Why do you provide the list of available features to the dependee? If the flags are not right you'll get a build error anyway when building the dependency.
    
    You have it backwards. Cargo allows a dependency's features to be configured in the dependee. The flow of feature flags is opposite the flow of compilation outputs.
    
    somelib ---> somelib.o ---> somebin
       ^                           v
       |                           |
       \______ somelib features ___/
    
    How is the list of features of a crate different from the set of crate versions available? Why are these two pieces of information not distributed/communicated to the tools the same way?
    
    I'm not sure what this means. The set of crate versions available is dynamic and requires querying the registry (crates.io), so it's not part of the build graph. Compiling a Rust library requires its --cfg flags (if any) to be known, which means computing the feature set from dependees is part of the build graph.
    
    pointlessone
    
    Cargo features are fundamentally incompatible with the DAG-style compilation graphs needed for efficient build caching
    
    Could you please elaborate on this?
    
I feel like this is not true. For instance, Gentoo Portage (and BSD ports) has a quite complex system of flags and dependencies. It can resolve a build plan, often without taking much time. Once the plan is resolved, caching it doesn't seem hard. What's so incompatible about Cargo features?
    
    jmillikin
    
    Answered in a separate sub-thread, but the back-edge introduced by Cargo features means its build graphs aren't acyclic.
    
    I'm not deeply familiar with BSD's ports and the last time I used Gentoo was ~20 years ago, but IIRC they don't try to cache at the same level of granularity. If two packages have vendored the same version of some dependency then that dep will be compiled twice (absent packager-written patches to use a shared dep package).
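
Whether the back-edge makes the graph cyclic depends on how coarse the nodes are, which is the disagreement in the sub-thread above. One way to check both readings is to run a cycle check on the edges of the Graphviz graph posted there, first file by file and then with each package collapsed into a single node. This is only a sketch: the `package_of` grouping by name prefix is an assumption made here, not part of the original diagram.

```python
from graphlib import CycleError, TopologicalSorter

# Edges copied from the Graphviz graph in the sub-thread (source -> target).
EDGES = [
    ("somelib_toml", "somelib_features"),
    ("somelib_rs", "somelib_rustc"),
    ("somelib_rustc", "somelib_o"),
    ("somelib_features", "somelib_rustc"),
    ("somebin_toml", "somelib_features"),
    ("somebin_toml", "somebin_features"),
    ("somebin_rs", "somebin_rustc"),
    ("somebin_rustc", "somebin_o"),
    ("somebin_features", "somebin_rustc"),
    ("somelib_o", "ld"),
    ("somebin_o", "ld"),
    ("ld", "somebin"),
]


def has_cycle(edges):
    """Return True if the directed graph given by `edges` contains a cycle."""
    sorter = TopologicalSorter()
    for source, target in edges:
        sorter.add(target, source)  # target depends on source
    try:
        sorter.prepare()  # raises CycleError when a cycle exists
    except CycleError:
        return True
    return False


def package_of(node):
    """Assumed grouping: somelib_* nodes form one package, the rest the other."""
    return "somelib" if node.startswith("somelib") else "somebin"


def collapse(edges):
    """Merge each package's nodes into one node, dropping internal edges."""
    merged = {(package_of(s), package_of(t)) for s, t in edges}
    return sorted((s, t) for s, t in merged if s != t)


print(has_cycle(EDGES))            # False
print(collapse(EDGES))             # [('somebin', 'somelib'), ('somelib', 'somebin')]
print(has_cycle(collapse(EDGES)))  # True
```

With separate nodes for each Cargo.toml, feature set, and object file, the check finds no cycle. The cycle appears only when a package's manifest and its compiled output are treated as one node.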
    
    ssokolow
    
    When JavaScript is disabled, the text is dark grey on black. Not a good first impression.