Exponential growth continued — cargo-semver-checks 2025 Year in Review
8 points by predrag
In addition to recapping 2025, the post proposes a set of goals I believe cargo-semver-checks should strive for in the coming years. TL;DR there: "Total number of lints" is a fun number to look at, but increasingly just a vanity metric. I believe the community will be best served by us tackling some of our long-standing (and most difficult) challenges, instead of trying to chase exponential lint growth by itself.
Please see the post where I make a complete case for this, then let me know what you think!
> Testing N lints on N test crate pairs results in 3 * N * N total lint executions. 2 * N lints means 2 * N test crate pairs, quadrupling the total lint executions in the test suite to 3 * (2 * N) * (2 * N).
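In concrete terms, doubling N quadruples the work: at N = 100 that's 3 * 100 * 100 = 30,000 lint executions, while at N = 200 it's 3 * 200 * 200 = 120,000.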
What's the most expensive operation when running a lint? Naively, I'd have assumed that it would be extracting the rustdoc JSON and building any indexes over it (I'm assuming the trustfall query engine does build indexes). In that case, up-front work would scale as O(N A log(N A)) where A is the number of APIs per crate and N is the number of crates, and the total lint-running work would scale as O(L N A log(N A)) where L is the number of lints if every lint potentially applies to every API. I'd assume that most lints don't apply to most APIs...
Obviously I must be missing one or more key things, so curious as to where this naive math goes wrong.
Thanks for the question!
Building rustdoc JSON is definitely extremely expensive. This is why we build the JSON outside the test loop (via a script that the user must invoke when test crates change) and then cache it long-term, so it essentially vanishes from the equation when considering iteration cost.
We build indexes when loading the rustdoc JSON. The indexes are hash tables, not trees, and are per-crate, so building them takes O(A) time per crate and O(N * A) total, without any log factors.
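For illustration, here's roughly the shape of such a per-crate index. This is a simplified sketch, not the actual trustfall-rustdoc-adapter code; the types and field names are made up:

    use std::collections::HashMap;

    // Simplified stand-ins for rustdoc JSON items; the real rustdoc
    // JSON format has many more fields than this.
    struct Item {
        id: String,
        name: Option<String>,
    }

    // Per-crate index backed by plain hash tables: building it is O(A)
    // for a crate with A items, and lookups are O(1) expected, so there
    // are no log factors anywhere.
    struct CrateIndex {
        by_id: HashMap<String, Item>,
        ids_by_name: HashMap<String, Vec<String>>,
    }

    fn build_index(items: Vec<Item>) -> CrateIndex {
        let mut by_id = HashMap::with_capacity(items.len());
        let mut ids_by_name: HashMap<String, Vec<String>> = HashMap::new();
        for item in items {
            if let Some(name) = &item.name {
                ids_by_name.entry(name.clone()).or_default().push(item.id.clone());
            }
            by_id.insert(item.id.clone(), item);
        }
        CrateIndex { by_id, ids_by_name }
    }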
> L is the number of lints if every lint potentially applies to every API. I'd assume that most lints don't apply to most APIs.
The key observation is that we try to make the tests maximally good as tests first and as fast as possible second, not vice versa. We have a rough time budget for per-test iteration, and as long as we're below that time budget, we'd prefer higher test quality over faster tests.
In particular, we explicitly try not to be too clever about pruning believed-to-be-unnecessary tests. Instead, even in the long run we plan to run all lints on all crates for as long as that remains practical.
This has actually saved our bacon on multiple occasions. You see, we have a rule that each bit of breakage should only be reported by one lint, which is much harder to do in practice than it sounds at first. For example: if a pub struct with pub fields gets deleted from the public API, you should get a lint for the struct but not also a lint for each pub field of that struct, even though those fields are obviously gone too. But if we said "no point in running the field-related lints on the test crate that covers struct deletion," we'd never notice the buggy duplicate lint. We've actually had bugs like this! We prefer the O(N^2) scaling that catches bugs over O(N) scaling that lets users catch them for us :)
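To make the shape of that test loop concrete, here's a heavily simplified sketch; the names and types are invented for illustration, and the real harness, lint registry, and snapshot assertions in cargo-semver-checks look quite different:

    struct CratePair; // stand-in for a baseline/current test crate pair
    struct Lint;      // stand-in for a single lint

    impl Lint {
        // Placeholder: the real lints are Trustfall queries over rustdoc JSON.
        fn run(&self, _pair: &CratePair) -> Vec<String> {
            Vec::new()
        }
    }

    // Placeholder for the checked-in expected output per (lint, crate pair).
    fn expected_output(_lint: &Lint, _pair: &CratePair) -> Vec<String> {
        Vec::new()
    }

    fn run_full_matrix(lints: &[Lint], crate_pairs: &[CratePair]) {
        for pair in crate_pairs {
            for lint in lints {
                // Deliberately no "this lint can't apply to this crate" pruning:
                // if two lints ever report the same piece of breakage, the
                // unexpected extra result surfaces right here in the tests.
                assert_eq!(lint.run(pair), expected_output(lint, pair));
            }
        }
    }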
So for our purposes L = N, and updating O(L N A log(N A)) to eliminate the log factor (as mentioned above), we get O(N^2 A). That's the same as the blog post claims, except the "API size" A is mentioned explicitly instead of being omitted as essentially constant (which is reasonable, because almost all test crates are smallish).
The optimizations mentioned in the blog post were Pareto improvements: they let us continue to run the entire test suite, but run it more efficiently. That's ideal! While that O(N^2) scaling won't work forever, we want to keep moving to Pareto-superior points for as long as possible before we ever consider moving along the Pareto frontier toward faster tests at the expense of test thoroughness.