Rust Dependencies Scare Me
69 points by vaguelytagged
The solution to having too many dependencies is to get more comfortable writing your own code: https://lucumr.pocoo.org/2025/1/24/build-it-yourself/. You’ve already stumbled upon it with the dotenv crate; you’ll just have to go through the same decision tree before every cargo add in the future.
While I agree that Rust can easily lend itself to large dependency trees, I don’t find these rants about Rust dependencies particularly helpful. The problem of “I can’t possibly review all this third-party code” applies to all languages, more so to ones like Java, where you generally consume compiled artifacts rather than source code, or to Golang, where Google is effectively MITM’ing all dependency requests by default and the source you pull can differ from what you saw in the Git repo. I also rarely see mention of cargo-vet or cargo-crev, which are trying to work on this very difficult problem using distributed trust.
The problem of “I can’t possibly review all this third-party code” applies to all languages
If you are in the C# ecosystem, many libraries are provided by Microsoft. These libraries (hopefully) adhere to a certain standard of quality and security. So no, the problem is not the same in every language.
And in Rust many libraries are provided by the rust-lang organisation, how does that change anything about every other library that doesn’t come from the stewards of the language?
In theory this is true, but in practice I think the .NET framework is much larger than the rust standard lib + everything owned by rust-lang. Just as an example, the .NET framework has an equivalent for: serde_json, regex, syn, base64, chrono, uuid, tokio, reqwest, rayon, and axum.
I’d like to see the Rust equivalent of golang.org/x/
Nice page, but it literally says “unofficial”.
golang.org/x/ is ALSO unofficial – it is not operated or endorsed by the United States government or any other duly elected sovereign government.
I expect you will protest that governments are not the only organizations that can effectively run things, that having a corporation like Google perform the vetting rather than a national government does not mean that the vetting is meaningless. And you would be correct – in exactly the same way that blessed.rs is run by a different organization than the one that produces the Rust compiler but still provides a valuable service.
golang.org/x/ is ALSO unofficial – it is not operated or endorsed by the United States government or any other duly elected sovereign government.
Weird shift in the goalposts there.
The point here is that the vetting is provided by an organisation you’re already trusting. If you’re using Rust, official vetting would be done by the Rust team.
The whole discussion is about limiting the number of entities you have to trust beyond the ones you already have to trust just by using Rust (edit: couldn’t help myself and had to summarise this; must trust just Rust).
So, as the other commenter said, it’s a nice page, since at least it consolidates third party blessings to one organisation, which you can hopefully trust.
But it’s not equivalent to golang.org/x/, since it’s not by the same team that already made the language.
In cool terms, it improves the algorithm for trust of third party packages that are on the list from trusting O(group(n)) entities to O(1), but not zero-cost.
But it’s not equivalent to golang.org/x/, since it’s not by the same team that already made the language.
That statement assumes that “the same team that already made the language” still works on Rust, and that they sit above all other teams in the org chart.
Even Rust itself is split into different teams for the compiler, standard library, etc.
Isn’t there an authority that appoints people to those teams? You’re not individually trusting the compiler team and the stdlib team and the infra team, or however it’s split up, are you? You trust them all implicitly because you trust the umbrella entity.
My point is that your statement assumes that “the team that already made the language” is the “authority that appoints people to those teams”.
Change that to, “the entity that currently works on the language”, then. It doesn’t really change much.
In the same way you trust the compiler team appointed by the leadership through transitivity, you could choose to trust the creating entity to appoint their successors in various roles.
Or you could choose to reevaluate whether you should trust the umbrella entity responsible for Rust every time a complete transfer occurs.
But that’s a separate discussion. If you’ve decided not to trust the entity currently responsible for the language and stdlib and infra and so on, which the creators appointed as successors, then there’s no reason to even get to the question of whether you can trust the important packages of the ecosystem.
However, if you’ve decided you can trust them in all of that, you can probably also trust their blessing of vetted packages (or even extend stronger trust if they actually work on the codebases of those important packages themselves).
The people already on the teams appoint new people to the respective teams.
Yes, I talked about the risk of transitivity of trust in the appointment of new people. I don’t see how your comment changes anything.
to a certain standard of quality and security
Microsoft? The org that leaks their own root keys on the regular?
Interesting. I’m assuming that’s because Microsoft dogfoods a lot of their own stuff. I wonder if Google will do something similar given their recent aggressive adoption.
I guess it is just a little more apparent in Rust than even C/C++, since they leverage more system libraries… For things like Axum/Tokio I don’t see myself ever rewriting that, but then again that’s the point and the tradeoff I make. It just seems like a LOT of lines, but maybe I should investigate other languages and really try to see how many lines it takes there (in the Go std library itself). Just started using cargo vet, but it brought up more unchecked dependencies than I expected, and I’m using pretty popular ones, ones I know for sure Discord, Cloudflare, and even AWS used / are using in production… Maybe I’ll check out crev.
since they leverage more system libraries
I get what you mean. But last I checked, you don’t use a system lib for the missing batteries in C (HashMap & co). Instead you simply copy code - adding a dependency. Same with C++ and libs like re2 and ctre to get fast regex.
Copy-pasting has the issue that you can never get updates or see possible security advisories. I think a fairer future comparison would be to try to build a 1:1 server with best practices in C++ and C and count the lines, functions, crates, packages, headers, etc., even including things like libc.
This doesn’t match my experience.
Admittedly, it depends a LOT on the project and the developer’s opinions on project organization, but most of the C++ and C projects I’ve worked on do use the system libraries. In the open source world it’s the default way of doing it.
The exceptions are libraries we have to modify, and then we make it a submodule, or set up our own apt source and publish our version there for internal use.
But it’s nonsense to fetch and build your own version of every dependency. C++’s package management is primitive, but not that primitive.
Rust doesn’t involve relying on more code-you-didn’t-personally-write than other languages. It just makes the code-you-didn’t-personally-write more visible.
And counting only the lines of code in the Linux kernel isn’t a great comparison, because you rely on more than just the kernel – you rely on an entire running system, you rely on the Rust compiler, you rely on all the runtime and build-time dependencies of the full system and Rust compiler, etc. etc.
If you wrote your project in a language which makes the dependencies more opaque, it probably would not meaningfully change the quantity of code you’re relying on without personally reviewing line-by-line. It would only change whether you know you’re relying on all that code.
Or, more bluntly: no matter what, you are going to rely on at the very least millions of lines of code you have not personally reviewed, and there is no strategy you can adopt which will cause that number to shrink by any significant fraction. All you can do is adopt things which give you better or worse views into what you’re depending on.
This was definitely something I was thinking about after writing. I guess it’s just a little shocking that there’s so much that I rely on for something that I expected to be sorta trivial.
https://wiki.alopex.li/LetsBeRealAboutDependencies goes into more detail on that if you’re interested.
As for being safer, I have five suggestions:
Use https://lib.rs/ instead of crates.io for web-based repo browsing, because it’s focused on making it easier to do first-pass evaluation of potential dependencies. (https://lib.rs/about)
Use cargo-supply-chain to evaluate packages based on how many people you’re trusting rather than how many pieces they cleaved your dependencies into.
Keep an eye on tools like cargo-crev and cargo-vet. (lib.rs incorporates reviews from them via an Audit tab when available.)
Build your code in some kind of sandbox so you’re not the low-hanging fruit for build-time attacks (I really need to get back to working on my Firejail wrapper for that, which I got cold feet on because it would make attacking me to get at others more appealing if I publish it. I haven’t tried or audited any of them, but others have been working on similar concepts like cargo-green.)
Put as much of your code as possible into WebAssembly modules so runtime attacks are constrained by capability-based APIs and you can approach the Bytecode Alliance’s nanoprocess isolation concept.
(I’ve been working https://benw.is/posts/plugins-with-rust-and-wasi into one of my projects for the runtime-installable plugins side of things, but it does also feel good to be able to know that things like svgbob, with not-insignificant trees of transitive dependencies, can only operate on explicitly passed function arguments and act by returning data which will then be fed through Ammonia… things which will require Emscripten at best, like Graphviz or Mscgen, are much further down the TODO list… though I just discovered layout-rs, so dot diagrams may be joining svgbob soon.)
(It’s called render-wishlist… originally because it rendered Markdown into the Christmas wishlist my brother asked us all to make but, now, because it’s moving toward becoming an implementation of everything on my wishlist for a Markdown renderer and static site generator… including a rich grammar of shortcodes for depictions of game controller buttons and keyboard keys with symbols like ⌘, ⌥, ⎋, and ↹.)
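To make the WebAssembly-sandboxing suggestion above a bit more concrete, here is a minimal host-side sketch. It assumes the wasmtime and anyhow crates, and the plugin file name and exported add function are made up; the point is that a module instantiated with no imports gets no filesystem, network, or environment access, only the arguments the host passes in.

```rust
use wasmtime::{Engine, Instance, Module, Store};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    // "plugin.wasm" is a placeholder for whatever guest module you build.
    let module = Module::from_file(&engine, "plugin.wasm")?;
    let mut store = Store::new(&engine, ());

    // No imports are supplied, so the guest (and all of its transitive
    // dependencies) cannot reach the filesystem, network, clock, or
    // environment; it can only transform what the host hands it.
    let instance = Instance::new(&mut store, &module, &[])?;

    let add = instance.get_typed_func::<(i32, i32), i32>(&mut store, "add")?;
    let sum = add.call(&mut store, (2, 3))?;
    println!("plugin returned {sum}");
    Ok(())
}
```

Real plugin interfaces end up passing memory and richer types (WASI, the component model), but the capability story stays the same: the host decides exactly what the guest can see.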
Something to keep in mind. In C you depend on your libc (glibc is definitely large) but also the compiler, make, etc.
On top of it, you probably end up needing a lot of dependencies or end up doing a lot of this stuff yourself, but badly.
See dependencies as a way to get experts to do part of your code. For free. And to get access to the hive mind of contributors.
Seen like that, these dependency trees look liberating. I do not have to be an expert in every domain to get safe, efficient, and fast stuff. I share bug fixing with everyone. If anything, I would trust it more than what I produce.
I maintain a HyperLogLog reference implementation for Erlang and the float-to-string implementation. It taught me a lot about the amount of work and expertise needed to implement this stuff at a good-enough level. Revel in it.
See dependencies as a way to get experts to do part of your code
This is a nice theory. In practice, dependencies are a way to get other people to do part of your code. Sometimes they’re experts, sometimes they’re muppets. It’s often hard to tell which they are in advance. I’ve used libraries that seem to have three users that are clearly written by incredible programmers and others that are the most popular in their domain and are full of the kinds of mistakes I’d fail a first-year undergrad for making in coursework.
I think people also tend to underestimate how much of the complexity of making a solution comes from making it generalizable. Often you can make a specialized version of exactly what you need instead of pulling in a general-purpose library and have not only less code, but faster and simpler code.
In my experience, that varies a lot across languages. If you have help from the type system, either with generics or dynamic dispatch, the specialised and generic versions end up being almost the same. In a language like C, the generic version is much harder to write.
Oh, that’s true, but I was thinking not necessarily “generic” as in parametric polymorphism, but more along the lines of “in this application, we know there will never be more than $N elements, so we can use a ring buffer instead of a resizable tree” or “we only care about this particular feature, so we don’t need a generic tree traversal + “visitor” but can use a simple recursive descent algorithm” - more specific on the higher-level algorithm choice itself, not just the implementation
in this application, we know there will never be more than $N elements, so we can use a ring buffer instead of a resizable tree
Right, but a language can make it easy to write a ring buffer of N items of type T, or it can make it easy to write one that is 32 items of a specific concrete type. If the language favours the former, you can put that in a library and the next time you need a fixed-size ring buffer, you don’t need to reimplement it, you just instantiate it with the desired size and type.
I’m biased in this example because I’ve implemented the lockless ring buffer design that Tim Harris and Keir Fraser created for Xen in a few languages, but I’ve never needed to do it more than once in anything other than C. Their implementation in C is macro hell and was much harder to write than a version specialised for a single type. Xen actually has a nice demonstration of this: the early-boot console uses the same core data structure but specialised for characters and not using the same implementation as other PV devices and it’s much easier to read (or, at least, was in the Xen 3 days, I haven’t looked for a long time) and was probably much easier to write. Yet in C++, the complexity of the two is almost the same and the templated C++ version is about as complex as the specialised C version (both to read and write).
If you’re in a language like Haskell or Lisp, it’s very easy to create the generic version of a data structure and often as easy as creating a specialised version. This makes it easy to build up a big library of off-the-shelf data structures so you never say ‘I need to implement this data structure’, you just say ‘I need this, specialised for X, Y, Z, what is it called?’.
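For concreteness, here is a minimal sketch (my own toy, not the lockless Xen design mentioned above) of the kind of generic fixed-capacity ring buffer that Rust’s const generics let you write once and then instantiate per use site:

```rust
/// A toy fixed-capacity ring buffer: capacity and element type are chosen
/// at the use site, so "32 bytes" is just RingBuffer::<u8, 32>::new().
pub struct RingBuffer<T, const N: usize> {
    items: [Option<T>; N],
    head: usize, // index of the oldest element
    len: usize,
}

impl<T, const N: usize> RingBuffer<T, N> {
    pub fn new() -> Self {
        Self { items: std::array::from_fn(|_| None), head: 0, len: 0 }
    }

    /// Push a value, overwriting the oldest element if the buffer is full.
    pub fn push(&mut self, value: T) {
        let tail = (self.head + self.len) % N;
        self.items[tail] = Some(value);
        if self.len == N {
            self.head = (self.head + 1) % N; // we just overwrote the oldest slot
        } else {
            self.len += 1;
        }
    }

    /// Pop the oldest value, if any.
    pub fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        let value = self.items[self.head].take();
        self.head = (self.head + 1) % N;
        self.len -= 1;
        value
    }
}

fn main() {
    // "Specialising" is just picking the parameters at the call site.
    let mut recent: RingBuffer<char, 4> = RingBuffer::new();
    for c in "hello".chars() {
        recent.push(c);
    }
    assert_eq!(recent.pop(), Some('e')); // 'h' was overwritten by 'o'
}
```

The C version of the same thing either gets duplicated per element type or turns into the macro hell described above; here the generic and the specialised versions are literally the same code.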
Sure. But the thing is, there is nearly no other way to get experts into your code.
So they are always the way to do it. That doesn’t mean they are all experts. But you have to start from the point of view that dependencies are assets. If you start from the point of view that they are problems, then you would nearly never use them and so lose the benefit.
There is nearly no other way to get experts into your code.
Well, except hiring experts and paying them to work on your code. Which is what we do with our first-party code.
I highly doubt you are hiring the expert on everything, from CPU microcode to GUI, through data structures, compilers, linkers, drivers, cryptography, string and font rendering, etc.
This whole fiasco led me to think… do I even need this crate at all? 35 lines later I had the parts of dotenv I needed. Packages become unmaintained in every language, and it was my choice to pull in an arguably trivial dependency.
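For a sense of scale, a rough sketch of what such a hand-rolled replacement might look like (not the author’s actual 35 lines): the happy path really is short, and the caveats live in everything it deliberately ignores.

```rust
use std::collections::HashMap;
use std::fs;

/// Read KEY=VALUE pairs from a .env-style file, skipping blank lines and
/// `#` comments. Deliberately ignored: quoting, escapes, multi-line values,
/// `export` prefixes, variable interpolation -- exactly the edge cases a
/// maintained crate accumulates fixes for.
fn load_dotenv(path: &str) -> std::io::Result<HashMap<String, String>> {
    let mut vars = HashMap::new();
    for line in fs::read_to_string(path)?.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            vars.insert(key.trim().to_owned(), value.trim().to_owned());
        }
    }
    Ok(vars)
}

fn main() -> std::io::Result<()> {
    for (key, value) in load_dotenv(".env")? {
        println!("{key}={value}");
    }
    Ok(())
}
```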
Now you’ve re-invented the worst case (header-only libraries or bespoke implementations) of the reason Linux distro maintainers try so hard to un-vendor dependencies when building distro packages, and of why containers are a problem.
What if a bug is discovered? Now being aware that you might be subject to it, identifying if you are, and fixing it are all completely manual.
Among other things, you’ve broken the ability for cargo audit to surface RUSTSEC advisories and for GitHub Dependabot to offer up PRs to fix things. (I’d be so far behind on this sort of thing without Dependabot.)
(In practical terms, given real-world realities, you decided “I’m scared of this whole package becoming unmaintained, so, to jettison the code I never invoke anyway and hedge against the risk of new vulnerabilities being introduced later, I’m going to immediately force-unmaintain the ‘discover exploitable logic bugs’ side of things for the code I do invoke.”)
Human-value-calculus-wise, you’re essentially enacting what I call the C programmer’s fallacy: that oh-so-human tendency to operate on the principle that “I don’t trust other people’s code, but the code that I wrote/reviewed is flawless”, because it’s so damn easy to fool oneself. (In essence, the whole reason the scientific method is so focused on falsifiability and peer review to minimize the chances of another cold fusion or N-rays or polywater.)
Out of curiosity I ran toeki a tool for counting lines of code, and found a staggering 3.6 million lines of rust. Removing the vendored packages reduces this to 11136 lines of rust.
The link is correct, but you typo’d “tokei” in the visible text.
Also, while I haven’t used cargo vendor myself, my understanding is that, like Cargo.lock, it pins down the version of every package you might need across every conditional-compilation switch and platform-specific dependency.
(eg. Never building something for Windows? Tokei’s still gonna see and count the giant mass of Windows platform API bindings that got vendored so that, if someone on your team does audit things, surprise un-audited dependencies won’t creep into a build later just from adding --target or --features to your build command. Project won’t build for WebAssembly because WASI doesn’t have a required API yet? You’ll still vendor the WebAssembly support stuff for any portions of your tree that do support it. Using tokio but some of your dependencies have feature flags for async-std and smol? I’m honestly not sure what cargo vendor does for feature flags not routed to the top-level package.)
I think you can see how that would cause the number of lines tokei sees to shoot into the stratosphere.
How could I ever audit all of that code?
That’s where projects like cargo-crev and cargo-vet come in to help decentralize the work. For example, if Google has audited an earlier version of one of your big dependencies, and you trust Google to do their job on packages they use, then you can just audit the changelog since that version.
https://lib.rs/crates/tokio/audit
https://lib.rs/crates/tower/audit
https://lib.rs/crates/axum/audit
…
I have no idea… Many call for adding more to the Rust standard library, much like Go; however, this causes its own set of issues. Rust is positioned as a high-performance, safe, and modular language meant to compete with C++ and C. That means it targets things like embedded devices. Every new thing that gets added to the std library is one more thing for the Rust team to have to manage and work on. Just Tokio itself has one of the most active GitHub repos and programming Discords I’ve seen.
…plus, Rust’s current approach was designed based on experience with Python and Java… especially Python, where the rich standard library is treated by the developer community as a graveyard of obsolete and/or flawed designs that you should prefer out-of-stdlib replacements for. (eg. Don’t use urllib (or, before Python 3 merged them, urllib2), use Requests and its internal never-to-be-in-stdlib urllib3.)
What I will say that Rust needs to do better is making it clear which crates are de facto “Part of the stdlib which is packaged externally so it and the toolchain can be versioned independently”.
Reading key value pairs from a text file is not rocket surgery.
Honestly, that strikes me as exactly the attitude that has produced so many of the papercut bugs I’ve seen and some security exploits too.
A few examples:
- argv is just an abstraction fiction over the specific, not-universal quoting semantics that MSVC libc wraps around the underlying “the command-line is an un-parsed string” behaviour of the Win32 process spawning API.
- QMimeData gets wrapped in a bunch of compatibility hacks to work around other people’s non-compliant implementations of DnD/copy-paste of files, and a big test corpus has to be compiled to verify them.
- Argument parsers that accept --long=option instead of --long option, or only implement --long=option, or implement single-dash long args, or assume that the first non-option argument implies --, or don’t implement --, or only accept -? for help, or don’t implement any kind of help, or…
It’s the same attitude that makes me want to chain webtech app devs to a school desk and force them to read the entirety of the HIGs for all the platforms they intend to target before they’re allowed to omit or reinvent things like drag-and-drop or lists which should have multi-select. (Spoiler: The Windows 98/2000/ME HIG is a 594-page dead-tree book or MSDN Library CHM file, both of which I own, and the Windows XP and Vista/7 HIGs are HTML or PDF-format addendums to it… don’t get me started on how Windows 8 and beyond have been decaying.)
…and then force them to write drag-and-drop stuff in the browser a few hundred times until they crack and rip out their implementation based on native HTML5 drag-and-drop (i.e. native OS-level inter-window drag-and-drop) and do as Qt or GTK did and implement their own intra-window DnD, which isn’t prone to cancelling the operation you spent 10 or 20 seconds scrolling for if you forget to wait a second or two for the source and destination widgets to negotiate before releasing the button… on a Zen 4 CPU from 2023 with 64GiB of RAM in Firefox or Chrome.
No, it’s not that simple. You’re just externalizing the costs!
https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/
In the matter of reforming things, as distinct from deforming them, there is one plain and simple principle; a principle which will probably be called a paradox. There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, “I don’t see the use of this; let us clear it away.” To which the more intelligent type of reformer will do well to answer: “If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.”
– G.K. Chesterton, The Thing (1929)
The way to deal with this is shared code reviews. See cargo-vet.
Some people feel safer by vendoring their dependencies, but that’s typically security theater, and at best a wasted, duplicated effort. If you just “LGTM” and merge vendored code, you’re just making an expensive backup (that protects against crates-io downtime, but not against malicious or vulnerable code; a crates-io backup is better done via a caching HTTP proxy). If you’re actually reviewing the code you’re vendoring, that’s great, but we don’t need people re-reviewing the same deps over and over again. No single project can realistically review millions of lines of code, but the community as a whole can.
Personally, vendoring was to avoid having to call out to crates.io every time, and it gives me a peek into my dependencies (more for interest than an actual audit). I’m starting to use cargo-vet now, but I found it doesn’t have audits for many of the crates I use, even when importing Google’s and Mozilla’s lists. Maybe there’s a better tool somewhere? It would be nice if cargo included some sort of health metric.
Personally, vendoring was to avoid having to call out to crates.io every time
What do you mean? You only call out to crates.io when you update a dependency, and vendoring doesn’t save you from that
As someone who works on a codebase that vendors dependencies, I think your comment is a bit too dismissive of vendoring as security theater. I think there is value in having the actual code of the dependencies in your version control so that you can investigate what state things were in by browsing monorepo history instead of having the indirection of having to download dependencies based on just a lock file in version control history.
But, indeed, if you vendor stuff, you shouldn’t just merge stuff without looking at the diff.
For added concreteness regarding cargo-vet: The most practical way of addressing the OP’s concern is using cargo-vet with at least the five imports seen at the top of https://github.com/mozilla-firefox/firefox/blob/main/supply-chain/config.toml . In principle/theory, it may feel deeply unsatisfactory to concede that people who are committers for the 5 orgs get to self-certify what they wrote, but in practice, you aren’t going to audit everything, so focusing your own audit efforts on what’s not already covered by these imports makes the remaining problem tractable.
browsing monorepo history instead of having the indirection of having to download dependencies based on just a lock file in version control history.
This seems like merely a UI/convenience issue, not a security aspect? The lock file has checksums, the checksums are verified, so you’re able to get the same code with the same consistency guarantees in both cases (apart from edge cases like crates.io disappearing completely without anyone having a backup, but I don’t think you have that in mind).
It is merely that, but that’s quite a load-bearing “merely” when git repo browsing tooling exists but integrating the display of crates.io crate content would require developing more tooling features.
I’m already using such tooling (cargo crev open $name), so to me the git solution is completely inferior — worse UI (GitLab web at work, instead of my local editor), worse performance (CI clones make vendoring cost exponential over time), and worse security (it doesn’t prevent code execution in a checked-out copy).
I’ll add one caveat to the line-counting strategy: Rust has more lines that don’t express anything really useful than most languages do. This is mostly because of the syntax, but also the common formatting style.
In an impl Trait with functions that have lots of trait bounds, you’ll likely get 4-5 lines before the body of the first function even starts, simply due to the where clause being indented and the heavy use of generics.
Likewise, because of the granularity of dependencies, most files will start with 10-15 use ... lines. Yet another compounding issue is the zero-abstraction approach to iterators with method chaining, where each method is put on its own line. Furthermore, cfg! and similar macros for OS-specific code add even more lines.
However, the sheer number of lines does make code harder to deal with, so it’s not altogether a bad strategy for getting an overview of complexity. Syntax matters more than people give it credit for.
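As a rough illustration (made-up names, rustfmt-style layout): a function whose logic is essentially one expression still spreads over a dozen-plus lines once the bounds, the indented where clause, and one-method-per-line chaining are counted.

```rust
fn collect_names<I, T>(items: I) -> Vec<String>
where
    I: IntoIterator<Item = T>,
    T: AsRef<str>,
{
    items
        .into_iter()
        .map(|item| item.as_ref().trim().to_owned())
        .filter(|name| !name.is_empty())
        .collect()
}

fn main() {
    let names = collect_names(["  alice ", "", "bob"]);
    println!("{names:?}");
}
```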
You make a good point; Rust is definitely verbose at times, especially when expanding out the macros. However, the sheer numbers make me shiver. I’d count crates instead, but this is also flawed, since packages like Tokio are split into 20-30 packages, all by the Tokio team, just split up. It makes it hard to track transitive dependencies.
I think your conception of tokio is pretty outdated; the code that makes up the tokio crate hasn’t been split up into many crates for years now (since 2019, actually). Look at the repo yourself: https://github.com/tokio-rs/tokio
It used to be split up because of compilation benefits, but that made managing the codebase a lot more painful, and people often complained about needing to add more crates for functionality, etc.
Hm, I’ll need to revisit this, thanks for pointing that out! I wonder if it hurt their compile time at all.
I’m pretty firmly in the “stdlib should be batteries included” camp. I like writing rust a lot more than writing Go, but generally end up using Go for my own projects because I know I can get 90% of the way there without ever worrying about choosing from a mishmash of 3rd party dependencies.
Obviously this is very specific to my usage - I’m generally writing small, lightweight web backends, where Go’s stdlib really shines - and it’s not really a core use case for Rust. But still, it would be nice to have the option to pull in a chunkier std somehow.
On the other hand, Python offers an opposite example. It used to be the flagship of “batteries included” languages, but being bundled with the language made it very hard to evolve. Every library is pretty much stuck in its 2010 state. So you have the worst of both worlds: you need to depend on lots of third-party libraries because they are far superior to the standard offerings, but lots of things depend on compatibility with the standard version anyway, so the ecosystem can never really move on (prime example: datetimes).
One good thing about Python’s stdlib is that third-party libs can wrap the standard lib to iterate on API design.
In a way the standard library can just provide very low-level functionality that third-party libs can rely on. And sometimes the low-level functionality (things like basic JSON parsing) is good enough… and people who need “better” implementations can special-case on their end.
The Rust world means that just for parsing a configuration file I now need to make a decision about what to use and look at a bunch of comparisons between libs, when at the end of the day anything could work and I should really just take the “lightest” solution.
I’m not buying that argument even after all those years. I’ve built dozens of small things in Python and I have usually used requests as an external dependency and nothing else, and only if I needed some advanced HTTP stuff. It very much depends on what you are building, and sometimes you need a better argv parser, but not that often.
Well, let’s look at requests then.
It depends on five projects:
All but charset_normalizer have alternatives in the standard library which requests does not use because they’re bad. Two are even actively maintained forks of standard library modules. Everything but pytest would be expected functionality for a modern batteries included language. Perhaps requests, even.
So here we have the arguably #1 most popular python library and it avoids using the standard library at every place. The stuff in there is kind of usable, in the “we have food at home” sense, that is true. But practice shows that in python, everyone just ditches the standard library as soon as they can. That is not the mark of a successful standard library to me.
aiui the big reason to keep things out of std is so APIs and implementations can evolve without being tied to Rust versions/editions.
That and no one can really agree which batteries to include.
It depends. The more batteries, the greater the burden on the language designers. Go is a language designed specifically for building web services, so it makes sense for it to include those particular batteries. If the language is funded by a FAANG and the batteries are aligned with the business model, it’s easier to justify the costs of building and maintaining them.
On the other hand, as a developer, I know what you mean. I spent most of my career in JavaScript. About midway through found myself in a project where I needed features and bug fixes in a .NET service my Vue code was relying on. I could either learn C# and write a bunch of it myself or wait for someone to do it for me. I was never a Microsoft fan and even less a fan of inheritance-crazy OOP languages, but I decided to suck it up and become a full stack .NET dev for that project. It was refreshingly easy. It helped that the lead .NET dev kept things clean, but just as important was the fact that there was almost always one idiomatic way to build something and often times it was already a feature built into the framework. If you took the time to read the docs, you could build a lot of features in a short time and usually leave code other developers could easily read and understand. It was only after this experience that I realized that communities with fragmented ecosystems tend to suffer from a lot more confusion, reinvented wheels, and churn. Not everything Microsoft makes is brilliant, but there are definitely days when I’m building something fairly conventional in JavaScript and there are still a thousand ways to do it. I generally like my own code because I take a lot of care to critically read my commits before I push them, but I’m still never totally sure that the choices I’m making will be as clear to other developers who come in behind me as they were to me.
Not sure whether this overlaps with Rust’s core use case (I’d be curious to hear what that is for you @strongoose), but just an anecdote that we’ve included as many batteries as possible in Raku. It’s such a pleasure to be able to do so many things without needing any dependencies. I guess it’s pretty common to have a robust standard library in “scripting” languages, though some take the Perl approach and pull in frequently used/deemed-important third-party modules.
This is why I’m a bit confused by so many people claiming here that “it’s the same in all languages” when it is easily verifiable that this is not, in fact, remotely true. Zig is a good contrast here too, demonstrating that it’s not just dynamic “scripting” languages that disprove the claim.
Yeah, the most common “core use” that I see cited as a reason for the thin stdlib is embedded systems - but I don’t know very much about that kind of development so I’m not well placed to comment on it tbh.
Now count the number of lines of code your average non-trivial C program has inserted into its address space when the dynamic linker gets involved.
I think a much more interesting metric would be ‘LoRC’ (Lines of Reachable Code). Strip the dependency tree at the function level, aggressively remove dead code, and now tell us how many lines you’re pulling in. That’s a more useful number to work with for the sake of security, performance, binary size, etc.
An exploitable bug in your program could very well make previously unreachable code now reachable.
No. Unlike shared libraries imported into C programs by the dynamic linker, Rust will not include any dead code in the compiled artifact. Besides, Rust has much better protections against those kinds of bugs than e.g. C.
After Rust does dead code removal, there is a resulting minimal binary. Working in reverse, would it be possible for the Rust compiler to do a transitive closure over the source code that is necessary to generate the minimum binary? If so, that subset of source code could be output and thus evaluated more easily.
In theory yes, but in practice the mapping from source code to object code is too complex.
The closest real-world equivalents are debug symbols (such as DWARF) and code coverage instrumentation. I think Rust currently emits coverage data at the line level, but if it had Haskell-style per-expression coverage data then you could work backwards to find all the reachable expressions, and then do some sort of parse-to-AST-and-diff operation to emit the subset of source code that ended up in the final binary.
The challenge would be implementing such a design without accidentally building a full-blown decompilation framework ala Ghidra.
Interesting idea! From what I can read though, dead code elimination is done primarily by LLVM. You would also need a way to go from whatever IR you’re in back to the frontend language, which I’m not sure is generally possible.
At this point this is a cultural/generational divide.
Some people were raised reimplementing lists and strings from scratch in every new program and are terrified of a library that does anything they deem trivial.
Some people were raised with a library ecosystem that had the kitchen sink and don’t see a reason to implement something twice.
As someone who pretty much straddled that timeline I came to abhor the implement data structures every time experience and have mostly joined the other side.
Rust compiles slowly, so the ecosystem encourages small crates as a way to reduce build times.
That may be, but 3.6M lines of code is still 3.6M lines regardless of whether it’s in one or 3,000 packages.
Given what I remember about cargo vendor, I’m fairly certain that 3.6M lines includes every possible dependency for every possible build configuration on every possible target platform. (i.e. I believe cargo vendor downloads everything that Cargo.lock pins.)
Hell, it wouldn’t surprise me if you told me that most of those 3.6M lines were inside the machine-generated windows-sys crate that Tokio depends on if building for a Windows target. The size of “the Windows API”’s surface makes Qt’s stable of bundled functionality look quaint.
(Because getting people to depend on built-in functionality instead of portable libraries was their original vendor lock-in strategy, so, before they conceded the API war, they needed to have a lot of built-in functionality, and the camp used to doing that still exists within Microsoft.)
The IDL files windows-sys was generated from total 30.87MB in size! Even assuming they’re using something as verbose as XML, that’s still just interface definitions!
Hell, it wouldn’t surprise me if you told me that most of those 3.6M lines were inside the machine-generated windows-sys crate that Tokio depends on if building for a Windows target. The size of “the Windows API”‘s surface makes Qt’s stable of bundled functionality look quaint.
I was wondering how big of a difference it would actually be, so I threw together a quick project with the dependencies they listed and compared the regular vendor result with cargo vendor-filterer --platform=x86_64-unknown-linux-gnu. The result is 1,746,560 fewer lines of Rust as reported by tokei (3,671,590 → 1,925,030), which seems like a pretty decent reduction. Filtering to only normal dependencies brings it down another 96,676 (1,925,030 → 1,828,354).
The remaining top 20 crates by lines of Rust as reported by tokei are:
linux-raw-sys 367306
encoding_rs 134587
libc 126917
tokio 85543
syn 58915
rustix 56410
regex-syntax 53901
rustls 40387
regex-automata 40064
openssl 30132
ring 25974
portable-atomic 24512
hyper-0.14.32 23877
rayon 23797
unicode-width 22489
h2 22427
h2-0.3.26 21943
reqwest 21776
tracing-subscriber 20243
serde_json 20159
linux-raw-sys makes up a full 20% of the remaining lines of code. After that there is a pretty long tail of dependencies, and a lot of the big ones could probably be removed by disabling features one doesn’t need. E.g. encoding_rs, which is pulled in by reqwest’s default feature chardet for supporting browser-like encoding detection and decoding (which they quite likely don’t need for their use case), is 7% (134,587) of what remains.
Looking into the dependency tree in a bit more detail, it seems quite a lot of dependencies are pulled in by their choice of unzipping tool, because it doesn’t split the library and binary up, so you end up pulling in things like clap, env_logger, and indicatif. Replacing it with just a dependency on the zip crate itself, which is what ripunzip is built on top of, gets rid of 895,803 lines. That’s almost half of the remaining lines of Rust!
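For reference, depending on the zip crate directly might look something like this (a sketch assuming its ZipArchive::extract API; the paths are placeholders), with none of the CLI-oriented dependencies mentioned above:

```rust
use std::fs::File;
use zip::ZipArchive;

/// Extract every entry of `archive_path` under `dest`, creating
/// directories as needed.
fn unzip_to(archive_path: &str, dest: &str) -> zip::result::ZipResult<()> {
    let file = File::open(archive_path)?;
    let mut archive = ZipArchive::new(file)?;
    archive.extract(dest)
}

fn main() -> zip::result::ZipResult<()> {
    unzip_to("bundle.zip", "unpacked")
}
```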
It’s in every language.
Write more code yourself is the only solution.
Except even if you replace 1000 dependency LoC with 10 written LoC… that’s still +10 LoC that you have to maintain.
Packaged code can break but so can your own code.
Yeah, that’s what most have been saying; at least with my own LoC, I wrote it, so it’s easier for me to debug and reason about.
You are looking for Sane dependencies. However there is no silver bullet. It is about finding a balance between NIH syndrome and dependency hell.
What helps is modular design: 1) of your dependencies, which allows you to pick only the small pieces you want to depend on instead of bulky packages; 2) of your own code, which allows your users to pick only the parts of your software they need, and thus only the related third-party dependencies.
And it is a bit more complex, because it is not only about LoC and the number of dependencies but also about their quality. Depending on a package from a random unknown author is riskier than depending on a package from a well-known company or organization (a package used by many others, where there are chances that somebody else did the audit or somebody else will fix the bug when found). On the other hand, the library from a well-known author can be a bulky package that does much more than you need and carries a historical burden with it. Meanwhile, the library from an unknown author may do just the one thing you need and have so little code that you can do the audit yourself and fix bugs yourself if necessary. This is quite a multidimensional problem.
Great link, I’ve never seen this one before! I’m hoping I get better at picking out dependencies over time and that the larger companies have an established ecosystem of crates they use so I feel a little less hesitant to use them (like Tokio has become)
In general I considered the project to be trivial, a webserver that handles requests, unzips files, and has logs
People have very different ideas of what trivial means. What are some existing points of comparison? What are some languages/libraries that demonstrate a smaller (or even minimal) set of dependencies to solve the author’s need? How do they achieve this?
The main thing that concerns me about the rust ecosystem is centralisation.
Let’s say the Rust Software foundation suddenly decided that they did not want an open source developer participating in their community, for non-technical reasons. Do they have the power to make crates authored by them unavailable to the wider rust community, by dint of them controlling crates.io?
Cargo supports alternate registries as well as git dependencies; not being on crates.io is an inconvenience, not a damnation.
I don’t see what the problem is.
Nobody’s forcing you to use any of those crates, and if you want to “audit” them, you can. (What does that even mean, btw?) It will be a big task, but there’s just no getting around that nowadays. Your Rust application is running on an operating system with millions of lines of code, so to fully trust the system you’ll be reading through all of that at a minimum.
Using popular and published libraries means there are other people working with them and finding problems. A lot of people are looking at tokio, nobody is looking at your home grown alternative.
There’s no getting around the fact that you have to research your dependencies and decide which ones to trust.
To chime in: I’d rather depend on a well tested, well used comprehensive library that does more than I need if it covers all the edge cases. There are numerous situations where the edge cases seem rare — until you do fuzz testing.
There’s a relevant RFC just opened today, trying to address the issue of so much “table-stakes functionality” being sourced from different places.
I do feel like the Rust crate ecosystem is getting a little leftpaddy.
Do you have an example you would like to share? I keep hearing this sentiment but I rarely come across any crate that is heavily depended on and also very trivial, like the original left-pad was.
https://crates.io/crates/cfg-if is a classic of the genre.
This is maintained by the Rust team, used in the Rust compiler and standard library… it’s code you’re already depending on by the time you’re compiling Rust at all. This doesn’t seem to fall into the genre at all.
The issue of left-pad isn’t about who maintains it or who uses it; it’s a matter of whether very small dependencies are encouraged or discouraged.
NPM is an example of a dependency culture that encourages very small packages, with left-pad being the cited example because (1) it’s only ~30 lines of code, and (2) it’s famous for that time its removal from NPM caused mass build failures due to how widespread it is in transitive dependency graphs.
Conversely, programmers in C/C++ or Go do not typically use very small dependencies – either they depend on a few large dependencies that each contain lots of functionality, or they just write the code themselves. So you end up with libraries like GLib that are absolutely huge – GLib contains an event loop, a test framework, an XML parser, and more.
The crates.io dependency culture is caught in the middle between people used to working in JavaScript (NPM), who write small libraries like cfg-if or …, well, like left-pad… and people used to working in C/C++, who write large all-in-one libraries like rustix or winapi.
I agree that cfg-if is an easily avoidable dependency, but I also doubt a 70 line macro is something the average Rust developer can bang out on their own. I sure can’t.
First, a 70-line macro should be well within the capabilities of every Rust developer. It’s not even a proc-macro, it’s just plain old macro_rules!() with some trivial adjustment of the if-else expression tree.
Second, the syntactic sugar it enables is IMO so minor that it doesn’t justify adding a dependency in the first place. If cfg_if!() were part of the Rust standard library then that would be one thing, but pulling in a third-party dependency just for slightly terser #[cfg(...)] attributes is very reminiscent of left-pad.
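To make that comparison concrete, here is roughly what the crate buys you versus writing the attributes out by hand (a sketch; the functions are placeholders, and the first form needs cfg-if as a dependency):

```rust
// With the crate: one if/else chain; the macro makes the arms mutually
// exclusive for you.
cfg_if::cfg_if! {
    if #[cfg(unix)] {
        fn platform() -> &'static str { "unix" }
    } else if #[cfg(windows)] {
        fn platform() -> &'static str { "windows" }
    } else {
        fn platform() -> &'static str { "other" }
    }
}

// By hand: the same thing, spelling out the negations yourself.
#[cfg(unix)]
fn platform_by_hand() -> &'static str { "unix" }
#[cfg(all(windows, not(unix)))]
fn platform_by_hand() -> &'static str { "windows" }
#[cfg(not(any(unix, windows)))]
fn platform_by_hand() -> &'static str { "other" }

fn main() {
    println!("{} / {}", platform(), platform_by_hand());
}
```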
For starters, I don’t think anyone should be using version 0 of crates they find interesting. Here be dragons.
That doesn’t work in practice. There are crates that got to a very mature state with 0.x numbers, and releasing 1.0 would be unnecessarily disruptive to the ecosystem when there is no pressing need for an actual breaking change.
That vendoring feature sounds nice, I’d love to do the same count with my projects – is there a way to do this with stack/cabal in Haskell? (Surely there’s a way with nix?)
Seems like there was some movement about vendoring in stack here (https://github.com/commercialhaskell/stack/issues/3813). Definitely one of my favorite features of cargo. If any Rust Foundation people are here, it would be amazing if cargo could filter dependencies based on the platform you’re currently on (i.e. only Linux).
I understand the concern regarding dotenv vs dotenvy. Tokio is a pretty mature library though. Is the author concerned that libc is a dependency?
More that Rust has a lot of fan-out dependencies, and the sheer magnitude of some of them. IMO Tokio is basically part of Rust for the sake of what I do (mostly servers).
This is such a trade-off in any language. I don’t use rust but even in C++ any project of sufficient size will probably pull in some non-system source repos. And the situation for javascript and python is famously chaotic, with outright malicious packages masquerading as well-known packages with similar names.
The only real solution is to write the code yourself but that is more work both initially and for maintenance, let alone that I would find it very difficult to replace things like libcurl or libssl.
I don’t particularly care for Go’s modules, I like Ruby’s Bundler packaging but I’ll take either over Rust and JS’s “million little packages” approach to dependencies. Dep problems seem to grow like O(n^2) (quadratically?) so apps with more, smaller dependencies quickly grow unmanageable.
It seems every popular language has to go through the same stage of maturity. It’s too late to cling to the existing ecosystem but a bit too early to have a battle-tested one of its own. Just give Rust some time to gain enough trust. In a blink you’re gonna see articles like “Why does X have so many LoCs to maintain, why not just use Rust?”.
Just to name a few that I remember:
And so on and on. Until we invent The Language (C++ tried to be the one) that everyone agrees on using for everything, there’s gonna be change, and with change comes uncertainty, be it in speed, safety, or adoption. Just keep managing the risks with proper tooling and processes and keep going!