What’s the problem with pipe-curl-into-sh?
11 points by ecco
You’ve seen it: many popular tools have a one-liner on their homepage, something along the lines of
curl https://fancy.tool/install.sh | /bin/sh
And inevitably people will comment on how unsafe this is.
I don’t get it. How is it any more unsafe than cloning a repo and building and running its code?
There is no realistic scenario where this matters, but there was a blog post describing a clever hack based on this.
The cleverness of it makes people's imaginations run wild and harms their sense of security, regardless of whether it impacts real-world security or not. Any explanation of why the cleverness doesn't matter only nerdsnipes people into inventing increasingly contrived scenarios where a brilliant attacker could actually pull it off, and then curl | sh ends up being associated with the wildest attacks anyone could imagine.
The hack is that a script can make bash read it slowly, causing backpressure in curl, which the server can detect, and then serve a different remainder of the script than it would to someone merely downloading the file. It feels "undetectable", which makes it extra scary. People don't read the source of things they're installing, but they like to think they would, and loss aversion makes losing even the mere possibility feel worse.
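Roughly: when curl's output is consumed by a shell rather than written to disk, the transfer only advances as fast as the shell executes commands, so a long-running command early in a large enough script stalls the connection in a way the server can time. A rough sketch of the symptom, reusing the made-up URL above (the stall only shows up once the script is bigger than the socket and curl buffers):
time curl -fsS https://fancy.tool/install.sh -o install.sh   # completes as fast as the network allows
time curl -fsS https://fancy.tool/install.sh | sh            # hangs while an early long-running command (say, a sleep) executes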
The problem with this, apart from -x and VMs existing, is that there's no threat model where it's relevant. Trustworthy sources won't hack you this way, and untrustworthy ones will just tell you to use a PPA instead, knowing that you'll give it root without even blinking.
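(Presumably the -x here means the shell's trace flag; something like
curl -fsSL https://fancy.tool/install.sh | sh -x
at least echoes each command as it runs, though only after the fact, so it's more forensics than prevention.)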
Of course you have to trust your source, but this also completely removes even the possibility of comparing the checksum of what you just downloaded against a published checksum or signature.
And yes, I often do that - but maybe mostly for ISOs.
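For scripts, the manual version looks something like this, assuming the project publishes a checksum file (hypothetical file names; the point being that the hash has to come from somewhere you trust more than the download itself):
curl -fsSLO https://fancy.tool/install.sh
curl -fsSLO https://fancy.tool/install.sh.sha256
sha256sum -c install.sh.sha256 && sh install.sh   # only runs the script if the checksum matches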
Also, just actually having the code there to look at in case it fails to build. So I'm fairly likely to trust e.g. Rust not to mess up the installer, but maybe not a random project with 3 GitHub stars (a bad metric I've criticized in the past, but I really mean "one person's pet project with 30 commits in 2 months").
It is very easy to screw up an installer script like this if you don't know what you're doing. Take this script (shebangs omitted for brevity):
echo 'Installing foobar...'
echo "Pretend there's some work done here, call it step 1"
echo "Pretend there's some more/different work done here - step 2"
echo 'Foobar installed.'
Even this simple script is wrong, because the network connection can be aborted before the byte range that has the step 2 command gets to your computer. Now you've run step 1, but not step 2. But we can do even worse, so let's get more specific:
echo 'Cleaning up old versions of foobar...'
rm -rf ~/.local/lib/foobar
echo 'Installing fresh copy of foobar...'
# curl | tar xC ~/.local/lib/ here or something
echo 'Foobar installed'
If the network connection here aborts at exactly the right time, whoops! You've run rm -rf ~. You do run a full system backup every time before you install a new operating system, repartition your drive... or use curl | sh, right?
The correct way to do this is:
main() {
echo 'Cleaning up old versions of foobar...'
# Etc.
}
main
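The reason this works is that the shell has to parse the whole function before anything inside it runs, and nothing executes until the bare main on the last line arrives; if the download is cut off anywhere earlier, you get a syntax error about an unexpected end of file instead of half an install. A slightly fuller sketch of the pattern, with the usual defensive options added (my embellishment, not any particular project's installer):
set -eu   # stop on errors and on unset variables

main() {
  echo 'Cleaning up old versions of foobar...'
  rm -rf ~/.local/lib/foobar
  echo 'Installing fresh copy of foobar...'
  # download/unpack steps here
  echo 'Foobar installed.'
}

main "$@"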
Big projects presumably (maybe!) will get this right. But not everyone will. I have corrected this bug in the installer script of at least one not-obscure project (can't remember which one, or exactly how not-obscure).
Ok, that is actually very interesting. But I'm not sure that it's what I've heard many people complain about.
What you're pointing to is that curl-to-sh is fragile and likely to be buggy. You're right and are making a good point.
What I've seen people complain about is that it's a massive security hole. That's what I don't get.
Do Unix pipes not have a way to propagate an error condition? I naively guessed that this would result in an error and sh would be smart enough to not run incomplete input.
There’s EPIPE, but the problem is that the shell doesn’t read the whole script before running it; it runs commands as they appear on its input. There are tricky requirements about how a shell must read scripts: see under “stdin” in the POSIX sh man page.
Ideally the rm -rf ~ bug shouldn’t happen as described because I don’t think the shell will run a command when it gets a read error in the middle. But I wouldn’t bet on it because curl might not fail in a way that causes EPIPE and the shell might run a command that doesn’t end in a newline.
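The newline worry is easy to demonstrate: bash and dash will both execute a final command that isn’t newline-terminated once they hit end of input, which is exactly how a truncated rm -rf ~/.local/lib/foobar can become rm -rf ~/. Harmless version:
printf 'echo step one\necho this truncated line still ru' | sh
That prints "step one" and then "this truncated line still ru".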
(Caveat: I'm not an expert, this is all based on skimming the manpages just now.)
EPIPE is delivered to write(2) callers (if they ignore SIGPIPE), not read(2)ers. AFAICT there isn't any way that errors are propagated forward in a pipe, only backward: all pipe(7) says is that if there's nothing on the writer end, the reader end gets EOF, and read(2)'s errors don't list anything that looks relevant to me.
Note also that under glibc, stdio streams attached to pipes are "fully buffered" (i.e. read and written in arbitrary blocks) by default.
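A quick way to see which direction the error flows (bash-specific because of PIPESTATUS):
yes | head -n 1
echo "${PIPESTATUS[@]}"   # prints something like "141 0": yes was killed by SIGPIPE (128+13), head exited fine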
D’oh, you are right about EPIPE of course! That makes me even more wary of piping possibly-truncated input to sh!
Note that shells generally don’t use stdio; they bypass any buffering in libc. If you strace bash when it is reading commands from a pipe, you’ll see a large quantity of read(0, …, 1) syscalls, so that, as POSIX requires, it doesn’t over-read. It implements this by testing whether stdin is seekable, and if not, sets its buffer size to 1 (code). There’s similar can_seek() logic in OpenBSD ksh.
To my surprise, the Almquist shell (FreeBSD sh, debian dash) ignores this subtlety and always does buffered reads.
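You can watch it happen yourself (Linux, output trimmed, and the exact formatting varies):
echo 'echo hi' | strace -e trace=read bash 2>&1 | grep 'read(0'
# read(0, "e", 1) = 1
# read(0, "c", 1) = 1
# ...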
Thanks! Yes, by “incomplete input” I meant an incomplete line.
I’d be curious to hear if there is in fact a situation where curl downloads only part of a resource without raising some sort of error. I could easily be wrong, but that sounds like the type of glaring bug I wouldn’t expect to see in curl.
I'd be surprised if curl wouldn't exit nonzero in any situation where it downloaded only part of the resource. But this doesn't help you unless you've done set -e and set -o pipefail, which you certainly haven't done in your interactive shell. Even if you had, it's racy.
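In other words, the most the pipeline itself can give you is an after-the-fact report (bash sketch, same made-up URL as above):
set -o pipefail
curl -fsS https://fancy.tool/install.sh | sh
echo "status: $?"   # nonzero if curl failed, but sh has already executed whatever arrived before the failure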
One advantage of building from source is that you at least know what code is being used to generate the binaries (even if it might contain obfuscated malware).
One advantage of using a package manager is that you usually have some assurance you can fully uninstall the tool. That’s only indirectly a security advantage, though.
I’m curious to hear other answers because I agree that if you trust the code and hash shown on a web page for the tool, that’s not categorically different from trusting its install script, assuming you’re running the tool in the same context (user, root, etc.) as the install script.
Edit: I suppose curl … | bash removes any “verify the hash of the source” step, so it is worse than Git, which uses a Merkle tree to ensure you’re getting the same source/script as everyone else (the web site could be unknowingly compromised in the curling case!). So the hash seems like an important difference.
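Concretely, the thing Git buys you is that the commit hash pins the exact tree, so different people can cheaply check they received the same thing (hypothetical repo name):
git clone https://github.com/example/fancy-tool.git
git -C fancy-tool rev-parse HEAD   # compare against a hash published somewhere you trust more than the repo host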
I agree that package managers are superior: you already trusted the author you got your OS from, whereas adding any extra software from outside the package manager does require trusting yet another third party.
But I'm comparing curl-to-bash to either binary or source distribution of third-party code, for example from a random GitHub repository with potentially a releases page, as nobody seems to complain about that.
You could argue that you can just as easily look at the source code of the install script by running the curl part on its own, minus the pipe. Building from source really doesn't mean that you've read the entire source (I would even argue that the huge majority of people don't).
If the website is compromised, then… too bad? It could also point you to a compromised binary. Yet I've never seen anyone complain that, for instance, blender.org is offering binary downloads. Don't get me wrong, I personally have no issue w/ Blender whatsoever. My point is that curl-to-bash is essentially identical to a binary download: it's convenient, at the cost of a small security tradeoff.
The point I eventually arrived at in my stream-of-consciousness comment is that a Git commit (or similar hash) is easy to verify (automatically done in Git’s case) between different users and over time. Curling into bash doesn’t always support that, so you might curl into bash safely one day, and then the site is compromised and you get malware the second time.
One concern I’ve heard is targeted attacks. If you’re installing from a package repository (ubuntu, pypi, npm), then you have a reasonable expectation that you’re getting the same payload as everyone else who downloads it. The repository is essentially a third party trusted to send the same packages to everyone. Sometimes (e.g., ubuntu) this is technically enforced by hashes and signatures in the package manager protocol. (Though post-install scripts are a concern, e.g., an npm install can reach out to the internet and curl | sh on you.)
If you’re doing curl https://fancy.tool/install.sh | sh, a malicious server might fingerprint you (perhaps identifying you or your employer by IP address) and send you a malicious script while sending everyone else a benign one. The same concern applies to downloading any script or executable from the internet, though.
curl -f https://fancy.tool/install.sh > install.sh && sh install.sh bypasses Gatekeeper/notarization on the Mac compared to downloading a native binary. If the payload is malicious, prevention and incident handling could be somewhat easier if you download first compared to curl | sh, by virtue of the script existing at some point as a file. In an enterprise environment, anti-malware tools will grab the hash of the file and potentially block execution if it’s known to be malicious. In other environments, if the script isn’t malicious enough to delete or overwrite itself, you at least have something to reverse-engineer to figure out what it did.
You are absolutely right. Copying that into your terminal is the same as checking out a repo, simply because it can do exactly the same thing: even the "curl hack" could be part of the configure, install, etc. scripts in your package.
To be fair, even with package managers you have to trust that the author of the package is trustworthy, as well as the author of the code, because there is no in-depth security analysis happening. Sure, in the best case the maintainer takes a close look, but we live in a world where Heartbleed, etc. got into everything, and that one was pretty obvious looking at the code.
There are a lot of security myths that largely don't apply when you look at what is happening in the real world. Examples include the idea that your firewall magically adds security (sometimes it does more; it might also help against accidents, or lead to them), when it usually just allows access to the things you run anyway, as if it wasn't there; or how virus scanners won't protect you from many, many attacks; or how people worry about exposing their SSH port and come up with port knocking or VPN-style setups, when in most cases SSH does the same thing your VPN does (certificates for auth, secured connections), etc.
Yes, for all those things there are scenarios where they may help you, but for all of them there are also scenarios where the protection itself is the thing that gets exploited. On top of that, they increase complexity. Something I see frequently, both in setups and in code, is that someone first adds very specific security measures that are smart and simple, then something changes (infrastructure or code), and all that remains is the memory of "ah, this will prevent that security issue", until they get hacked and find out that the original assumptions no longer apply to the changed code or infrastructure, and the security was imagined.
It's basically a more convoluted version of the "I have a firewall/virus scanner, so I cannot possibly be hacked" mindset. And these false senses of security, often created with great effort, are what get people.
"I will be checking out the repo, instead of piping curl, so I am safe" might just be one of these wrong senses of security. That goes a bit into psychology, but there is that is a bit like building a castle and then making sure your gate is super protected, and everyone spends so much time and effort. They add this and that, and then they create super complex ways of getting in, and you have to know this and that, to even make it in and it's great, and super secure, and everyone is really proud of how secure they made entering the gate, but nobody took the time to notice or fix that gaping hole at the back of the castle, far away from the gate.
And to bring that back to IT and the port knocking example: sometimes there is that divide where admins/devops/SREs build super secure, complex infrastructure, but the web app, API, server, database, etc. has gaping holes that are obvious if you glance at them from a security perspective. Then you end up with situations where "not even root could get that key", but the app has the key to talk to the DB, and while you cannot escape the DB or something like that, the attacker doesn't care about that at all once they have all your data.
Piping curl feels the same way. You are smart and don't just execute that, but you don't read what your npm run dev or ./setup.py actually does. For all you know it could just run curl ... | /bin/sh.
And in some cases you get to the next layer, where you say "it's fine, it runs in a completely separate container", but then it connects to your main DB gaining access to everything.
Ideally, don't run code unless you trust where you got it from or have read it. Since that basically implies you don't go to any website with Javascript, I'll add that you should only run untrusted code if you understand the blast radius, whether that's a browser tab or a container or a virtual machine or whatever is appropriate.
One way curling into sh is less safe than running any other untrusted code on your CLI is that you don't get to see what you're running. If you download the shell script first, you can look at it before running it. If you just execute what you download with curl, the host could serve you something other than what you'd get from downloading with a browser.
Personally, I think using the curl approach for a source you trust, like download.docker.com or astral.sh or whatever, is just fine.
As others have mentioned, you don’t get to see what you’re about to run before doing it, and even if you check the script first and then do curl…, the server can detect the backpressure and serve something different.
This sort of thing, and the subversion of language package managers to run arbitrary code on developers' machines, are two of the biggest weaknesses in how we build software today.