The Comforting Lie Of SHA Pinning

39 points by rrampage

dzwdz

There’s a widely held belief that pinning a GitHub Action to a commit SHA gives you immutability; it’s what Microsoft/GitHub are recommending, and it’s what Aqua are recommending. After all, a SHA is content-addressed. It cannot be moved. It cannot be re-tagged. It is, in theory, the most stable reference you can use.

Why is that written as if that wasn't true? It literally is immutable.

I thought that article was going to be about some weird-ass bug where you can fake the commit hash, and swap out a version that someone has pinned for a malicious one. That doesn't seem to be the case here, and pinning works as intended. (I suppose the AI hero image should've been a tell...)

I believe the industry advice is a bit of an overcorrection, and we’ve replaced one weak guarantee (mutable tags, but scoped to the repo) with a vastly worse idea in unscoped SHAs. Yes, you should check; yes, you should validate it. But tags are human readable and SHAs are not, and if you ask yourself “Do I always properly check?”, do you? Because I can’t say I do enough validation 100% of the time.

So what? With how things are, you have to choose which of the following you are safe from:

someone with commit access to your dependency changing the tag to point to a malicious commit,
or someone with commit access to your repo merging in a version bump pull request from someone malicious.

Why would you ever decide that the latter is more important? You can prevent the latter in other ways, because it relies on you making a mistake. You can't prevent the former. Sure, I get why you would want to safeguard yourself against that mistake, but why would you ever do this at the expense of not pinning hashes? This is an absurd trade-off to make.

This is lowkey the sort of article Jia Tan would write if he got commit access to a widely used action.

cyberia

I wonder what happens if you make a tag whose name is a 40-character hex string (i.e. shaped like a 20-byte SHA-1). Since the repo@ref syntax is overloaded, could you use that to make a ref that looks like a commit but is actually a tag pointing somewhere else?
- yossarian
  
  You can’t: GitHub won’t let you push a named ref that overlaps with the SHA namespace. What you can do, however, is push overlapping named refs in the branch and tag namespaces.
  
  (This is a restriction on GitHub’s side, not a restriction of Git. Git will happily let you produce SHA-shaped named refs.)
maurycy

The commit-hash-from-a-fork trick is a classic GitHub prank. Git doesn't verify committer identities, so: fork a well-known project, make a commit using the owner's email, and then change the repo in the commit's URL to that of the original. Now you have torvalds saying he deleted Linux.

Eventually, they added the "This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository." message to make it less confusing.
yossarian

The industry term for these is “impostor commits.” GitHub should really forbid their resolution in contexts like GitHub Actions; it’s within their technical ability to do so.

Still, pinning is a good idea; you don’t get full immutability (for other reasons), but you do get immutability on the action’s own contents. And to prevent impostor commits, you can use a tool like zizmor or pinact.

(More generally though, I think you should almost never take opaque identifier updates from third parties! Leave it to tools like Dependabot and Renovate to update your hashes, the same way you trust cargo to update its lockfile.)

Edit: on re-reading, I think the author may have missed the salient dimensions of the Trivy hack: the impostor commit wasn’t itself the exploit, it was just an opportunistic way to hide the payload. GitHub provides lots of ways to do this, including overlapping branch/tag refs. The more relevant thing is what the attacker did afterwards, which was to modify a bunch of mutable tags so that they pointed to malicious contents. This tidily demonstrates the importance of hash pinning on actions.
- runxiyu
  
  GitHub should really forbid their resolution in contexts like GitHub Actions; it’s within their technical ability to do so.
  
  Could you elaborate? It's definitely possible and already done to scope tags with ref namespaces, but I don't see a trivial way to scope object stores to each repo without exploding the object store size.
  
  Doing a reachability query on each access is extremely expensive.
  - yossarian
    
    I meant that they could change their search space: getting the ref-to-hash mapping is very cheap for remotes, so GitHub could require that every action's hash is on the head of a ref for the actual remote specified.
    
    This would be a breaking change, though.
    
    (Separately, GitHub does have some relatively cheap way to compute this reachability, since they show it as a callout on every rendered page that uses an impostor commit, without latency beyond their normal page response times.)
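To make the "on the head of a ref" idea concrete, here is a minimal client-side sketch in Python. It assumes you already have the text output of `git ls-remote <repo-url>` for the repository named in the `uses:` line; the function names and the sample data are hypothetical, and this is not how GitHub resolves actions, only an illustration of the check described above.

```python
def parse_ls_remote(output: str) -> dict[str, str]:
    """Map ref name -> commit SHA from `git ls-remote`-style output.

    Each line has the form "<sha>\t<refname>". Annotated tags are listed
    twice: once for the tag object and once as "<refname>^{}" for the
    commit the tag points at. The peeled commit is preferred here.
    """
    refs: dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, _, name = line.partition("\t")
        sha, name = sha.strip().lower(), name.strip()
        if name.endswith("^{}"):
            refs[name[:-3]] = sha  # peeled entry overrides the tag object
        elif name not in refs:
            refs[name] = sha
    return refs


def refs_pointing_at(output: str, pinned_sha: str) -> list[str]:
    """Return the refs of this remote whose head is exactly pinned_sha."""
    wanted = pinned_sha.lower()
    return sorted(
        name for name, sha in parse_ls_remote(output).items() if sha == wanted
    )


# Hypothetical sample data, not taken from any real repository.
SAMPLE = (
    "a" * 40 + "\tHEAD\n"
    + "a" * 40 + "\trefs/heads/main\n"
    + "c" * 40 + "\trefs/tags/v1.0.0\n"      # annotated tag object
    + "b" * 40 + "\trefs/tags/v1.0.0^{}\n"   # commit the tag points at
)

print(refs_pointing_at(SAMPLE, "b" * 40))
print(refs_pointing_at(SAMPLE, "f" * 40))
```

An empty result means the pinned hash is not the head of any ref on that remote, which is the situation an impostor commit would be in. As noted in the thread, this strict check also rejects legitimate pins to commits that are merely reachable from a branch.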
    
    dcreager
    
    GitHub could require that every action's hash is on the head of a ref for the actual remote specified
    
    On the head or just reachable from? On the head would work if every consumer was pinning to a tagged release. But you might want the weaker reachability to e.g. let folks pin to any commit on the main branch.
    
    yossarian
    
    Yeah, I was thinking just on the head. It would definitely fall apart for any commit reachable from a head!
    
    hailey
    
    This post misses the far more pressing problem with SHA pinning on GitHub Actions - there's no concept of a lock file and it can't pin transitive deps. You can pin an action by its SHA, but if that action pulls in another action by tag or branch name (eg. the rust-lang/calendar-generation action), it's game over.
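One rough way to see the transitive problem is to scan an action's own workflow or `action.yml` text (as checked out at the pinned commit) for `uses:` entries that are not pinned to a full 40-character SHA. The Python sketch below is a line-based heuristic, not a YAML parser; the function name and the sample action names are hypothetical, and it skips local (`./`) and `docker://` references.

```python
import re

# Matches "uses: owner/repo@ref" with an optional leading "- " and optional quotes.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*['\"]?([^'\"\s#]+)", re.MULTILINE)
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_uses(yaml_text: str) -> list[str]:
    """Return `uses:` targets that are not pinned to a full commit SHA."""
    unpinned = []
    for target in USES_RE.findall(yaml_text):
        # Local actions and container images are out of scope for this sketch.
        if target.startswith("./") or target.startswith("docker://"):
            continue
        _, sep, ref = target.rpartition("@")
        if not sep or not FULL_SHA_RE.match(ref):
            unpinned.append(target)
    return unpinned


# Hypothetical composite-action steps, for illustration only.
SAMPLE = """
runs:
  using: composite
  steps:
    - uses: example-org/pinned-action@0123456789abcdef0123456789abcdef01234567  # v3
    - uses: example-org/tagged-action@v2
    - name: local helper
      uses: ./helpers/setup
"""

print(unpinned_uses(SAMPLE))
```

A non-empty result for a dependency means your own SHA pin only fixes the outer layer. The scan would have to be repeated for each nested action, and it says nothing about binaries or scripts fetched at run time.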
    
    masklinn
    
    The sub-pin issue extends further, to binaries and scripts the action might be downloading and running.
    
    jonathannen
    
    None of these answer the real question: who decided this code should run?
    
    Pinning to SHAs doesn't make things "safe" -- it turns silent changes into explicit ones. That's useful for change management, but it's not a security boundary, especially given what this article demonstrates about fork-scoped resolution. Tags are worse for integrity (mutable, "free upgrades!"), but they're at least scoped to owner/repo. SHAs are not. Two different failure modes, and the industry overcorrected from one into the other.
    
    There's also the rug-pull problem: SHAs can disappear. Sometimes that's good (yanked for a vuln), sometimes it's leftpad.
    
    In practice, almost nobody meaningfully reviews a SHA-to-SHA diff in a workflow file -- it's effectively unreadable without tooling that resolves the upstream change. So the review process the whole model depends on doesn't actually happen.
    
    This isn't a crypto problem, it's a governance problem. Move upgrades into CI, make them explicit (Renovate/Dependabot with required approvals, action owner allowlists), and stop pretending the ref format is your security boundary.
    
    veqq
    
    I like repo games like having McCarthy [0] make some initial commits, inventing Lisp then... However, UNIX time blocks this requiring evasion like Archéo Lex though the main forges reject this code entirely. This made me stop. Now that I use a newer language, I suppose I could place Rich Hickey, bakpakin and such but RH likes placing copyrights and even "impersonating" for an empty "Invent Janet" seems a bit weird when you know the person.
    
    [0] What do you do pre-email? Or post-email when you can't find it? :P
    
    eblume
    
    Wow, this was surprising to me! Great read. My immediate reaction was that this attack scenario still requires merging a PR that should have triggered a review of whether that specific SHA pin is good. A reviewer should go check out that SHA, and this would flag the obvious attack right away. (Snape did WHAT?!) But the key insight here, I think, is that switching from tags to SHAs actually loses the scoping that restricts resolution to just the specified owner/repository. That feels like a bug GitHub needs to address.
    
    Broadly speaking, it's not enough to be precise, you need to also be accurate: SHA pinning works great and I do recommend it, but you need to make sure your SHA is correct.
    
    masklinn
    
    An obvious solution would be to support intersections: if you specify both, GitHub resolves the tag/branch, then verifies that the tag points at the rev or that the branch contains the rev. This gives you both scoping and immutability.
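The tag half of that intersection check could look roughly like the Python sketch below. It assumes a ref-to-commit mapping already resolved from the named owner/repo (for example from `git ls-remote`); the function name and the sample mapping are hypothetical, and GitHub offers no such combined syntax today. The branch case ("contains the rev") needs an ancestry query, such as `git merge-base --is-ancestor`, and is not shown.

```python
def pin_matches_ref(ref_to_sha: dict[str, str], ref_name: str, pinned_sha: str) -> bool:
    """True only if ref_name exists in this repo AND resolves to pinned_sha.

    The named ref supplies the owner/repo scoping; the SHA supplies the
    immutability. Either one failing rejects the pin.
    """
    resolved = ref_to_sha.get(ref_name)
    return resolved is not None and resolved.lower() == pinned_sha.lower()


# Hypothetical mapping for a single repository.
REFS = {
    "refs/tags/v1.2.3": "0123456789abcdef0123456789abcdef01234567",
    "refs/heads/main": "89abcdef0123456789abcdef0123456789abcdef",
}

# Tag and SHA agree: accepted.
print(pin_matches_ref(REFS, "refs/tags/v1.2.3", "0123456789abcdef0123456789abcdef01234567"))
# SHA from elsewhere (e.g. a fork) does not match the tag: rejected.
print(pin_matches_ref(REFS, "refs/tags/v1.2.3", "f" * 40))
```

If the tag is later moved, the check starts failing instead of silently running new code, which is the behavior the intersection is meant to provide.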
    
    ‘Course you could also avoid merging randos’ updates to actions and only use automated bumpers.