Evolving Git for the next decade
51 points by fanf
Seems to be mostly a transcript of https://fosdem.org/2026/schedule/event/HTJK33-evolving_git_for_the_next_decade/
When Git was released, SHA-1 was considered to be a secure hash function
Linus considered SHA-1 to be secure; in reality, a better-than-brute-force collision attack had already been discovered and actual experts were urging a migration to SHA-2. But Linus dismissed those concerns with his usual bombast.
it has been asserted that the use of SHA-1 is not primarily for security and a number of arguments have been made to back that up
Then Git should have used CRC128 - it would have been a lot faster, no one would have built security features on top of it, and enterprises wouldn't be demanding its removal by 2030.
Linus considered SHA-1 to be secure
He basically adopted monotone's arguments.
Then Git should have used CRC128
So what are the collision-resistance properties (for non-adversarial data) of some CRC-128 (and which one exactly would you pick, since there's no defined polynomial for it)? SHA-1 had, and still has, that covered.
Monotone’s arguments have turned out to be an interesting mixture of right and wrong, more than 20 years later.
They refer to a paper by Valerie Aurora, who I believe was working on ZFS at the time (not version control). I think that paper was right to be dubious about SHA-1 but some of the reasons are invalid, as the Monotone rationale points out.
But I think Monotone was too complacent about the strength of SHA-1. MD4 was thoroughly broken; MD5 was recently broken; it wasn’t a huge surprise to learn that SHA-1 was weaker than intended. Monotone underestimated the cost of migrating from SHA-1 to a successor: git has been working on it for years and making little progress because most git developers are put off by the huge effort required for little tangible reward.
One of the slapstick consequences of the SHA-1 collision was when it was committed to an svn repository as a test case; unlike git, svn uses the SHA-1 of the raw file, so the collision broke its invariants and corrupted the repository. This is a counterexample to Monotone’s handwaving about practical measures to avoid problems due to collisions in normal use, and it would also have broken SHA-1-based content-addressed filesystem deduplication.
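To make the raw-file vs. git distinction concrete, here's a minimal Python sketch (illustrative only; the sample bytes are made up): git hashes a "blob <size>\0" header plus the contents rather than the file bytes alone, which is why the published colliding PDFs break raw-file SHA-1 users like svn but are not automatically colliding git blobs.

    import hashlib

    def raw_sha1(data: bytes) -> str:
        # What svn effectively relied on: SHA-1 over the file contents as-is.
        return hashlib.sha1(data).hexdigest()

    def git_blob_id(data: bytes) -> str:
        # What git stores: SHA-1 over a type/length header followed by the contents.
        header = b"blob " + str(len(data)).encode() + b"\0"
        return hashlib.sha1(header + data).hexdigest()

    data = b"example file contents\n"
    print(raw_sha1(data))     # raw-file hash, the thing the SHAttered PDFs collide on
    print(git_blob_id(data))  # git's object ID for the same bytes, a different value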
But, kind of unexpectedly, that complacency would have been fine if Monotone had chosen SHA-256, which has turned out to be much more solid than I think cryptographers would have predicted 20 years ago.
ZFS also has defence in depth. It uses the hash for deduplication, but it can treat the hash as a quick check with a fallback to a full compare. If a new block’s hash doesn’t match any existing hash, the block is new. If it does match, one mode deduplicates it immediately; the other does a byte-for-byte comparison first. If you are worried about hash collisions, you can use the latter mode.
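A rough Python sketch of those two modes (illustrative only, not ZFS code; the in-memory store, function names, and choice of SHA-256 are all assumptions):

    import hashlib

    store: dict[str, list[bytes]] = {}   # digest -> distinct blocks seen with that digest

    def write_block(block: bytes, verify: bool = False) -> tuple[str, int]:
        digest = hashlib.sha256(block).hexdigest()
        candidates = store.setdefault(digest, [])
        if not candidates:
            candidates.append(block)               # unseen digest: genuinely new data
            return digest, 0
        if not verify:
            return digest, 0                       # fast mode: trust the hash match
        for i, existing in enumerate(candidates):  # verify mode: byte-for-byte compare
            if existing == block:
                return digest, i                   # true duplicate, safe to deduplicate
        candidates.append(block)                   # collision: keep both blocks intact
        return digest, len(candidates) - 1

    ref = write_block(b"some block of data", verify=True)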
SHA-1 benefits from a huge ecosystem of libraries and support. CRC-128 is practically unheard of for content addressing.
It’s easy to see why people get confused when cryptographic hashes are used in non-security contexts: the naming itself implies security properties that aren’t the primary goal.
Worth noting: Git was designed around the kernel’s email patch workflow. Remote repository support was added early (by v0.99 in July 2005) to match BitKeeper’s distributed model, which developers were already familiar with. The design explicitly allows everyone’s local history graph to look wildly different, supporting local rebases and history rewrites before sharing.
Anyway, sneaking malicious commits into history doesn’t work in practice. In an email workflow, you’re explicitly reviewing and accepting patches. In a remote workflow, you’d need to force-push to master, which is typically forbidden by conventions around Git (protected branches, CI checks, code review). This was true long before SHA-1 was proven to have collisions.
sneaking malicious commits into history doesn’t work in practice. In an email workflow, you’re explicitly reviewing and accepting patches. In a remote workflow, you’d need to force-push to master
The point of a collision attack is that an attacker doesn’t need to force-push: Jia Tan sneaks in an apparently benign commit that they can later replace downstream with malicious code without breaking any signatures.
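To see why the signatures survive, here's a hedged Python sketch of how git-style object IDs chain upward (the file name, author, and message are made up; the blob/tree/commit encodings follow git's documented object format): trees and commits reference blobs only by object ID, so a blob swapped for a colliding one leaves every ID and signature above it unchanged.

    import hashlib

    def object_id(obj_type: bytes, body: bytes) -> str:
        # Every git object ID is SHA-1 over a "<type> <size>\0" header plus the body.
        header = obj_type + b" " + str(len(body)).encode() + b"\0"
        return hashlib.sha1(header + body).hexdigest()

    # Hypothetical benign blob; a crafted colliding blob would get the same ID.
    blob_id = object_id(b"blob", b"echo building...\n")

    # A tree entry records only mode, name, and the blob's binary object ID.
    tree_id = object_id(b"tree", b"100644 build.sh\0" + bytes.fromhex(blob_id))

    # A commit records only the tree ID plus parents, author, committer, and message,
    # so the commit ID (and any signature over it) never sees the blob's contents.
    commit_body = (
        b"tree " + tree_id.encode() + b"\n"
        b"author A U Thor <a@example.com> 0 +0000\n"
        b"committer A U Thor <a@example.com> 0 +0000\n"
        b"\n"
        b"Add build script\n"
    )
    print(object_id(b"commit", commit_body))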
Better handling of large files is critical. Game developers have always needed this, and now LLM developers do as well. AI models can be massive files, and people understandably want to version them. Hugging Face resorted to using a plugin for git (git-xet), but that means dealing with a separate thing to install and update, which sucks. The obvious and standard way to deal with this problem is content-defined chunking. That’s how git-xet works, and also how my VCS works. If git adopted it as well, it’d be pretty damn great.
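For anyone unfamiliar, here's a minimal Python sketch of content-defined chunking (a Gear-style rolling hash with made-up constants, not git-xet's actual algorithm or parameters): chunk boundaries are chosen by the data itself, so an edit near the start of a huge model file only changes the chunks that contain it instead of shifting every fixed-size block after it.

    import os, random

    random.seed(0)
    GEAR = [random.getrandbits(64) for _ in range(256)]  # per-byte random table
    MASK = (1 << 13) - 1                                  # ~8 KiB average chunk size
    MIN_CHUNK, MAX_CHUNK = 2 * 1024, 64 * 1024

    def chunks(data: bytes):
        start, h = 0, 0
        for i, byte in enumerate(data):
            h = ((h << 1) + GEAR[byte]) & (2**64 - 1)     # rolling hash over recent bytes
            length = i - start + 1
            if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
                yield data[start:i + 1]                   # boundary is content-defined
                start, h = i + 1, 0
        if start < len(data):
            yield data[start:]                            # final partial chunk

    # A small edit only changes the chunks touching it; identical chunks elsewhere
    # keep the same hash and can be deduplicated or skipped during transfer.
    data = os.urandom(1 << 20)
    print(sum(1 for _ in chunks(data)))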
now that git has a unified ODB interface, it's possible that git-xet could be one such interface implementor! when combined with LOPs, this would mean that large files in git behave identically to regular files in git.
as far as i know, no forge has adopted LOPs yet (but i am looking into it for tangled.org at the moment!)
I am glad that they finally seem to get that the git UX is terrible. I look forward to future git being less weird and difficult.
Apparently, large file improvements will be based on content-defined chunking: https://gitlab.com/groups/gitlab-org/-/epics/20716
Git is old: the project cannot simply completely revamp its UI and break users' workflows
At this point, the only thing I'm interested in with git is its interop facilities with jj, until jj gets its own backend and becomes more mainstream.
"That moment when you realize that a tool simply fixes all the UI issues that you had and that you have been developing for the last 20 years was not exactly great."
I imagine this is true for more FLOSS projects. With vibe coding, I hope we get more people who take a shot at getting rid of all the bathwater (maybe even the baby included) and making something better.