pre-commit hooks are fundamentally broken
51 points by calvin
51 points by calvin
To me this honestly just seems like misuse of the hooks? Why would you want the act of committing a diff to change the diff? In any case - formatting or otherwise?
The only way that I have ever seen pre-commit hooks is to simply prevent a commit - making a change to the commit and then committing something that was likely not read by the author, and definitely not the reviewer is not something I would ever have considered. I would consider all the other problems that post talks about stem from the decision to try to automate changes to the patch that was written and reviewed, not inherent to the pre-commit hook mechanism.
I get the desire to automate trivial tasks, but running a formatter prior to pushing is not hard and means the you and the reviewer have actually read what the real commit was, which seems vastly superior to potentially arbitrary changes.
For rust this isn't an obvious problem, but for other languages bugs in "reasonable" formatting changes could legitimately create bugs - reindentation of code in indentation contexts, adding or removing braces according to existing style rules, etc. In general there's not likely to be a bug in such tooling, but as the article demonstrated their initial version would have committed a massive unrelated change had the commit-hook just worked immediately.
I get that their are problems with pre-commit hooks, but labelling them fundamentally broken for what is, as far as I'm aware, not an intended use case does not seem reasonable.
It's not a matter of which tree, etc the hook/diff/whatever is running on, as I see it problem here is thatwhat they're trying to use the hooks for does not seem reasonable in the first place?
Maybe I'm wrong and other people do have such systems in place, but if I were a reviewer of code, and the pushed PR was different from what I reviewed I would not be particularly happy, unless they were specific changes I had preemptively approved.
i don’t control other people’s shitty hooks! i’m writing this post specifically so people will stop writing hooks that try to modify the index.
that said, i don’t agree that modifying the index is the only problem with pre-commit hooks, for the reasons in the post.
i’m confused by your statement about “review”. in your workflow, does review happen locally before a push? i’ve never heard of that before, how does that work?
It happens before you merge a pr into a repository, if your pre-commit hook makes changes when you push to main, or if you're using something like GitHub the promotes merging from PR, then minor changes in response to a final "this looks good if you make these minor changes" can accrue unrelated changes for that final push, and as your post discussed minor tooling errors can result in such tooling making significant unexpected and unrelated changes.
Hence my opinion that pre-commit hooks exist to say "there is something wrong" rather than trying to automatically fix those issues. Once the hooks are not trying to actually make any changes the question of indexes and such becomes moot.
[edit: minor addition, there is a git-clang-format script I wonder if similar exists or could be easily written by modifying the final invocation of clang-format to call rustfmt? Certainly I frequently run git clang-format HEAD^ or git clang-format origin/main depending on the current state. I always prefer HEAD^ but GitHub is so garbage and handling PR changes that merging main into a PR is the required workflow]
To me this honestly just seems like misuse of the hooks? Why would you want the act of committing a diff to change the diff? In any case - formatting or otherwise?
The main example in the article is about running rustfmt --check on the pre-commit hook, which will fail if the code is not properly formatted, but won't change the to-be-committed files.
The rest of your post follows from this premise, but i think the premise doesn't really apply to the article.
My reading of the article was that the premise was that precommit hooks that mutate the tree don't work, which to me is a feature rather than a bug. If the issue is rustfmt's style checking not working well with diffs then the title is incorrect in blaming the hooks rather than rustfmt.
If the issue is rustfmt's style checking not working well with diffs then the title is incorrect in blaming the hooks rather than rustfmt.
Hmm, no. I don't think that's the issue either. rustfmt doesn't work with diffs AFAIK: it formats the .rs files passed to it; or it checks the formatting is correct when used with --check (without changing the file contents), which is what the article does on pre-commit hooks.
The author only mentions pre-commit hooks modifying the commit contents on a tangential aside about the pre-commit framework near the end of the article:
It doesn't fix the issues about running during a rebase, nor does it prevent hooks from trying to add things to the current commit.
But it doesn't talk about git hooks modifying the committed files otherwise. The main examples use rustfmt, but that's incidental. The same argument could be made about any format-checking or linting tool on a pre-commit hook.
These have always driven me mad. I don’t like scripts being injected into my workflow & I can’t stand how much these hooks slow down or break a rebase-based workflow. I have routinely disabled the hooks and/or moved the to pre-push. CI is probably already testing formatting, but you can also just tell folks to set it up on their end (editor setup, write their own hook, whatever) rather than enforcing this stuff at the wrong level.
I think precommit hooks are reasonable, but they should be very fast, and only apply to the diff itself. More expensive checks should be part of whatever CI infrastructure.
Fast is relative. There are a few absolutely trivial assertion cases that can be checked, but it should never fail if there is a merge conflict. These formatter checks go beserk if trying to get your rebases in line—they also aren’t trivially fast. Even worse are those enforcing test/CI on commit. The thing is pre-push still catches the error while giving you the grace of fast rebases + fixing the history before it gets pushed; otherwise the history is littered with noisy commits like “ran formatter” or “fixed test” which makes blame+review harder just to make commits easier. This is the wrong friction point.
I haven’t noticed issues with pre commit checks taking too long with any project I’ve worked on. Similarly formatting/style takes negligible time, though as above I don’t consider reformatting to be the job of a pre commit hook, so I have never done so. I’d have to actually time it but to me commit signing feels like it takes longer than reformatting.
I’m also curious just what these pre-commit hooks are that take so long, how big are these patches? Pre-commit build/test is trivially outside of my “checks should be fast”, but maybe I should have been more explicit - if the check performance is not linear to the size of the diff or faster it is not reasonable to include in such a hook.
For the “history littered with random formatter” etc comments: why would you be pushing your PR to to your main branch with all the intermediate work in progress commits? Why would you be merging broken commits to main?
I’m obviously unfamiliar with your approach to these things, but I make plenty commits with those log messages without such hooks. But they don’t break blame or anything because they’re squashed prior to review (no one needs to review a history of PR development - obviously GitHub’s broken PR history means that after a review has started you have build up these dumb commits), and any changes after the original PR are squashed into a single commit.
The only times I’ve used multiple commits for a single PR is when the change was large but could not be broken into distinct PRs, and none of those commits are the irrelevant “fix formatting”, “fix build”, … commits.
That’s the fundamental difference—the WIP/fixup commits shouldn’t be even in a pull/merge request’s history. If you create these commits, at least with Git, use the interactive rebase to correct the issues. These precommit things tend to fire off in interactive rebase as well as git pull --rebase. Feedback should also be addressed by amending relevant commits. Ideally this commit stack should be able to merged off the stack or cherrypicked without issue (yes, the MS GitHub pull request UI is noisy, but it’s just a bad model to begin with—where things like Gerrit fix it). This requires that the commits are relevant & legibile on their own—making squash unneeded.
With pre-push, I can still have such checks, but I can amend the commits when I want to instead of being a blocker in the middle of my workflow. No one’s local workflow should be broken/slowed dowt by hooks. Pre-commit breaks/slows down workflows.
Running pre-commit hooks on the working tree instead of the index is a mistake. And that limits what tools you can (reasonably) run as part of a pre-commit hook. In general, it is straightforward to run tools that take input from stdin and don't need context other than the file to run. ej. its easier to run prettier against js files on pre-commit (and fail the commit if the check fails, not trying to 'fix it' by modifying the staging area behind the users back).
You can use git show :$file to write the staged version of the file to stdout and you can use git diff --staged to obtain a list of files that are staged for commit as the article mentions. ej. something like git diff -z --staged --name-only --diff-filter=ACM So writing a pre-commit hook that checks that the code to be commit is properly formatted is easy to do. Having it check the the code build, isn't.
That said, this is one of the reasons why stopped thinking the staging area was a good idea. If one is committing changes using git add -p, chances are one hasn't run the tests on the version that it is going to be committed. Ditching the staging area is another jujutsu does right. And jj fix to run code-formatters post commits easily.
What puzzles me is why someone would put these kinds of checks or fixups in a git hook instead of a test or maintenance script. If I’m saving work before switching branches, I don’t want to be interrupted by demands that I plan to deal with later. When I’m rebasing there are already plenty of things that can go wrong, I don’t need even more! Surely it would be better to integrate the checks into the main build/lint/test workflow, so that nits can be dealt with alongside other test failures?
I have seen one case where this was done well. In a previous monorepo we had precommit hooks which would run formatting with fix enabled, wrapped such that a failure would be silently suppressed. So our precommit hooks could never fail and interfere with developer’s work, but the code was always formatted.
The motivation for this was that we had probably a hundred plus developers in our monorepo. Someone or another was always having a misconfigured editor, or forgetting to run tests before pushing to CI, or whatever else. CI runs cost time and money. We couldn’t enforce every best practice but the team at least saw that fewer PRs were stalling due to bad formatting. (The usual flow was: person pushes commits, tags reviewers, wanders off for half an hour, then finally sees the failure and has to re-request reviews etc. Not a huge deal, but again, a problem that increases with scale.)
Oh ... We fixed the formatting, but we didn't actually stage the changes. The pre-commit hook runs on the working tree, not on the index, so it didn't catch the issue.
The benefits jj offers by removing this distinction remain immense. I don’t know that jj fix does or will do everything hooks do, but I’m certain it won’t have edge cases like mentioned in this post.
And a meta FYI: Jyn is absolutely already aware of jj, I mainly mention for others and to start conversation, not trying to patronize :)
i actually see jj fix as doing a different thing than hooks, it’s closer to git rebase -x. i’m excited for it! but i think it’s not quite comparable.
I hate pre-commit hooks for these exact reasons. Especially because I use git commit -p sometimes and lose my patch when it fails the hooks.
I love the idea of moving it to pre-push. That would make my life so much easier.
i don't even think pre-push hooks are a good idea. a push is a backup. we shouldn't block devs from backing up their work for any reason.
I feel less strongly about push hooks because you’ve at least committed the changes at that point, there’s a way to restore them if the hook breaks things somehow. —no-verify is annoying to pass each time but it does exist, and the consequence for forgetting it is lower when pushing (just “CI is delayed” instead of immediately losing work).
Tips for writing a pre-push hook
Nobody who's writing any hook follows these tips to the extent that they should. It's usually used as a mechanism to cheaply impose some rule or a single person's will on the entire team. Cheap of course only from the point of imposition. The cost in time and frustration is always enormous because the hook authors never take into account the real and oftentimes quite varied ways that people work in the real world.
Hooks never work at any kind of scale unless you can (and want to) create a finely tuned code prison for people to work in.
I guess --no-verify to skip pre-commit hooks is a hint that the hook is meant to only check the commit and abort “invalid” commits. While it technically can alter index it doesn’t seem like it was an intended use.
All your fomatting needs should be fulfilled before you stage your changes. I suggest an on-save re-formatting in your editor.
format on save is even worse! i don’t want any tool messing with the only mechanisms i have for “save my work”. the only form of this that seems good to me is a little squiggly line in the editor showing the formatting is wrong, and even that only if there’s a way to opt out.
I strongly dislike format on save, but I disagree that it's worse than pre-commit hooks. At least with format on save you never ever end up with something formatted that you didn't edit, nor do you ever end up with a diff that's "wrong" and needs to be fixed (and if it isn't fixed, will cause conflicts).
Zed has “Save” and “Save Without Formatting”. I think formatting on save is what you want 99% of the time so it’s a sane system.
The only VCS that exposes a FUSE filesystem of its commits is Sapling
Intriguing idea. I remember reading about this when looking at Sapling, and I categorized this as a performance feature for large monorepos (which I assumed were related to lazily populating the filesystem state from the network, things like that). Are there other nice features/workflows that the FUSE trick enables?
FUSE allows you to use a tracing build system: https://jyn.dev/build-system-tradeoffs/#tracing
I'm afraid I don't follow. I would assume that the tracing build system works on the working directory, which is materialized to the filesystem anyway. What is the relation between a filesystem interface for commits (or other VCS state) and tracing builds?
Sounds like the pre-commit hooks the OP is using would be better off as git filters (smudge/clean) that were actually designed for the use case of modifying the code being committed.
Also, unfortunately pre-commit hooks are really the only reasonable way to scan for secrets committed in the repo. By the time it reaches CI, it's too late–the secret is considered compromised and needs to be rotated. Even pre-push is not great, because you might have several commits lined up to push and now you need to redo this entire branch to rip out the secrets before pushing.
You can detect the state of some git operations to prevent the rebase issue: https://adamj.eu/tech/2023/05/29/git-detect-in-progress-operation/ then exit clean without checks on every rebase/cherrypick/revert.
those files aren't reliable; it's possible to run git checkout ref in the middle of a rebase and then forget you were in a rebase and just use a git repo like normal. i really really don't like it when tools try to be this smart, it makes the failure modes much harder to understand.
I don't think I've run into most of the problems described in this post, and I've been using pre-commit hooks for a long time. Maybe my teams are too small and our work habits are too homogenous for us to be exposed?
We also always use this set of helpers to manage and run the hooks, and I think it addresses most, if not all, of the sharp edges called out in the post. For formatters, in particular, if you attempt to commit a set of changes, pre-commit runs the hooks only against the changed files. If any of the formatters it runs changes a file, it exits with a non-zero status. You then review the formatting changes, test if necessary, and add them before retrying the commit.
I find it a pleasant workflow... I could probably be persuaded that pre-push is better, but it would be painful in a different way: I don't push after every single commit; frequently, I make several small commits then push as a group. If my formatters don't run until push, it's harder to amend the specific commit with the changes from the formatter unless it happens to be the most-recent one. I'd rather run them on commit.
my formatters don't run until push, it's harder to amend the specific commit with the changes from the formatter unless it happens to be the most-recent one.
for this, i often use git absorb combined with git revise. i think i could make it even less prone to merge conflicts by passing the right flags to git rebase -x 'cargo fmt' but i haven’t set that up yet.
Also, the pre-commit tool works if you only staged parts of a file (e.g. using git add -p or tig): it will only run on the staged lines and it will stash the rest like magic.
Oh, wow! I just tried and it really is like magic. I never noticed that before, even though I very frequently only stage parts of a file (using Jetbrains' git UI). That is very nice.
I setup a format check hook for the first time yesterday actually, and landed on git-format-staged since it's the only solution I'm aware of that doesn't touch your working files (that invalidates caches).
I'm using its --no-write option so the hook is a check only because by default it can change the commit's content.
It won't work for all languages since it runs the linter/formater tool with just the file content over stdin, which is not widely supported by tools that depend on more context like neighbor files.