Chesterton's middle finger
108 points by carlana
I would add another variant. Sometimes it's not obvious at the time what will be important to know in the future. If everything leading up to the commit is recorded publicly then that can be a big help. But even then some things may seem "too obvious" to mention (because everyone involved already has a lot more context than the future person does). If discussions are not recorded then doing digital archaeology on the decision making process can be much much harder.
Whatever the reason, sometimes you will have to deal with not knowing why the fence is there. You have to evaluate it in its current context and bring people's current knowledge of the system to bear. It helps a lot to have people who have worked on the project long enough to gain understanding and intuition about the system as a whole. I think this is the risk for organisations that treat developers as interchangeable cogs. Nobody is there long enough to gain that kind of understanding, and reinventing the wheel ensues (making the same mistakes over and over).
This is the number one benefit I get from code review: someone else reads the code and makes me put comments in for everything where the rationale wasn't obvious to them. I already put in comments for everything that wasn't obvious to me, so now we have comments for everything that one of two people thought needed explaining. The next person coming to the code has a non-zero chance of understanding why things are the way they are.
A bit like old food recipes with items "everyone should know where to find" or medieval rhythm/beats "everyone should know", and now we don't know what was in there anymore!
I never understood developers who just put "fix" or "WIP commit" or some such in their commit messages. Presumably either they never had to do any serious code archeology or it never occurred to them one could even do that.
I always (try to) err on the side of too much info, so that at least future me (or my successor, or the poor sod who gets the call when things are on fire) has a fighting chance to figure stuff out when it inevitably breaks.
I never understood developers who just put "fix" or "WIP commit" or some such in their commit messages.
It's probably the wrong level of granularity. For some devs, a single commit may be one of a hundred small steps toward adding a single "marketable" feature. For them, it might be hard to come up with meaningful descriptions or even mentally keep track of what's changed since the last checkpoint, which they may be treating more like a "save" button in a video game than a way to divide the work into well-defined units.
Conversely, if you're making a big commit that's a result of a big API refactor, anything short of a design doc is probably not enough, and if you're not doing that, what's the point...
But then, some folks start wearing it as a badge of honor, start putting "bug fixes and feature enhancements" in release announcements, etc. And that's obviously the wrong takeaway.
For some devs, a single commit may be one of a hundred small steps toward adding a single "marketable" feature. For them, it might be hard to come up with meaningful descriptions or even mentally keep track of what's changed since the last checkpoint, which they may be treating more like a "save" button in a video game than a way to divide the work into well-defined units.
At work we have a few people who work like that, but then they rebase it and squash the related commits and write up a useful commit message. If you're really good with git you can then even split up the resulting commit into meaningful units.
If you're really good with git you can then even split up the resulting commit into meaningful units.
If anyone is in the market: I can't recommend lazygit (a TUI for git) enough for slicing and dicing commits.
Makes things like 'add a few lines onto that commit 5 levels down' or 'pull out a few lines of changes from that commit into another one' or 'change the order of these commits' very very easy.
(I have heard jujutsu is also good in this regard but have no experience)
I was intrigued by lazygit, then promptly put off by the three unrelated promotional banners on the README, as well as most commits being co-authored by Claude...
If you're really good with git you can then even split up the resulting commit into meaningful units.
I just copy/paste stuff out of large commits to split it out. Sometimes I just copy the modified files I want to split out to ~ or /tmp, switch to new branch, and move them back.
It's embarrassingly simplistic. But it works. And it has low cognitive overhead, no chance of losing stuff, and is not all that time-consuming. I never got to grips with add -p and such and I doubt it'll really be much faster.
Theoretically it's supposed to be easy with magit, but I always trip up when trying. When I have to split a hunk I usually start scratching my head and just give up trying to do it the "sophisticated" way. Because I know I'm a total failure at splitting commits, I try to make atomic commits from the get-go.
Of course I'm far from perfect, so I end up going back and rebasing a lot, basically fleshing out my existing atomic commits.
It is never easy. Even in jujutsu it is not easy, because their split editor is the most unintuitive TUI I can recall using. The least painful way I've found is to:
git restore to check out changes from the commit you want to split into the working tree.
With magit the whole process takes about a minute, unlike regular operations, which are almost at the speed of thought.
In the scenarios where there is a single commit in the branch, one will have to first add an empty commit and move it to be before the commit we want to split. And in step 4 one has to commit --amend the change instead. This is because when there is a single commit in the branch, the previous commit is a public one, which one shouldn't include in the rebase range.
Something that I used to do a lot was that before each commit I would do a git diff --staged and skim through, and then bullet point each change in the log and why. It takes maybe 5-10 minutes to do but it is a huge help with commits.
If you set this in your Git config file:
[commit]
verbose = true
then when you open your editor to write the commit message, the entire diff will be present at the bottom of the commit message file. This might save you from having to flip back and forth between your editor and diff tool!
Careful with that, though: if your diff is sufficiently large it can make git and/or your editor choke. In a project we had hundreds of text files checked in as tests (it was something like a compiler, the text files were something like AST output snapshots) and for many changes verbose would take on the order of seconds.
Often I've found the true commit history in many companies ends up in the pull request on github/other forge and the associated comments etc... This isn't good for discoverability for obvious reasons.
There's that, and also it risks losing valuable history when switching forge software. I mean, you'd always have that problem, as you'll often end up in the review anyway when really digging into something, but if your only source of truth is the review, it makes things many times worse.
On the other side when I get a 100 line PR comprised of 68 commits, it might as well be one giant commit reading "do the thing" for all the good it does me.
A lot of developers often forget that the "other person" that might need to read and understand the story behind a change could be themselves in the future. I think we're all familiar with the situation where you scratch your head looking at a block of code, git blame, and then see your own name staring back at you. I can't count the number of times a good commit message saved me from hours, if not days, of digging around issues, emails, and chat logs trying to answer the why, even in cases where I'm supposed to be the person who can answer that off the top of his head.
Be nice to your future selves. Dump everything you know, everything you were thinking, everything that was discussed, into that git log. It's never obvious what will remain obvious 5 years down the line.
Dump everything you know, everything you were thinking, everything that was discussed, into that git log. It's never obvious what will remain obvious 5 years down the line.
Though of course, some of that knowledge may belong in comments, or a design document instead of the commit logs.
I'd argue that even if a design doc exists, it's still worth summarising it (or parts of it) in the commit message. Unless the doc lives in the repo, or you're sure it will always be available somewhere where you can link to it. In my case, I work in open source, but the project is mostly driven and developed by our team and doesn't have many external contributors. Design docs can often be internal and private for various reasons, but the commit message will always be available with the code.
I do prefer it when the design docs are just Markdown (or plain text) that is inside the repo.
My point was more that critical information shouldn't just be in a commit message. But that's fine if it is also in the commit message.
I think we're all familiar with the situation where you scratch your head looking at a block of code, git blame, and then see your own name staring back at you.
Somehow, for things where there was a reason at all (random counterparty quirks are random, but it's hard to say anything useful without visibility into their processes), I am not really familiar with that part. If it ever made sense to me, I basically need to reread it, and then it makes sense again. I guess I do lava-layer learning, I do not really change (after highschool maybe?), I just add things I know and methods I could use.
Even if you know why the fence was built, you don't know why it's there. Even if you are the builder of Chesterton's fence, you don't know if you can tear it down.
You see, the intended interdependency tree of a system at the time it was created is a subset of the actual interdependency tree at any given point in time. So even a perfect developer who told you everything you ever wanted to know about why the fence was built is of limited utility. They know why they built the fence, which is not the answer to the question "what will break if I remove the fence?". See: https://xkcd.com/1172/ but understand that in addition to the silly use cases that arguably don't matter, there will always be valid uses the original developer themselves couldn't foresee.
So yeah it's nice to know what the og developer was actually thinking or smoking when they built something, but it's also somewhere between incomplete and irrelevant.
Made-up example: Chesterton's fence was built to keep a toddler away from a big trough of water back when there was a farm on each side of the fence. But now there's a freeway, and Chesterton's fence is accidentally the only thing preventing a massive number of animal and human deaths due to cars hitting deer. Nobody actually knows this because the freeway-plus-no-fence configuration was not only not tested, it never existed, and the freeway builders + Department of Natural Resources have no idea why their freeway is so roadkill free. In a few years, when it's moved from farms to all houses and it's not a major animal migration corridor, maybe the fence will be useless. Or maybe not.
If this seems far fetched or inapplicable to you: I envy you. In my experience this is how things work at every moderately sized company that's been around longer than a couple heartbeats.
The truth is everything you do is part of an ecosystem where you depend on and are depended on by things you never explicitly agreed to interact with. You can narrow your API surface area and avoid making all your implementation details your neighbor's business, but unintended coupling is basically a law of the universe as unavoidable as entropy increasing.
To some people this statement seems nihilistic and defeatist. Don't you have to fight entropy? I'd argue that the best use of time and the best ROI you can get is to acknowledge that it's fundamentally something you manage, not something you fight. If you start pretending you can know the state of the world at all times, you are setting yourself up for failure and a lot of self-flagellation. (See: 100% uptime doesn't exist and is basically the wrong target for everything.) If you acknowledge you're managing a process that has some inherent Heisenberg-uncertainty properties, you can make effective use of the limited number of hours you get per day to decide how to create the best outcomes. In particular, you can make intelligent tradeoffs between proactive and reactive work, with an understanding that you'll never get reactive to zero and that sometimes it doesn't make sense to spend a year on something proactive to avoid a day of doing something reactive.
So what's the right amount of documentation on a commit? How many design docs or test plans should exist? F#@! if I know. I'll throw out one option: every document is written for an audience. Probably if you make a change to your codebase, your current teammates, including the new hire, should be able to, with research, understand what your change did and why you did it, and the write-up should contain a couple of surgeon general warnings about any footguns or load-bearing bugs. This shouldn't be in the form of exhaustive prose but generally pointers to additional context that set the stage. "Hey, requiring auth on this step was part of the 'make sure every change has multiple-party signoff' effort, see: go/multiparty".
but unintended coupling is basically a law of the universe as unavoidable as entropy increasing ... Don't you have to fight entropy? I'd argue that the best use of time and the best ROI you can get is to acknowledge that it's fundamentally something you manage not something you fight.
I would go as far as to argue that entropy increasing and unintended coupling happening all the time are not actually bugs, but useful features that prevent systems (all of them) from locking up in permanent states.
I am actually really uncomfortable around systems that strive for perfection in face of humans. DRM, trusted computing, remote attestation, Faro Plague, smart contracts... I much prefer systems you can reboot into service mode and modify.
Mainly because you just cannot predict in what direction will the software have to evolve in the future to actually help people. Better make it easy to modify than 100% tight.
It's a welcome novelty to see this mature of an understanding of complexity and change. I have been trying (with mixed to poor success) to communicate this idea for years.
Someone, annoyingly I can't remember who it was, recently argued that commit messages longer than a sentence were generally a waste of time, and I was struck by both how much I wanted to argue with it, and how bad I was at demonstrating the opposite.
One question is: which pieces of information belong in a commit message, as opposed to an inline comment, as opposed to an ADR, or other long-form documentation?
I still try to produce good commit messages, though it's a lost cause at work, and I'm inconsistent on my toy projects.
For me, the commit message is for the reviewer. It tells them why this change is necessary (what it fixes or why we want the new feature) and gives them framing for understanding the change. But the resulting code should be understandable without reading the commit message: if there is something that needs rationale in the new code, that belongs in a comment.
Or, to put it another way: the commit message is for explaining why you've made this change now, the comments are for explaining why the code is in the state it is at the end.
For larger changes (especially new features) there should be a design doc somewhere. This may be a real document in the repository (especially if it needs review) or it may be in the issue tracker. The commit message should refer to this, but may also need to explain details of how the design was lowered to code. And it should ideally include a short abstract of the design document. This is especially important if there are multiple potential reviewers because it signals which of them should be most interested in the thing and helps catch cases where one person who should have provided feedback at the design stage didn't.
Or, to put it another way: the commit message is for explaining why you've made this change now, the comments are for explaining why the code is in the state it is at the end.
Beautifully put.
My general approach, as a driver for what goes in a commit message, is that my usual way that I'm looking at a commit message is not via git log (or jj log or whatever), but rather, almost always via line annotations. Having the subject line is generically extremely helpful for knowing whether I need to dig in more; having some information in the body about why the change was made can also be helpful if it's unintuitive. (E.g., "admin: add impersonation", as the entire commit message for what I'm sure is quite a bit of code, is honestly plenty, but "auth: shorten JWT timeouts", I'd love at least a sentence or two on why the timeout needed to be shortened.)
My personal take on the truly long-form commit messages is that they are indeed pretty useless in practice, largely for the reasons that article pointed out. I think that form stems more from workflows where the commit messages are the PR description (e.g., email-based workflows, Gerrit, etc.), in which case I don't think they're harming anything, but I don't think they're necessarily adding value, either.
I wrote a post a while back attempting to answer this exact question! I put the different locations in a hierarchy with rules of thumb for sorting information into the locations.
I’m in the same position as you. Only one other person and myself in my wider group at work write detailed commit messages.
We almost never write commit bodies but we are pretty good on subjects. So if that's a measure, I don't know of what it's supposed to be one.
Often clear code and sufficient test coverage are much more useful than documentation in larger codebases anyway.
I disagree. Sometimes both the actual change and the reasons for the change are subtle. When you're figuring out why a weird thing is the way it is, looking at the tests which were changed in the commit is an extra step and doesn't necessarily say why it was changed.
Sometimes a perfectly good change is made, tests are updated to match the change and in retrospect it's entirely unclear why it needed to be changed, if the changed line causes some unexpected (or additional) behaviour in prod. Simply reverting might not be desirable, and then it can really help to have the full history of why the change was made.
I've seen plenty of cases where the idea was correct, but there are unexpected consequences. Then, knowing the intention can help you derive the actually correct change that both fixes the new problem and maintains the original reason for the change.
If you insist on having only one commit line, at least make sure you include a ticket number so one can read up on the history there.
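If you want to enforce that mechanically, something like the following could be called from a commit-msg hook. It's only a rough sketch: the "PROJ-123" ticket format and the function name are made up for illustration, so adjust the pattern to whatever your tracker uses.

```python
import re

# Assumption: ticket ids look like "PROJ-123"; adjust the pattern for your tracker.
TICKET_RE = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def check_commit_message(message: str) -> list[str]:
    """Return a list of complaints; an empty list means the message passes."""
    # Drop the "#" comment lines that git adds to the message template.
    lines = [ln for ln in message.splitlines() if not ln.startswith("#")]
    subject = lines[0].strip() if lines else ""
    body = "\n".join(lines[1:]).strip()

    problems = []
    if not subject:
        problems.append("empty subject line")
    # A one-line message is tolerated only if it points at a ticket.
    if not body and not TICKET_RE.search(subject):
        problems.append("no body and no ticket reference")
    return problems
```

A bare "fix" gets rejected, while a one-liner with a ticket id or any message with a body passes. It can't judge whether the body actually explains the why; that part is still on the reviewer.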
then it can really help to have the full history of why the change was made
It could help but in the heat of the moment for many developers this all seems to be a bit too subtle.
make sure you include a ticket number
Most changes are definitely traceable to a ticket, but I've always assumed that commit bodies will never be seen and write them accordingly (that is, extremely seldom). I've also never really seen a practice of commit bodies with valuable information anywhere.
It could help but in the heat of the moment for many developers this all seems to be a bit too subtle.
It saved my bacon a few times, even when in the midst of debugging while things were on fire. It helps if you have decent editor support for "chasing" the git blame output down to the commit when the breaking change happened. Of course, you need to know you can rely on good commit messages from your team, otherwise you probably wouldn't even attempt this.
Git bisect can also help, but that's more useful for a post-mortem or when the bug isn't so critical that you have to drop everything (unless other approaches fail). Then read the commit message of the culprit in full to get a better understanding.
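For anyone who hasn't used bisect: the core idea is just a binary search over history. Here's a minimal sketch of that idea in Python. It is not git's actual implementation (which also copes with merges and non-linear history); the function name and the commit ids in the usage note are hypothetical, and it assumes a linear history where the bug, once introduced, stays.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search a linear, oldest-to-newest history for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the culprit is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # culprit is at mid or earlier
        else:
            good = mid  # culprit is after mid
    return commits[bad]
```

With a history like ["a1", "b2", "c3", "d4", "e5", "f6"] and a test that fails from "d4" onwards, this only needs to check two commits before landing on "d4". That's why it scales to long histories, and why the commit message of the one commit it lands on matters so much.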
I've also never really seen a practice of commit bodies with valuable information anywhere.
Most (larger, mature) open source projects actually have pretty good commit practices (Linux kernel, the BSDs, PostgreSQL are good examples). More generally, any place using code review will likely have better commit messages too, as feedback on the commit message at least should be part of the review.
Most of the Node jockeys around me don't really know that git blame and git bisect are commands that exist. I've had to teach bisect several times over the past years.
feedback on the commit message at least should be part of the review
This must be a beautiful world but over here in industry I'm happy enough if there's something in the subject (even though I suspect most of those are generated by AI these days). It's a problem when nobody in the team speaks English at a near native level. Any amount of writing seems to be a struggle at that point.
We have a group of non-programmers who have to check their stuff into git and more often than not their commit messages are ".".
Most of the Node jockeys around me don't really know that git blame and git bisect are commands that exist. I've had to teach bisect several times over the past years.
I see how you got in a situation where people don't see the value in decent commit messages. Sorry to hear!
This must be a beautiful world but over here in industry I'm happy enough if there's something in the subject (even though I suspect most of those are generated by AI these days).
I'm also in industry, but so far either my colleagues already knew how to write proper commit messages from being active in open source, or took a page from my book and saw how useful a good commit message can be (and overall, just how professional it is). I guess I'll have to count my blessings (again)!
It's a problem when nobody in the team speaks English at a near native level. Any amount of writing seems to be a struggle at that point.
Oof, that sucks.
We have a group of non-programmers who have to check their stuff into git and more often than not their commit messages are ".".
Well, hopefully they're not touching actual code files, so that shouldn't be a huge problem when git blame-ing.
Well, thankfully, slopgens write massive commit messages (often even having some bearing on the actual change), so at least that part's solved
Is it? They might be decent at summarizing what changed, but not why
In my experience lately, if you've already discussed "why" when working through the implementation plan (which you certainly need to do), Opus seems to do pretty well with propagating a concise version to the commit message.
As a special case, if Opus found the problem in the first place, it usually does a reasonable job:
Fix db-admin-panel test id on Database page
DatabasePage wrapped the backup/restore panel in a Card with a
`data-testid` attribute, but Card only forwards its typed `testid`
prop, so the attribute was silently dropped. The panel rendered
correctly (canDumpDatabase resolves true for admin), just without the
id, so database.spec's panel assertions timed out.
Pass the panel's id through Card's `testid` prop instead.
That's an interesting example because it was a one-line fix where the "why" was not obvious, and a human like me might well have just said Fixed flaky DatabasePage test.
For five years I traveled to exotic locales and made decent money rehabilitating these sorts of codebases. @arp242, always raise your rates and sleep with https://archive.org/details/working-effectively-with-legacy-code under your pillow.
Sometimes Chesterton's Middle Finger is strict corporate policy. I worked at a place that was unifying on a VCS called "Serena Dimensions" circa 2005. Dimensions had crippling problems of its own (some files had diffs immediately upon checkout, among many difficulties), but the policy was to throw away all previous VCS history when migrating to Dimensions. My project lost years of CVS check-in commentary when the corporation moved to Dimensions.
it is the first time I’m hired after everyone else left. There is no one to ask.
I presume everyone leaving was coming for a while. This may be a symptom of systemic workplace issues.
No; just a (very) small business that does not have software development as its core business. It's the most friendly workplace I've been at in a good while. Don't make assumptions.
Unfortunately, having good commit hygiene, design docs, and "why" comments: none of this gets you more money or promotions. Some of your colleagues will like it. But I'm not sure they will give that as peer feedback 😞.
Managers either count commits or tasks closed (or some other useless metric)
This is really a sad state.
Don't let management hinder you in doing the right thing. Good commit hygiene and comments don't cost anything, and if you can convince your team it's a good thing to do, just do it. And if not, just do it yourself as a point of professional pride. Such things tend to spread like a benevolent virus anyway, in my experience. When things are up to a point where it's becoming the norm, those who drag their heels will start being seen as backwards and stubborn. People with such a reputation certainly won't get money or promotions!
Good commit hygiene and comments don't cost anything
Actually good ones cost time, with no guarantees that any use of them will happen before you leave.
People with such a reputation certainly won't get money or promotions!
People in somewhat bad places (not necessarily clear unambiguously bad grim-and-dark places) get promotions despite bad reputation among peers, if they manage to match that one thing management actually cares about getting from them. If the management doesn't want to invest a bit of time into documentation, the place might be bad enough for that…
This is why my workflow usually involves pasting the commit hash into github search and reading any discussion on the associated pull request. And why I usually add a pull request template as a writing prompt for the "plz plz" description crowd.