Git blame for code comprehension

14 points by christine

I love it! I have also always thought of it as "spelunking" (which is what the linked post, "Every line of code is always documented", calls it), and I have a similar "keep using git blame to go back in time, layer by layer (commit by commit)" workflow. (Dear Magit: I love you.)

Separately...

My default approach to reading is “predictive”: I don’t actually read the code line by line. Rather, I try to understand the problem that it wants to solve, then imagine my own solution, and read the “diff” between what I have in my mind and what I see in the editor. Non-empty “diff” signifies either a bug in my understanding, or an opportunity to improve the code.

...I have a friend who is a professional composer, and he once told me that he enjoys listening to music the first time by imagining what the composer is going to do next, and then comparing what happens to his own guess (the "diff"). I had never thought about doing that to code!

fanf

Dear Magit: I love you

I think magit is capable of doing this kind of spelunking, but tbh I find its blame mode baffling and its documentation incomprehensible, so I usually end up using git log on the command line instead. https://docs.magit.vc/magit/Blaming.html

mira

An interesting conclusion from this: because history is crucial for comprehension, as the article argues (and because knowing the base commit id is generally needed for upstreaming changes), for most open source projects the repository is the preferred form for making changes to it – aka what the GPL defines source code to mean. So if you use GPL code, as far as I understand, distributing only a commit's file tree without its history is not enough.

fanf

for most open source projects the repository is the preferred form for making changes to it – aka what the GPL defines source code to mean.

Yeah, this is the point of Debian’s dgit and tag2upload projects.

coxley

Somewhat adjacent, I wrote coxley/link2code for generating GitHub line/blame permalinks based on the most recent commit that is also upstream. The README includes the neovim machinery I use to stuff it in the clipboard or open via the default browser. The implementation is rough but works well in our monorepo.

I should extend this to support git blame -L...

nposting

I've been trying to lean more on git log -S/-G (which shows diffs which add or remove matching text) in these cases too. It's saved me a bunch of time understanding how a given piece of code became dead when it's more scattered.
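For anyone unsure how the two flags differ, here is a rough Python model of their documented behaviour. It is a sketch, not git's implementation: it assumes -S means "the number of occurrences of a literal string changed between the two versions" and -G means "some added or removed line in the diff matches a regex". The function names matches_S and matches_G are made up for illustration, and difflib stands in for git's own diff.

```python
import difflib
import re


def matches_S(old: str, new: str, needle: str) -> bool:
    """Model of -S: the occurrence count of a literal string changed."""
    return old.count(needle) != new.count(needle)


def matches_G(old: str, new: str, pattern: str) -> bool:
    """Model of -G: at least one added or removed line matches the regex."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    # Keep only removed ("- ") and added ("+ ") lines; skip context and hints.
    changed = [line[2:] for line in diff if line.startswith(("+ ", "- "))]
    return any(re.search(pattern, line) for line in changed)


# The call to frotz stays, only its argument is edited.
old = "x = frotz(nitfol, 1)\n"
new = "x = frotz(nitfol, 2)\n"

print(matches_S(old, new, "frotz"))  # count is 1 before and after
print(matches_G(old, new, "frotz"))  # the edited line mentions frotz
```

Under this model, -S is the one to reach for when asking "when did this name appear or disappear?", while -G also catches edits to lines that merely mention it.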

isuffix

Another important part comes in the original writing of the code. Well-structured code should explain itself, but should also be aided by comments explaining why it is the way it is. A single comment has far more potential to explain the "diff" between the code and your prediction of the code, and requires no archaeology to find.

joshka

Ideal code is memoryless — it precisely solves the problem at hand. Most real code is Markov — the shape of the code at time T depends not only on the problem statement, but also on the shape of the code at time T - 1. The 3D step is to trace the evolution of code over time, Where Do We Come From? What Are We? Where Are We Going?.

Strong disagree. I used to think that all you needed was code; then it was intention-revealing names and obvious conventions. But I've come to prefer code that carries enough documentation that you can avoid ever having to read the code itself.

We have several ways in modern engineering to think about the code:

Docs that surround it (PRDs, PRs, commit messages). These live outside the code and have an O(N) access pattern (sometimes with a network hop as a multiplier of complexity).

Docs which are collocated with the code (module, type, and function doc comments, inline comments, etc.). These have the benefit of being O(1) access, which is significantly better for discovery and use. The old "docs drift" response to this is no longer valid in the age of LLMs.

Unit tests, which are a last place for this and are also documentary in purpose.

Changes don't have to be recorded as 'yy-mm-dd changed it...', but it's useful to have a historically relevant perspective in the code: something that captures that there was a reason for doing things a certain way, and that it has changed to meet requirements over time. That provides context for what to do and what not to do. Unit tests that capture bug regressions serve a similar purpose.

AviKav

I think the author's point is that these are all methods, tools, and practices to deal with the reality that the problem as it currently exists and the solution as currently applied in code tend to mismatch.

In that respect, an omniscient oracle (not LLMs) that automatically applies the ideal solution with respect to the current problem (including historical and other context) might be considered superior.

tomhukins

I find tig grep more effective for exploring code and its history than git grep. Using tig means I don't have to write my own commands or customisations.

tig provides a keyboard driven terminal user interface and its manual describes the keybindings it uses: its blame view uses , to view a parent commit and < to go back to the child.

christine

I'll echo a question asked in the blog post:

Is there some git command that tells me directly “what’s the equivalent of $file:$line:column in $sha-A for $sha-B?”
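One way to approximate this outside git, as a sketch only: assuming both versions of the file are already available as strings (for example from git show $sha-A:$file and git show $sha-B:$file), align them with Python's difflib and follow the line number through the unchanged blocks. The helper name map_line is hypothetical, it handles lines but not columns, and difflib's alignment will not always agree with the diff git would produce.

```python
import difflib
from typing import Optional


def map_line(old_text: str, new_text: str, old_line: int) -> Optional[int]:
    """Map a 1-based line number in old_text to its 1-based line in new_text.

    Returns None when the line does not fall inside a block that difflib
    considers unchanged, i.e. it was edited or deleted.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    index = old_line - 1
    for block in matcher.get_matching_blocks():
        if block.a <= index < block.a + block.size:
            # Same offset inside the matching block, converted back to 1-based.
            return block.b + (index - block.a) + 1
    return None


old = "import os\n\ndef main():\n    run()\n"
new = "import os\nimport sys\n\ndef main():\n    run(sys.argv)\n"

print(map_line(old, new, 3))  # "def main():" shifted down by the new import
print(map_line(old, new, 4))  # "run()" was edited, so there is no mapping
```

Across many commits the same idea would have to be applied step by step, and it would also need to follow renames, which this sketch ignores.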

polywolf

This is something I do a lot at my job, yes. Both "making sure to write good commit messages/PR bodies" and heavily using the GitHub UI for exploration. I think it's good advice!

natfu

I've never had occasion to use it, because it's too niche and work uses git of course, but I also believe having your discussions locked into a proprietary platform is bad, so I'd like to mention good old fossil.

matklad

This comes up every time, but, as far as I understand, fossil doesn’t have any code review features at all, there’s no equivalent to PRs. Am I missing something? How do people using fossil review and discuss patches?

natfu

I've not used it, but from what I understand you have your tickets stored in your project's DB, you can comment on a ticket, etc. Then you can fork and merge, so I would expect the discussion to happen in the tickets? I think it matches the workflow of the (small) SQLite team, and they also use mailing lists.

Now, I'm not saying "use fossil", but it's a step in the right direction: your discussions, comments and PR/MR equivalents should live in some format (SQLite or otherwise) that you can easily migrate.

matt-y

Does anyone have git blame aliases for the cli?

I've been using an alias for git blame -w -C -C -C but I feel like there are probably more tricks I don't know.