Code like a surgeon
12 points by simonw
There is a false dichotomy here where either you let the LLM "do your grunt work" or assign it to a person.
Before LLMs, there was a way to delegate grunt work without assigning it to your lowest-status team members, and that way was automating toil. Script it. Build a tool. Take the time to build a deterministic fix for the problem. It does take a team culture of valuing that sort of work, but it pays dividends amortized over the life of your team.
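For example, a deterministic fix can be as small as a ten-line script. A minimal sketch (the module names and the src/ layout are placeholders):

```python
import pathlib

# The toil being automated: a quarterly rename that nobody should do by hand.
OLD = "from legacy.utils import"  # hypothetical old import path
NEW = "from core.utils import"    # hypothetical new import path

for path in pathlib.Path("src").rglob("*.py"):
    text = path.read_text()
    if OLD in text:
        path.write_text(text.replace(OLD, NEW))
        print(f"rewrote {path}")
```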
It's hard to imagine a surgeon who, during surgery, had to spend so much of their mental energy making sure they had briefed their staff properly, then continuously double-checking the staff's work and iterating. I'd hate to see the mortality rate at that hospital.
I'm always left wondering what tasks articles like this mean by "grunt work", because it's very hard to think of examples where it wouldn't be better to fix the underlying problem that produces the grunt work.
I'm working through a whole bunch of grunt work right now. I've made a large-scale design change to my permission system, but the previous permission system was backed by literally hundreds of tests. I need to go through each one and either fix the implementation to match the test, or identify the tests that are no longer valid against the new design and update or remove them.
Even a few months ago this would have represented days if not weeks of work. Today I have Claude Code and Sonnet 4.5 crunching through them, and it's honestly making extremely high-quality decisions - I'm reviewing what it's doing and course-correcting occasionally, but it looks like several days' worth of work is now going to take me a few hours.
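One way to make that triage explicit in the suite itself is to record each decision as a marker. A hypothetical sketch, not my actual code - the permission check and the test names are made up:

```python
import pytest

def can_edit(role: str, owner: bool) -> bool:
    # Stand-in for the real permission check under the new design.
    return role == "admin"

def test_admin_can_edit_any_record():
    # Triaged: still valid under the new design, so it must pass as-is.
    assert can_edit(role="admin", owner=False)

@pytest.mark.skip(reason="asserts old-design behaviour; delete after review")
def test_owner_flag_grants_edit():
    # Triaged: the new design drops the per-record owner flag entirely.
    assert can_edit(role="viewer", owner=True)
```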
Splitting the database the company has used for the eight years of its existence into a prod vs. analytics db. Making all the decisions about which table goes where, adjusting all the queries in the code to point to the relevant db, and reworking any queries that would cross the boundary. A very tedious task that just requires a bit of judgment (a sketch of this one follows the examples below).
Replacing the Julia <> Python interop library from 2021 with its successor that's much more stable but requires you to specify the output type at every call site, instead of attempting to infer it.
Translating a bash script to your company's main programming language and integrating it with the codebase.
Analyzing why your RDS costs suddenly spiked when you're using the same number of instances and the AWS billing messages are completely unhelpful (and the different dashboards contradict each other).
Debugging the dozens of issues that appear when migrating a large Python 3.8 codebase to Python 3.13.
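A minimal sketch of what the routing in the first example can look like. The table names and the single routing helper are hypothetical, and in-memory SQLite stands in for the two real servers:

```python
import sqlite3

# Source of truth for the split: every table is assigned to exactly one db.
TABLE_TO_DB = {
    "users": "prod",
    "orders": "prod",
    "page_views": "analytics",
    "daily_rollups": "analytics",
}

CONNECTIONS = {
    "prod": sqlite3.connect(":memory:"),
    "analytics": sqlite3.connect(":memory:"),
}

def connection_for(table: str) -> sqlite3.Connection:
    # Route each query to the db that owns its table; a missing entry fails
    # loudly, so cross-boundary queries get reworked rather than silently run.
    return CONNECTIONS[TABLE_TO_DB[table]]

# Each adjusted call site becomes explicit about which db it hits:
conn = connection_for("page_views")
conn.execute("CREATE TABLE page_views (url TEXT)")  # stand-in for a real query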
You're describing exactly how I use LLMs right now: I use them to automate the toil.
Sometimes that means having them help me write a custom script or tool - I never learned ast-grep or Codemod, but the frontier LLMs are all very capable with them.
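As a flavour of the kind of throwaway tool I mean - this sketch uses only the Python standard library rather than ast-grep itself, and the deprecated function name is a made-up placeholder - a script that finds every call site of a function being renamed:

```python
import ast
import pathlib

DEPRECATED = "old_helper"  # hypothetical name being renamed

for path in pathlib.Path(".").rglob("*.py"):
    try:
        tree = ast.parse(path.read_text(), filename=str(path))
    except (SyntaxError, UnicodeDecodeError):
        continue  # skip files that don't parse; this is a one-off tool
    for node in ast.walk(tree):
        # Flag direct calls like old_helper(...) with file and line number,
        # so the rename can be checked exhaustively instead of eyeballed.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == DEPRECATED
        ):
            print(f"{path}:{node.lineno}: call to {DEPRECATED}()")
```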
Other times it's using the coding agents as a super-fuzzy version of grep that can run the unit tests, make modifications, and keep going until the tests start passing again.
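The outer loop of that is roughly this shape. A sketch only: real coding agents bundle this loop themselves, and propose_fix here is a hypothetical stub:

```python
import subprocess

def propose_fix(failure_output: str) -> None:
    # Hypothetical stand-in: a real agent would edit files here based on
    # the failure text. Stubbed as a print so the harness itself runs.
    print(failure_output[:500])

def run_until_green(max_rounds: int = 5) -> bool:
    for _ in range(max_rounds):
        result = subprocess.run(
            ["pytest", "-x", "-q"], capture_output=True, text=True
        )
        if result.returncode == 0:
            return True   # suite is green; stop iterating
        propose_fix(result.stdout + result.stderr)
    return False          # still red after the budget; hand back to a human
```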
I've been refactoring so much code recently, because what used to be "ugh, I should refactor this, but that's 30-60 minutes out of my day and I'm behind already" is now "I should refactor this, I'll chuck Claude Code at it in a worktree and see what it comes up with in a few minutes' time".
> but the frontier LLMs are all very capable with them.
In what ways do you feel adequately capable of assessing the usage of tools you've never learned?
Because I can run the resulting scripts and see if they do what I needed.
If they don't, then the models clearly don't know them. If I try a dozen different things and get a dozen working results, that's a strong sign that the models are capable of solving problems with this stuff.
I don't need world-expert-level knowledge of ast-grep from a model, I need enough competence to get stuff done that I wouldn't have been able to do myself.
Write documentation about what I’m building
Documentation that nobody will/should read.
An interesting aspect of documentation is that it often describes semantics not covered by your type signature.
I’ve found that when LLMs change API documentation, it’s good to read those diffs extra carefully: sometimes what you thought was a pure refactor actually subtly broke something.
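A contrived illustration (the function and its contract are made up): the behavioural guarantee lives in the docstring, not the types, so a "pure refactor" can break it without any type error.

```python
def recent_order_ids(orders: dict[int, str]) -> list[int]:
    """Return order ids, newest first."""
    # The ordering guarantee exists only in the docstring. A refactor to
    # `return list(orders)` keeps the signature identical while silently
    # breaking the documented contract.
    return sorted(orders, reverse=True)
```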
After years of swearing that I would never let an LLM write human prose for me (including documentation), I've recently relaxed that policy when it comes to technical documentation.
It turns out that there's a lot of technical documentation which an LLM will do a better job on than I will, because it has the patience to type out all of the repetitive and somewhat boring details.
Here's some API documentation that I let Claude write for me recently: https://github.com/simonw/llm-prices/blob/d3e76471410b310dd17392ab05ff079db4cda0f7/README.md#json-apis
It's boring, useful and entirely correct.
I definitely don’t like it for most writing, but yeah, API documentation has been okay for me.
Those who haven't read "The Mythical Man-Month" are doomed to reinvent it.
In this case Geoffrey has read the Mythical Man-Month:
> The “software surgeon” concept is a very old idea – Fred Brooks attributes it to Harlan Mills in his 1975 classic “The Mythical Man-Month”. He talks about a “chief programmer” who is supported by various staff including a “copilot” and various administrators. Of course, at the time, the idea was to have humans be in these support roles.
Lol, came to post this. So few people bother reading it, and these days it feels like few have even heard of it. I always very much liked the Surgeon analogy, though it seemed to get no traction in industry at all. Perhaps now it will. Brooks has another amazing book that even fewer people have heard of, let alone read, called The Design of Design. Well worth the read: https://www.amazon.co.uk/dp/0201362988