Using AI to write better code more slowly
74 points by nolan
I have given up on the dream of moving faster with AI in my specific workplace. Coding isn't the bottleneck in our case. What I do like about coding agents is that they allow me to be the engineer I've always wanted to be.
Some examples: building a proper testing harness to push the code a little further, adding a CI step to validate that generated code is aligned with its source, or properly monitoring the deployment of a change.
This is the future, I tell you. Any one of these would have thrown me into something I couldn't afford due to timelines, like reading GitLab CI manuals to understand how to build the conditions right and figuring out the crooked way we do it.
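For readers wondering what a "validate generated code is aligned to source" step might look like, here is a minimal sketch in Python. It is not the commenter's setup: the `generate` function, the `Record` class, and the spec format are all hypothetical stand-ins. The idea is just to regenerate the output from the source of truth and fail if it differs from what is checked in.

```python
from __future__ import annotations

import difflib


def generate(spec: dict[str, str]) -> str:
    """Hypothetical generator: render a {field: type} spec as dataclass source."""
    lines = ["from dataclasses import dataclass", "", "", "@dataclass", "class Record:"]
    if not spec:
        lines.append("    pass")
    for name, type_name in sorted(spec.items()):
        lines.append(f"    {name}: {type_name}")
    return "\n".join(lines) + "\n"


def drift(spec: dict[str, str], committed: str) -> list[str]:
    """Diff the committed file against fresh generator output; empty means in sync."""
    regenerated = generate(spec)
    return list(
        difflib.unified_diff(
            committed.splitlines(),
            regenerated.splitlines(),
            fromfile="committed",
            tofile="regenerated",
            lineterm="",
        )
    )
```

A CI job could call `drift` with the real spec and the checked-in file, print the returned lines, and exit non-zero whenever the list is not empty.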
i think this explains a lot of the times me and my interlocutor have been talking past each other: different definitions of "moving faster"
i've been including writing tests, refactoring, fixing build tooling, etc. as part of "moving faster"
but, it makes sense that some people think of "moving faster" exclusively as adding features, lines of code, etc.
i don't think your comment will change how i use LLMs (because i've been doing that). but it will probably change how i talk about it. maybe "editing faster" or "refining faster" is a better way to talk about those types of things
I've found a great deal of success with pushing LLMs into the 'API aware spike partner' and 'Mechanical Refactor Machine' -- especially in highly typed languages. They're also great for writing tests if you have a multilayer process that makes sure those tests have some constraining power (mutation testing seems to get me a good ways there, as well as multiple review passes, as suggested by the OP).
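To make the "constraining power" point concrete, here is a hand-rolled sketch of the mutation idea in Python. Real mutation tools automate this across a whole codebase; the `is_adult` function, the single `>=` to `>` operator, and both test suites are invented for illustration. A suite that still passes on the mutated code is not constraining that behavior.

```python
import ast

SOURCE = """
def is_adult(age):
    return age >= 18
"""


class SwapGtE(ast.NodeTransformer):
    """Mutation operator: turn every `>=` into `>`."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.Gt() if isinstance(op, ast.GtE) else op for op in node.ops]
        return node


def load(source, mutate=False):
    """Compile the source (optionally mutated) and return its is_adult function."""
    tree = ast.parse(source)
    if mutate:
        tree = ast.fix_missing_locations(SwapGtE().visit(tree))
    namespace = {}
    exec(compile(tree, "<mutant>" if mutate else "<original>", "exec"), namespace)
    return namespace["is_adult"]


def weak_suite(f):
    # Never probes the boundary value, so it cannot tell `>=` from `>`.
    return f(30) is True and f(5) is False


def strong_suite(f):
    # Adds the boundary case that distinguishes the mutant from the original.
    return weak_suite(f) and f(18) is True


def mutant_survives(suite):
    """True if the suite passes on the mutant, i.e. the tests missed the change."""
    assert suite(load(SOURCE)), "suite must pass on the original code"
    return suite(load(SOURCE, mutate=True))
```

Here `mutant_survives(weak_suite)` is true and `mutant_survives(strong_suite)` is false, which is the signal you would use to decide whether LLM-written tests need another pass.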
I definitely have had a much more negative opinion about LLMs in the past, irrationally so in retrospect, but that was born mostly out of the firehose of slopware they were producing. That changed once I dug in and started to treat them more like a tool for cardboard-prototyping and as a much faster typist. It is a godsend to be able to say, "For every theorem in this Lean project, identify <this> pattern and replace it with <this> one instead, flag anywhere it doesn't work out of the box and give me a list of stragglers" and have it chunk update 100+ theorems in about the same time it would've taken me to write the first or second iteration of whatever unholy combination of vim, sed, awk and chewing gum I would've put together. Lean is particularly nice here because there isn't much room between 'Compiles' and 'Works' by nature of the language and the stuff I'm doing in it, but I get much the same experience working with Rust and a good test suite + mutation.
I definitely think the long tail of these tools isn't 'push button receive product' -- good engineers should be able to incorporate them to accelerate and focus their energy on the stuff that matters; and delegate a lot of the stuff that used to be scutwork to the machine.
FWIW I also had an extremely negative view of LLMs at first, but they've gotten good enough to where I think they can be more of a help than a hindrance.
Your example is interesting because in the past, when I worked on a JavaScript framework team, I wrote custom codemods for this kind of stuff (upgrades, migrations, etc.). It was painstaking AST-mangling stuff. Nowadays I figure you could just ask an LLM to do it and get 90% of the way there.
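For anyone who hasn't written one, a codemod of this kind can be sketched with Python's standard `ast` module. The thread is about JavaScript, so this is an analogy and not the framework team's tooling; the `fetch` and `fetch_v2` names are made up. It rewrites direct calls to a renamed function and records the lines where the old name is used in some other way, so a human (or an LLM) can deal with those stragglers separately.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rewrite calls to `old` into calls to `new`; flag other uses of `old`."""

    def __init__(self, old, new):
        self.old, self.new = old, new
        self.stragglers = []  # line numbers of non-call uses we leave untouched

    def visit_Call(self, node):
        if isinstance(node.func, ast.Name) and node.func.id == self.old:
            node.func.id = self.new  # rename before visiting children
        self.generic_visit(node)  # also handles nested calls in the arguments
        return node

    def visit_Name(self, node):
        if node.id == self.old:
            # e.g. passed as a callback or reassigned: not safe to rewrite blindly
            self.stragglers.append(node.lineno)
        return node


def codemod(source, old, new):
    """Return (rewritten_source, straggler_line_numbers)."""
    transformer = RenameCall(old, new)
    tree = transformer.visit(ast.parse(source))
    return ast.unparse(tree), transformer.stragglers
```

Note that `ast.unparse` (Python 3.9+) discards comments and original formatting, which is a large part of why real codemods are painstaking to write by hand.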
The delightful realization that I wouldn't necessarily have to slog through the parser extensions I wanted to do in Lean (having done some smaller ones by hand) was pretty much the final nail on "Welp I guess these aren't going anywhere."
The ecstatic realization that about 20 minutes of skill writing would get me something that could manage all my jira tickets so my PM never bothers me again was when I decided I wouldn't hate all of the time I was using them.
The true bliss was when I learned a teammate already wrote the skill.
I'm curious what this skill looks like? How does that relate to tickets?
I don't think I'm allowed to share it (it's a $work product), but at a high level it:
The goal was 'keep our PM happy while not having to interact with Jira directly' and it works pretty alright for that. We are not the deepest users of Jira (it's a straightforward kanban system for a small IT team), but we are very documentation-driven because of our industry (ISO-certed medical-adjacent). It's certainly a bit more verbose than I prefer, and so I usually still have to zhuzh the updates a bit, but it's now 10 minutes in vim instead of 30 in Jira, and no muscle-memory esc presses closing the Jira window and eating my update. Big win.
Meta: The flagging on this post mystifies me. 1 off-topic, 3 spam -- erm, what? The top post on the front page is also about LLM use (and arguably less on-topic because it's about general writing, not focused on coding), and that doesn't seem to have any flags.
I’m pretty sure there are a few people who just flag anything positive about LLM usage as “spam”. The whole tagging/flagging situation around LLMs is a trainwreck.
It's probably getting the spam flags for self-promotion.
This, and the tendency of some folks to use flags as down-vote and the overtly contentious nature of the post's topic. Abuse of functionality. That said: OP comments and stories are largely on self-authored submissions, but I'm certain mods have monitoring for this.
The mods have pointed to the use of spam flagging to flag as LLM-generated (https://lobste.rs/s/9hiu10/bun_s_problem_may_be_developing_open#c_duvfhx), so I guess maybe people have somehow generalized to anything that talks about LLM usage?
I haven't seen any justification for the off-topic flags, though. I wish the rules here were clearer!
The most recent update to the rules on flags says:
Spam which should be used for content that either is designed to promote a commercial service or for content that is created without meaningful human authorship.
It's fuzzy what a "meaningful" amount is, which leaves the flag open to abuse. And in this case, I'd say the flag is being applied incorrectly, because I suspect they used the flag as an "I don't like this" downvote instead of an honest assessment of, "I think this blog post was substantially authored by an LLM." (But maybe whoever flagged it actually thinks that. Who knows.)
Given how contentious the topic is, I think the mods did an alright job. Somebody would have been outraged no matter what they did, but at least now there's a definition!
The “Less than 1% of stories or comments get flagged” should be updated too; surely that can’t be true anymore.
It's refreshing to see this kind of take on lobsters. The blanket anti-AI sentiment is growing tiresome. No one likes slop, we can probably all agree on that. But those who choose to boycott AI completely and take a self-righteous stance against it will have a harder time accepting the future than those who take a more pragmatic stance.
I've been saying it from the start. AI is like the invention of a power tool. You want to change a tyre with a hand wrench that's fine, but when the impact drill was invented you didn't see mechanics boycotting. Agreed, not the best comparison given the context of the article, but still.
I've learnt more using AI than I have reading docs. Because I can't ask questions of the docs when I need further context, explanation or examples. I could just say "build thing, make no mistakes" but I prefer the slow approach because I actually learn things.
I've been saying it from the start. AI is like the invention of a power tool.
To me, an IDE is a power tool, the impact driver to the text editor hand wrench. The LLM? I can't walk into a mechanic's garage and tell the tools to investigate the weird pinging noise I'm hearing and fix the issue, but the LLMs are being sold as if they can (and in some cases, they can). It's a tool unlike anything else that has been invented.
I've learnt more using AI than I have reading docs.
Sigh. I learned programming on my own in the 80s as a kid. None of my friends had a computer. All I had was books and magazines, having to learn to read between the lines, type in the code by hand, debugging issues, with no one to talk through issues. I feel like Roy Batty: "All those moments will be lost in time, like tears in rain."
I... get it.
But also, there's everyone else who didn't do that. People who frequent lobsters already know how to learn (and yet - you're only ever a specialist in so many things and LLMs help me be more "T" shaped). If it raises the lower bar for the general populace, I'm all for it. (Let's ignore the kind of person who would shill on linkedin prior to LLMs and would happily stamp their name on AI slop now - we happily ignored them before).
Most of the analogies around LLMs are flawed - it's too different to things that came before. Intuitively interactive, but completely unaware. Memorised almost everything, but can learn almost nothing. Etc.
I haven't seen blanket anti-AI sentiment here. Could you link to some examples?
What I have seen is sentiment critical of LLM-driven changes to millions of lines of code at once, which are then deployed with no human review (one specific example: the thread about Bun's Zig to Rust port). This article criticizes that, too.
Here's one recent comments section - I'll link to the invocation of Godwin's law: https://lobste.rs/s/szi49u/ai_slop_is_killing_online_communities#c_38op7z
i think focusing on the “Nazi” aspect of that comparison is severely misleading: it is more directly invoking the paradox of tolerance, which would not morally equate LLM boosters to Nazis. rather, it is making an argument that the nature of “LLM boosterism” has a tendency to crowd out and shut down dissenting viewpoints: see the “ride the wave or drown,” “don’t get left behind” commentary.
you could argue whether LLM boosters or detractors have a stronger tendency towards intolerance, but i think dismissing that comment as saying “LLM boosters are Nazis!” via Godwin’s law is missing the point of the comment and the law itself. does that comment trivialize the holocaust in any way? from Godwin himself:
Although deliberately framed as if it were a law of nature or of mathematics, its purpose has always been rhetorical and pedagogical: I wanted folks who glibly compared someone else to Hitler to think a bit harder about the Holocaust.
Godwin's law simply suggests the conversation's direction is tiresome.
It's not helpful or constructive to go out and call LLM boosters or detractors intolerant, especially not in a discussion forum where we are free to, you know, discuss the merits. At some point we have to see LLM generated garbage and, rather than yelling "It's LLM generated" have something actually useful to say on the content.
Or don't engage. That's an option too.
Yes I like this! It does seem obvious to say the tools are flexible, and you don't have to use them to create slop ... But yeah both proponents and refusers tend to ignore that viewpoint
I actually haven't tried using LLMs to review code, but I guess I'll put that on my TODO list ... (so far I use them for idea generation and help with say SQL and VimScript, and then write the code myself)
One danger is that code review is a skill, and you could atrophy those skills by leaning on the models too much. But it's also true that in a commercial environment, even the best code review tends to be "a reasonable amount of time" + "do I trust this person" ... not anything approaching mathematical correctness
That's fair, although I've actually found that this workflow stretches my code review skills, since you have to gauge whether a "bug" is really possible or just theoretical, whether it's worth the effort to fix, whether it should be delegated to a later PR, etc.
For complex bugs I also do like to think through them because 1) hallucinations still slip through, and 2) it's worth thinking about the system end-to-end anyway.
Yes, yes. If you were to produce more code in the same time, you'd have to do so while holding the number of bugs constant, or before long all your time goes into maintenance. The pros aren't worth the cons. So how about we use this stuff to do a better job in the first place?
But, author, you may need to read and reflect on the self-promotion section on the About page. Just about all your submitted posts are your own.
I like the approach defined by the author in this article, but to me it feels in stark contrast with something the same author wrote in February.
So as a senior, you could abstain. But then your junior colleagues will eventually code circles around you, because they’re wearing bazooka-powered jetpacks and you’re still riding around on a fixie bike. Eventually your boss will start asking why you’re getting paid twice your zoomer colleagues’ salary to produce a tenth of the code.
It's possible they were going through some kind of stress-induced episode where they ended up over-using and over-valuing LLM code output.
Working on a large (>1Mloc) Python codebase I've found that the most useful application for me is in code reading and summarization and locating when certain things have happened, not in producing code. I generally prefer to not allow LLMs to produce code at all and to only use them as a semantic search tool.
I wonder what the ratio of input to output tokens is for different tasks, and whether that ratio tells us anything meaningful about how people are using these tools.
If Mythos taught us anything, it’s that LLM agents are really good at finding bugs.
Not really if some of the recent PRs I received are anything to go by. Both were marked CRITICAL. Both attempt to fix the same (non) issue in my code. The first didn't even fix all the problem spots it mentioned (and even there, it didn't "fix" anything as the code wasn't broken to begin with), and the second one effectively did no-ops to the code, and rewrote the code in Python to prove the C code it "corrected" was correct. And it's not me holding an outdated LLM wrong, I did not run the LLM, nor did I explicitly ask for it to be run [1]. Nor can I find out any information about what LLM was doing the "fixing" here. Anyone willing to steelman this thing?
[1] Maybe implicitly due to it being on GitHub, but I didn't directly ask.
Having used LLMs to find serious bugs, I can tell you this is not it. The reporter asked an LLM to find a security vulnerability and it was overly eager (as always) to call a code smell a vulnerability.
If you ask a sufficiently capable LLM to identify security issues and check its work (e.g. with address sanitizer), you get really sophisticated bug reports.
Making the shift to treating the various models/agents/&c as rubber ducks that can talk back has been revelatory for me. Using them to do a bunch of up-front work to plan out a change and then follow up with reviews like this has been amazing. It's still fun to try and one-shot various ideas I have, but honestly I've realized I still really like writing code. I enjoy making things, but writing code is just nice.
My 2c for JS/TS world. Many of my colleagues still don't like to deal with TypeScript because of type wrestling. With LLMs they can now write pure JS and then in two passes refactor it to be TS friendly and then add strong types. I'm guessing the same approach can be applied to testing.