Let’s talk about LLMs

38 points by jparise

nickmonad

Honestly, my first thought was "let's not."

But definitely worth the read.

bitshift

I literally said "Let's not" to myself as soon as I saw the title.

I'm curious to hear what you got out of it, though! Firefox reader mode says:

36 - 46 minutes

From my perspective, everyone's already talking about LLMs (both for and against), so adding another 45 minutes of text to that pile isn't really compelling unless I know what's being said that's new.
- ubernostrum
  tl;dr
  
  Fred Brooks' No Silver Bullet was correct.
  
  No Silver Bullet applies to LLMs the way it applied to other things, and empirical evidence on LLM coding impact sure seems to agree.
  
  You'll get better returns from working on strong software development fundamentals than from forcing all your programmers to use Claude for everything, and that's a repeated message in basically all the major literature.
  
  If LLMs do turn into a revolutionary world-changing silver bullet giving everyone coding superpowers, you'll be able to just adopt them fully when that happens.
  
  The full post is me saying these things much more thoroughly and with citations.
  - akavel
    
    I want to say it's quite a good and tight summary coming from a random... oh... actual author of the article...
  - bitshift
    
    Uh oh, I like Fred Brooks; I might actually have to read this! Thanks for the (literally authoritative) summary!
    
    ubernostrum
    
    If you've read No Silver Bullet, you can probably skip the section where I explain what it says. There are also some quotes from The Mythical Man-Month in support of the idea that coding is only a small part of the software development process, but mostly that section is explaining NSB and pointing out that Brooks was right.
  - FeepingCreature
    
    This is a lot of text- and I sort of bailed out halfway through, admittedly- which manages to do absolutely everything in the analysis of the problem other than installing Claude Code and trying it out.
    
    One would think that would be the first step.
    
    (Okay, just to answer one thing: to say that because coding is say 1/6th of development the maximum speedup is only 1/6th, is like saying that because the launch is only 1/6th of the expense of a satellite, the cost of satellite deployment cannot be reduced more than 1/6th by making the launch cheap. If launches were cheap, you would not use the same workflows!)
    
    ubernostrum
    
    This is a lot of text- and I sort of bailed out halfway through, admittedly- which manages to do absolutely everything in the analysis of the problem other than installing Claude Code and trying it out.
    
    I briefly gave my own anecdote saying that I don't see much of a gain from LLM coding. I then pointed out that I didn't want to rely on anecdotes and dug into the research. Maybe this kind of snarky criticism would have worked better if you hadn't, by your own admission, "bailed out" of actually reading the post.
    
    If launches were cheap, you would not use the same workflows!
    
    There's a section quoted from the DORA report on how increased throughput does not offset the increased instability. And getting rid of the things that actually still take time, so you can just deploy huge amounts of code as fast as your LLM can generate it, is how you wind up in the bad parts of the CircleCI report.
    
    fleebee
    
    I think the author is intentionally straying away from anecdotes in favor of a more objective analysis through citing a number of studies. Furthermore, they bring up the problem of inaccuracy of self-reported accounts on the impact of LLM tools.
    
    hmpc
    
    This (very widespread!) line of argument always strikes me as the rhetorical equivalent of your cokehead friend insisting you try a bump to understand how amazing it is - complete with a referral to their dealer of choice. We can all understand why they do it, and yes we can see how energised they are and how much they're able to get done and how fast and how confident it makes them feel and gosh, the rush, like (checks notes) sipping rocket fuel, yes, that's definitely healthy for you and you will never come crashing down from the high and-record scratch
    
    Oh, the morning after isn't looking so great. Yeah, I'll stick with my regular sleep schedule and eat my veggies, thank you.
  - reidrac
    
    Very good post, although starts a bit heavy, it pays off.
    
    Particularly I liked the "On being left behind" part that articulates some points much better than I could do myself.
    
    I'm tempted to share it with the right people, although I suspect it won't have the desired effect as "people believe"; is not that much about facts anymore.
  - jonathannen
    
    Great read - thoughtful and nuanced.
    
    I think it underestimates where LLMs can create leverage. My read is that it posits fast-following as an efficient strategy. That's reasonable, but adoption isn't binary - it's a spectrum, and where you sit on it matters a lot.
    
    A couple of moments stood out:
    
    although I'm personally skeptical of the "10x programmer" concept, the software industry overall does seem to accept it as true
    
    I don't think it's universally accepted, but the variance between competent and exceptional engineers can absolutely be that large in practice.
    
    Anecdote time: much of what I've done over my career as a professional programmer is building database-backed web applications and services, and I don't see much of a gain from LLMs… But that capability predates LLMs: Rails' scaffolding, for example, could do it twenty years ago... And not just raw code generation, but also the abstractions available to work with, have progressed to the point where I basically never feel like the raw speed of production of code is holding me back... the majority of my time is spent elsewhere: talking to people who want new software.
    
    The argument that "code generation speed isn't the bottleneck" assumes most engineering time is high-context. In practice, a large portion of the day isn't. It's dependency upgrades, debugging edge-case browser issues, glue code, or chasing down weird inconsistencies. Those tasks are low-context, repetitive, and high-friction - and LLMs are unusually good at them. If that's even 30-50% of the work, the impact is meaningful.
    
    Separately, LLMs don't just increase speed - they change how you explore the solution space and cycle time. I'll often generate multiple approaches in parallel and use them to converge on something better. I've also had end-users vibe-code rough frontends of what they want. These are genuinely useful power tools, not just faster scaffolding.
    
    And though I haven't personally read through the recent alleged leak of the Claude Code source...
    
    I think the Claude Code example is a bit of a red herring. Evaluating an early (and already immensely successful) product in a new paradigm by reading its source is a strange benchmark - especially when iteration speed and product direction matter more than polish at that stage.
  - aidancully
    
    I think this makes a number of good points well, but I am not convinced of the overall argument. For one thing, I think this aims too narrowly at LLM coding, and not at the broader picture of LLMs interacting with the whole SDLC. That is, I don't think that coding is the only source of accidental complexity that an LLM can address. For example, LLMs seem very good at prototyping solutions. Wouldn't rapid creation of prototypes improve requirements elicitation from stakeholders? We should be aiming at improving the other 5/6'ths of software effort, too!
    
    Secondly, I feel like the goalposts moved really recently in the context of the article. A couple of years ago, re-writing next.js with AI would not have come nearly as close as it does today. I mean, so they missed some requirements in the unit tests, that's a real problem, and a really bad problem for us in an uncanny valley of AI capability (good enough that the errors aren't obvious, not yet more reliable than skilled engineers, much more code volume produced per unit time). But just a couple of years ago, the tools wouldn't have been able to produce these almost good enough PRs at all. I don't expect progress to stop where we are.
    
    ubernostrum
    
    Wouldn't rapid creation of prototypes improve requirements elicitation from stakeholders?
    
    I've been able to rapidly create prototypes for decades now. As mentioned, Rails had that ability twenty years ago, and though I was using Django back then, as I pointed out in this Mastodon post, it wasn't exactly slow (that post is about going from "what can you do with this" to production--not a mere prototype--in two days).
    
    I don't expect progress to stop where we are.
    
    I still don't expect a silver-bullet revolution. And even if one does happen, I don't expect to be "obliterated" because of spending my time on the surrounding process fundamentals that all the literature says are vital to actually being able to use LLMs effectively.
  - Student
    
    five-sixths (83%) of time on a “software task” would be spent on things other than coding, which puts a pretty low cap on productivity gains from speeding up just the coding
    
    Ok but to what extent does being able to just generate code replace those other activities?
    
    A lot of “what to build and how” happens because it was cheaper than writing code. Probably 50% of that can be dispensed with by generating prototypes or making parts pluggable.
    
    A lot of communication happens because development needs to be parallelized. If one engineer can write more code (measured by functionality than lines of code) then the need for some of that communication goes away. A lot of the other communication is to avoid integration hell. If rewriting code is cheap the level of agreement needed early in development is reduced. The time it would take to ping pong designs should also be reduced.
    
    What doesn’t go away is testing. What also doesn’t go away is rework. Operating and deploying the software won’t go away directly because of LLMs.
    
    My hunch is that in 10 years organizations that fully embrace the process changes will be shipping about twice as fast. My other hunch is that very few organizations will do that because the bottleneck will be getting high quality information to engineers and actually talking to customers.
  - mitsuhiko
    
    Since this leans on the DORA report quite a bit, it's important to point out that the DORA report's survey was collected from June 13 to July 21, 2025. While that technically falls over the release of Claude Code, it did not fully achieve widespread adoption yet. It will be interesting to see what future reports indicate.
    
    ubernostrum
    
    While that technically falls over the release of Claude Code, it did not fully achieve widespread adoption yet. It will be interesting to see what future reports indicate.
    
    There is a paragraph in my post which begins:
    
    The usual response to reports like these is to claim they’re based on people using older LLMs, and the models coming out now are the truly revolutionary ones, which won’t have any of those problems.
    
    I refer you to what follows.
    
    simonw
    
    if the people claiming “this time is the world-changing revolutionary leap, for sure” were wrong all the prior times they said that (as they have to have been, since if any prior time had actually been the revolutionary leap they wouldn’t need to say this time will be), why should anyone believe them this time?
    
    The problem there is that you risk bundling too wide of a group of people together.
    
    In the case or Claude Code and Opus 4.5+ (the November set of models) there are a notable number of voices who did NOT previously say "this time is world-changing" about previous model improvements, who are now saying that these coding agents are genuinely useful when they were skeptical before.
    
    My favorite example is this one by Paul Ford.
    
    jmillikin
    
    These two parts are more deeply connected then they might seem at first:
    
    the sorts of practices recommended for maximizing LLM-related gains in the DORA report, and in many other similar whitepapers and reports and studies, are or ought to be as fundamental to software development as hand-washing is to surgery.
    
    [...]
    
    When expressing skepticism about LLM coding, a common response is that not adopting it, or even just delaying slightly in adopting it, will inevitably result in being “left behind”, or even stronger effects (for example, words like “obliterated” have been used, more than once, by acquaintances of mine who really ought to know better). LLMs are the future, it’s going to happen whether you like it or not, so get with the program before it’s too late!
    
    Much has been written about the observed bimodal distribution in programmer compensation, where high-skill programmers make something like 5-20x more than low-skill programmers. If you look at what kinds of projects those different groups produce, you'll see a consistent difference: skilled programmers factor big projects into smaller independently-verifiable submodules, write tests that provide good coverage of important branches, and use tooling (static types, linters) to automate trivial verification. Low-skill programmers do few or none of these things, so they spend a lot of time writing cross-cutting changes and then struggling to manually verify whether they're correct.
    
    LLMs introduce a new capability: given a defined scope and a clear goal the computer can independently iterate toward a solution. The properties that make a project easy for humans to reason about (small scope, good tests, quick error ascription) also make LLMs work faster. In a well-maintained codebase an LLM can offload a truly surprising amount of tedious work, which makes skilled developers both more productive and happier -- more thinking, less carpal tunnel.
    
    LLMs also have a new price point, which is the problem. If you are a programmer who can write code as well as an unguided LLM then as of about six months ago your labor is worth less than $0.50/hr. There are a lot of professional programmers out there who do not write code as well as Claude, and it is correct to describe their mid-term career prospects as "obliterated". Their only hope of staying employed as a programmer is to learn how to effectively use LLMs, and they need to be doing it today.
    
    This quote from Brooks:
    
    There is no single development, in either technology or management technique, which by itself promises even a single order-of-magnitude improvement within a decade in productivity, in reliability, in simplicity.
    
    has, I think, been proven wrong by history. Compilers and automatic memory management can both be credited with at least an order of magnitude in all three metrics, not necessarily immediately (early compilers were buggy, early GCs were slow) but over time. If you don't believe me then ask a junior engineer to write a web service in Python and then write it again in x86-64 assembly.
    
    It's still very early in the development of LLMs (ChatGPT was released less than four years ago!), but even now LLMs seem to be at least as effective as compilers when it comes to shifting human attention "up the stack" towards high-level design vs low-level implementation:
    
    Gemma 4 on a three-year-old GPU can fill in simple TODO stub functions while I'm at lunch. I only need to decide on the type signatures.
    
    Gemini Flash can fill in an entire file of TODO stub functions of arbitrary size, and fix any linter/compiler errors that arise, within 30 seconds. I only need to decide on the type signatures of the public API.
    
    Claude can take a brief human-written language spec plus a few examples and turn it into a correct ABNF grammar, then generate test cases guided by that grammar and LCOV.
    
    If this technology doesn't lead to improved productivity in an organization then they're either doing something extremely specialized or their development process is a mess.
    
    ubernostrum
    
    has, I think, been proven wrong by history. Compilers and automatic memory management can both be credited with at least an order of magnitude in all three metrics, not necessarily immediately (early compilers were buggy, early GCs were slow) but over time.
    
    No Silver Bullet was published in 1986. That's six years after the debut of Ada, five years after Smalltalk-80, three years after Turbo Pascal, three years after Objective-C, two years after Standard ML, a year after C++, the same year as Eiffel... I think Brooks could comfortably claim from his vantage point that year that advances in programming languages had knocked out enough low-hanging accidental difficulty to justify the "No Silver Bullet" prediction.
    
    If this technology doesn't lead to improved productivity in an organization then they're either doing something extremely specialized or their development process is a mess.
    
    The literature suggests the second option is probably true more than you suspect. But at the same time, your anecdotes about LLM coding all still focus on speed of code generation from a sufficiently well-written and well-designed specification, which once again was not the bottleneck prior to the advent of LLMs.
    
    jmillikin
    
    I think Brooks could comfortably claim from his vantage point that year that advances in programming languages had knocked out enough low-hanging accidental difficulty to justify the "No Silver Bullet" prediction.
    
    That sounds a bit like "the entire industry is currently in the process of adopting two silver bullets, but there won't be a third". Which is a risky prediction to make when the timeframe is unbounded!
    
    It's also worth noting that C wasn't standardized until 1989 -- in 1986 it was still uncommon for a single codebase to compile and run on multiple platforms, and vendor-specific language dialects were everywhere. Garbage collectors existed since the 1960s (Lisp), but they weren't widely adopted in industrial languages until Java was released in the mid-90s.
    
    In 1986 writing a markup renderer secure enough for untrusted input would have required serious engineering, by 1996 a programmer of ordinary skill could write a CGI-based online shop, and by 2006 the low-code frameworks like Rails were letting non-technical people put together complex dynamic web applications in an afternoon. Clearly at least one silver bullet has been developed at some point in those two decades!
    
    The literature suggests the second option is probably true more than you suspect.
    
    Oh no, I suspect the second category is quite large, but I also suspect it doesn't matter because the nature of competition implies LLMs will cause it to become smaller -- either by improving processes to better leverage LLMs, or through attrition.
    
    But at the same time, your anecdotes about LLM coding all still focus on speed of code generation from a sufficiently well-written and well-designed specification, which once again was not the bottleneck prior to the advent of LLMs.
    
    My experience is that the physical process of writing code (hands on keyboard) is still a significant factor in development speed at most levels of the stack, and that a lot of the non-code process exists as a workaround for the perceived expense of writing code. The only teams for which this doesn't apply are already working at the very highest levels of the abstraction stack (don't need to care about dev time if your stack is Python + Django + TypeScript).
    
    Let's say a company's engineering org works mostly in Java (backend) and Swift (frontend). A new project is estimated by engineering to take two months to a working prototype. Management will pad that to four months, then there will be three months of meetings to decide whether the project is worth attempting and if so which team's budget can fit 320 SWE-hours for the prototype.
    
    If engineering has a particularly capable person assigned to the work then maybe the prototype is completed in one month, of which two weeks were coding and the rest was mixed research/design/debugging. That puts the final ratio at roughly 1/8 coding time, 1/8 thinking, 3/4 bureaucracy.
    
    Claude might reduce the coding time and simplify the research+design time, whatever, even an incredible 50% reduction in dev time would only be 12.5% savings end-to-end. But more importantly it's a coin-operated automaton that operates without human attention, thus no haggling over SWE-hours. You just throw in a few paragraphs of spec and $100 from the department's shared services budget and now you can walk into the first meeting with a kinda-functional prototype.
    
    Instead of three months of meetings you have two weeks, so now you're looking at something like 70% time reduction to first prototype, plus the most expensive part of the process (the meetings) got reduced by 80%. Big savings for Claude!