claude code is not making your product better
36 points by carlana
36 points by carlana
This is something that also AGI-singularity-fearing people don't seem to understand: Complexity scales exponentially (probably even faster). Eventually even the smartest person/IQ/model/agent hits a steep wall of complexity as idea/system/project/codebase/feature-set grows. That's why reality at large is computationally irreducible.
Every software project goes relatively smoothly early on, until the exponential growth in complexity takes off and dwarfs everything. Good architecture, design, quality just delay the complexity takeoff moment. So if you have competent people who did a good design and took care of quality, you might be able to to hold on to maybe x10 more size/features/performance/wow but even they will eventually reach a wall.
LLM assistance allows producing a lot of features/code at certain (arguably average) quality much faster. Which just means you are going to reach the wall much faster. Which is great e.g. for growth, experiments, and taks that were relatively easy (low-complexity) but time consuming, but it is not what allows you to "build things that were not done before" and/or "large and complex projects". For that you need the the "keep complexity at bay" improvements which LLMs don't really provide right now.
It’s a classic problem. It’s easy to see the immediate benefits of something when there’s a direct cause and effect. It’s much harder to see the negative effects of something when there is time dilation between cause and effects.
http://bastiat.org/en/twisatwins.html
The cost of using code agents is not direct, it’s from the loss of collective understanding for how a system works, among many others
For that you need the the "keep complexity at bay" improvements which LLMs don't really provide right now.
I consider that "keeping complexity at bay" requires decisions, opinions and possibly opposition/confrontation (NB: I'm not a native English speaker and I'm probably not using the perfect words here).
If you ask people to implement something, they may tell you something among "it's too complex", "it's going to blow up in your face in a year", "this is incompatible with what I heard team Y is working on" or even "we don't have enough people to implement it" (which can be real or an indirect way to say "no").
Will an LLM answer that? Very unlikely, especially since they're tuned to be optimists/polites. They won't be tuned differently either because that aspect is required to get higher engagement.
The problem looks similar to LLMs assisting user suicide. LLMs have been doing that quite a lot even though it's pretty obvious. Killing a project due to unmanaged complexity is much less obvious in comparison so I don't have hopes that LLMs will do better soon.
They won't be tuned differently either because that aspect is required to get higher engagement.
At least for local Qwen ones one can system-prompt them into providing pushback (and unlike the hosted ones, one gets to control the version changes, so no «oops your prompt suddenly and silently has no effect anymore»)
The difference with LLMs assisting user to do something is that the user will jailbreak imposed guardrails but might choose to keep the pushback functionality… I have even seen people who became pure managers but still solicited pushback!
But yeah, insistently interrogating people to predict future requirement evolution (also a useful part of keeping complexity at bay!) needs a higher level of adversarial interaction than pushback-encouraging prompts.
That said, hierarchical organization does provide a path towards managing complexity at scale. We can see incredibly complex structures emerge in nature this way. A system can be built out of independent units that compose together to build larger structures.
A robust system needs to be resilient to shocks and able to adapt on its own which means that parts have to be able to fail independently with localized damage. Organizing the system into nested subsystems creates cells that talk to each other to do their job. They don’t need to know the internal processes of other cells, and act as stable subassemblies. Each level can self-organize and maintain resilience within its own domain because it’s not bogged down by what’s happening elsewhere. This is basically a way to form abstractions where you encapsulate incidental complexity behind an interface which encodes the semantics. A good way to look at hierarchies is to treat them as connective tissue between components of large systems.
Erlang OTP is a good example of this approach in software where large systems are built out of isolated processes that pass messages to one another. These processes can die and have errors without bringing the whole system down.
We can see incredibly complex structures emerge in nature this way.
We see the survivors of billions of years of evolution, where success is defined as surviving long enough to spawn.
Sure, but that doesn't change the fact that natural selection favors hierarchical arrangement. The reason these designs survive is because they work better than other approaches.
if using claude code gives you a genuine product velocity advantage, and anthropic had it exclusively for 7 months, the gap between claude code and every competitor should be unbridgeable. codex would be irrelevant. instead, people are still actively debating which one is better.
This does not look like robust thinking to me.
Claude Code is good software, but it's not some kind of weird AGI magic that instantly makes you impossible to compete with.
Perhaps not "unbridgeable", but "large".
Except that's not true. Why? From what I've seen, two reasons. One reason is outlined in a comment below:
agents make mistakes that tend to accumulate multiplicatively.
Agents are still imperfect. Doing things faster doesn't necessarily mean better. And feeding garbage back into AI agents creates exponentially more garbage in the agent.
The second reason is that agents are helpful only if you know the right instructions to use. And you can only know the right instructions if you have the right mental model.
If you're not sure what you want (architecture, design, etc.), your prompts will be "Claude, do stuff". And it will produce reams of reasonable-looking garbage. Which is full of subtle bugs.
If you know exactly what you want, you can give it detailed instructions. And then you also know how to audit the code it products. Because it will still get things wrong.
My experience has been that AI helps juniors (1) write volumes more garbage code, of (2) analyse their code for bugs and produce test cases. But they can't use it to get better, because they don't know what "better" is.
AI helps seniors increase their advantage over juniors.
In addition to that, Codex is vibecoded too, so the comparison is a little silly that way. Having a few month head start doesn't mean a lot when many of the features being added are the result of what people in the wild are doing with the tools. It's not like Claude Code started off with a complete roadmap and the primary barrier was code velocity. The primary barrier is ideas. Everyone is making it up as they go.
Beyond that I can't comment in depth on the post because I mostly skimmed. Lack of sentence case nonwithstanding, it's largely LLM written prose which I don't love reading.
It's not like… the primary barrier was code velocity, The primary barrier is ideas.
I think that is exactly the point of the post.
Is it?
and if that’s true, it implies something specific. engineering productivity is a compounding function. if using claude code gives you even a 1.5x improvement in the rate at which you can improve your product, then the team using it from day one should be racing away from everyone else. the gap should be widening every quarter. it should look like this:
This is AI slop. We shouldn't be engaging with it IMO. But since we are, do I need to point out all the obvious flaws in this reasoning?
In what magic universe does software development follow some sort of perfect progression where a head start means that exponential growth makes you impossible to catch? Who is claiming that?
Unless my math degree has completely worn off, a head start with a higher compounding growth multiple is impossible to catch. (Edit: if the assistance itself stopped improving, you’d just have a linear factor, not a compounding one, but still never catch up.)
The argument is that "higher feature velocity" is being claimed all over the place by AI hypesters, and that is by definition a higher (compounding) growth multiple, which is exactly why it gets hyped. But if that were real, you'd see exponential features shipping. Instead, you see no such thing, indicating coding faster does not, in real life, dramatically increase feature velocity. Which is because (edit: regardless of exponent of coding increase) the barrier is ideas and engineering rigor, not code velocity.
a head start with a higher compounding growth multiple is impossible to catch.
Right that's true by itself, and completely irrelevant to the conversation. It sounds good as a fantasy strawman but no serious person is claiming you can apply it to software development. The post itself did not quote anyone who is claiming that.
The rest of the the post's point, that ideas and architecture have more value now that "code is cheap", has been publicy rehashed 100 times at this point. There's no insight to be found there. Ditto for "LLMs, let loose, write verbose, difficult to maintain code". Of course they do! This isn't news, it was covered countless times before the most recent training data cutoffs. Thus the post.
Again, we're having a conversation about a piece written by an LLM.
People in software development may not be concerned about compound interest applying to software development, but people in finance and mathematics are very much concerned about the effects of compound interest. It's a very big factor in their field, and it would be reasonable to think that it may be a big factor in other fields too.
You may think it's silly, but this is a conversation that people are having, and there are plenty of hucksters trying to convince non-tech CEOs and CFOs that they need to get ahead of the curve or risk getting left behind.
They also tend to over-trust LLM output, so having an LLM-written debunking of LLM marketing might actually be a useful thing to have around…
So far agents make mistakes that tend to accumulate multiplicatively.
When they make an API that technically works, but requires a bit of extra code when using it, it results in more code, which creates more places for duplication, divergence, and bugs, which like a fractal cause more not-so-ideal code to be written.
I've had an agent design a struct with an optional id field (necessary for one of constructors it had written). It then tirelessly wrote all code twice for when the id was available and awkward fallbacks for when it was absent. The fallbacks were obviously complicated and fragile. It infected almost every use of the struct and everything depending on it. The no-id constructor wasn't even used. Half of the codebase could have been easily deleted.
I'm genuinely not sure whether it's fixable with enough of "stop digging when you're doing stupid things" instructions, and we're a couple hacks away from coding being "solved", or whether that's a long-term problem with stochastic parrots that will keep requiring exponentially increasing costs for linear improvements (as long as mistakes are compounding, compounding growth will get you).
I've had an agent design a struct with an optional id field ...
I've seen very similar things happen with human developers. :D . I'd say a mark of best devs is tendency to pause and reflect on a higher&outer level. Sometimes it even includes bussiness process - instead of spending months implementing technical solutions, a dev familiar with the domain can ask "could we just change the process instead of keep digging" and sometimes the answer is "sure, that's easy".
I suspect that LLMs will also get better at this part eventually too, as you can already ask them directly "this doesn't seam like a good approach, can you think of alternatives" and oftentime they can figure it out. It's just it's very much a hit and miss right now, and once they do something dumb like that they are rather unlikely to spot it when working on unrelated stuff. But there's a tradeoff here - people already complain about model "over-thinking" problems burning tokens. Some experience and intuition about how the code solving certain problem "should feel" help devs naturally know when it is time to reflect. Maybe agents will get that intuition with better training, who knows.
At the bottom of the page - "all content here is generated by ai". I'm honestly curious why people on lobste.rs want to interact or comment on slop?
We really, really need to add slop as an article tag.
I believe that's a joke. The writing doesn't seem AI-generated to me.
I'm not so sure it is a joke. The writing does seem AI-written to me, though human directed, outlined, probably edited. I didn't catch it immediately and I think that's because there's a particular style enforced, for example, it's all in lowercase.
Anything in particular? I'm sensitive to AI writing and nothing set off an alarm for me. I find AI writing pretty bland and sterile, and this had personality and insight.
I gave it a second read more closely and the only thing vaguely suspicious is that there's a lot of, "it's not x, it's Y" structure.
A few indicators, all of which may be unreliable:
Yeah, fair enough. To me it seems more likely than not human written, but I think you made a strong case for it being at least partially AI-written.