Your code is worthless
53 points by dhruvp
The AI tools create code based on statistical analysis of existing code. Which is mostly crap.
So the AI tools create mostly crap. AIs are good at producing large amounts of code, but they are terrible at design. Because they've learned from people who are terrible at design.
My experience has been that juniors are either awed or confused by the AI tools. They either believe the output without thinking, or they have no idea how to understand the output.
Senior people look at the output and go "yeah, that's OK for a shitty one-off script". But for more complex changes, AI can only be used to point you in the right direction.
My experience has been that good design allows you to have a tiny amount of code with a large amount of functionality. That requires human understanding (so far).
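The "small code, large functionality" point can be illustrated with a toy sketch. This is a hypothetical word-count example of our own, not from the comment: the same behavior implemented twice, once with hand-rolled bookkeeping and once leaning on a standard-library abstraction.

```python
from collections import Counter

# Verbose version: manual dictionary bookkeeping, more lines,
# more state for a reader (or maintainer) to track.
def word_counts_verbose(text):
    counts = {}
    for word in text.split():
        word = word.lower()
        if word in counts:
            counts[word] = counts[word] + 1
        else:
            counts[word] = 1
    return counts

# Compact version: identical functionality in two lines,
# delegating the bookkeeping to collections.Counter.
def word_counts(text):
    return Counter(w.lower() for w in text.split())
```

Both functions return the same mapping for any input; the second is smaller not because it does less, but because the design pushes complexity into a well-understood abstraction.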
When the AI tools produce volumes of slop, it doesn't matter how much the slop does. No one understands it. No one will maintain it. Only AI can fix the bugs in it. So my prediction is that we will soon be overwhelmed with "cool software" that's just AI slop.
If every enterprise can ask AI "Please create me a full-featured word processor", and it does, then what happens after that? Sure, you're "productive" in that you've created all kinds of cool company-specific software. But then what? Who maintains it? Who supports it?
I'm not saying that I know where this is going. But if someone like Garry Tan has been infected by the cult of AI slop, I think we're going somewhere bad.
> Sure, you're "productive" in that you've created all kinds of cool company-specific software.
I've been thinking about why people are adopting vibecoding, and my theory so far is that it's productivity fetishism. It makes sense: the whole industry was already obsessed with productivity and metrics. The developers largely aren't producing value for the users, but padding metrics for the management. Management in turn also ignores value for the users, and only looks at metrics. Surely, an average user anxiously checking our app 42 times a day means we're going in the right direction, right? Best set a task to add more notification triggers for the next sprint! To the moon!
> The AI tools create code based on statistical analysis of existing code. Which is mostly crap.
This assumes that you cannot distinguish good/better code, which I think is debatable. Plus the training sets and the feedback loops in play now are more sophisticated than I think you are crediting.
To push on that further -- for many of the pro-AI-coding people I interact with, this is one of the key pieces of evidence: over the last 12-18 months, the improvement and the rate of improvement have been a key driver.
I also find your argument a bit of a non sequitur: You argue that a tiny amount of code produces a lot of functionality. That's always been true, and yet most existing code is still crap?
> You argue that a tiny amount of code produces a lot of functionality.
No, I'm arguing that for a given set of functionality, well-designed code is smaller than poorly designed code.
> That's always been true, and yet most existing code is still crap?
There is no contradiction here.
If I want to argue that most existing code is pretty bad, all I really have to do is point vaguely outside.
This is a great writeup. I've felt all these feelings. I just didn't have the words, because this has worn me down so much.
I especially appreciated the tiny grammatical mistakes, because it showed it wasn't written by AI, haha. Normally those drive me up a wall, but there was something poetic about it there.
Lines of code are a measure... but a negative one. They represent the cost of future maintenance and the risk of bugs.
As I wrote in one of my previous articles on software complexity:
These days, it probably would not occur to anyone to build a house with walls two meters thick. Similarly, no one would join two pieces of iron with twenty screws when four would suffice. Nor would anyone add gold to paint for no reason or apply twenty different coats of paint when it does not have a significant positive impact on the final properties of the product. In material-based fields, such as construction or mechanical engineering, there is a natural pressure to simplify products and processes, because every brick or screw costs money, just as every hole drilled or coat of paint applied represents an immediate cost.
In contrast, in non-material fields, such as software engineering, this natural regulation is absent. Code can be copied for free, complexity can be increased and dependencies or links added without incurring immediate costs that anyone would notice and stop. These higher costs only become apparent during subsequent operation and maintenance, by which time it is too late to simplify the system, and doing so retroactively is not easy.
The law works in a similar way – legal codes, tax regulations, or contract terms can easily be expanded and made more complex – printed paper costs almost nothing, and files on a disk are even cheaper. Lawyers thus flood our environment with more and more texts and increase the complexity of the system, which imposes costs on society as a whole (and provides lawyers with work). And just as there can be (unintentional or intentional) security flaws in software, there are often loopholes in laws and contracts that can be exploited. The more complex the system is, the smaller the chance of detecting such flaws.
I wonder why they didn't link to the original LOC story: https://www.folklore.org/Negative_2000_Lines_Of_Code.html
Interesting that the AI cult is fetishizing code bulk. I think one strong sign of an experienced engineer is that they know code bulk is a burden, not a benefit.
The most obvious way to describe valuable software is the value a user receives from using it. This sounds like a platitude until you realize that all software without users is, by definition, worthless. The source code itself holds zero intrinsic value.
If you define it like that, yes. But there are those of us who derive joy from writing code and see beauty and elegance in its structure. I have written code that has never seen any production use and it was a tremendous joy. I get bored by 'means to an end' coding, even though (or, perhaps, precisely because) it is the most economically efficient in the short term.
The fact that virtually every benchmark for every model gets "evaluated" on code generation doesn't help. I wonder how different things would be if the industry had focused on delivering better code instead of more code. Every AI company focuses on benchmarking and showing off code generation because those benchmarks look better and the "one-shot" solutions make for good publicity (from users too).
I don't see many engineers seriously anchoring on LOC anymore. It's not useless, it just requires a lot of nuance and interpretation -- which is true even in the conclusion that argues 10 LOC is better than 37k. Same argument, different direction.
The alternative metrics in the post are real, but they're lagging and hard to attribute. In a fast-moving business they're impossible to disentangle. Is user growth from the software, the sales team, or a viral post? Will that be true next time, maybe not? That’s art, not science.
The path from the production of code to business outcome runs through too many layers. A critical security patch might barely register in any of these metrics, and yet it's obviously high value.
That's why teams lean on noisier but faster signals like PRs or tickets. They're leading indicators. If my PR rate doubles, there's signal there and probably an important one. Not everything, but something. You still need judgment.
I’m not here to defend 37k LOC, but I don't think it's a good anchor or jumping off point for this debate. And social media discourse isn't a good proxy for how teams actually operate.
AI shifts the operating model more than the output. Yes, you can generate junk. You can also discard and refactor far more aggressively, and explore the solution space much faster.
We’ll get some mess out of this for sure. That doesn't make the shift trivial.
Most code - guess what - doesn't have “users”. It therefore fails as part of a general measure of effectiveness.
Fitness for purpose remains (as for everything else) the ideal measure of good software. The question is how to measure that, and more importantly, a programmer’s contribution to it. If one can’t link requirements to resulting code, as is the case for both modern hand programmers and AI programmers, what hope is there beyond LOC and tokens burned?
I understand time to value to be the time it takes to get new functionality delivered to a customer, starting with their request for it to exist. Dev cycle time is the term I'd use for what the article portrays.