Your LLM Doesn't Write Correct Code. It Writes Plausible Code
49 points by kitschysynq
I think the problem with code, any code, no matter whether a human or a program writes it, is that the solution space is very large, and there are exponentially more bad, convoluted solutions than good, simple ones.
"If I had more time, I'd write a shorter letter" and all that.
AI models, at least right now and I think for the foreseeable future, don't really have the ability to grasp the bigger picture of anything they're working on, unless it's some trivial greenfield widget. This makes them extremely prone to append-only coding.
You can coax them into producing good code by pointing out all the mistakes they are making and all the inefficiencies they are introducing, but this is a fairly tedious process that often takes orders of magnitude longer than the actual implementation.
Whenever we've improved programmer tooling, with higher level languages or IDEs or whatnot, we've only seen projects get larger and more complex. This trend seems to continue unabated into the Claude age.
My rule of thumb is that hammering code into shape takes around 5x as long as the initial implementation with Claude Code. Which is closer to an order of magnitude than not, but also not prohibitive, and it still produces notably better results than doing it all by hand.
The other advantage of an LLM is I can make sweeping changes more easily. Previously, if I got something wrong in the domain model (things like a field being in the wrong spot in a large graph of product and sum types), I would either have to spend a tedious couple of hours fixing up all the code sites, or accept the tech debt and move on. Now I can ask Opus to update the types, then follow all the compiler errors until completion. It's nice being able to resolve long-standing tech debt by uttering a few words.
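To make the "follow the compiler errors" workflow concrete, here's a minimal sketch in Rust with entirely hypothetical names (`Order`, `OrderLine`, `discount_pct` are invented for illustration, not from the parent comment): a field that was misplaced in the domain model gets moved, and every stale use site becomes a compile error the model (or you) can chase to completion.

```rust
// Hypothetical domain model where a field sat in the wrong spot:
// discount_pct used to live on OrderLine, duplicated per line item,
// but it belongs on the Order itself.

#[derive(Debug)]
struct OrderLine {
    sku: String,
    qty: u32,
    unit_price_cents: u32,
    // discount_pct used to live here
}

#[derive(Debug)]
struct Order {
    lines: Vec<OrderLine>,
    discount_pct: u32, // moved here: one discount per order
}

impl Order {
    // After the move, every caller that read line.discount_pct fails
    // to compile; fixing those errors one by one completes the refactor.
    fn total_cents(&self) -> u32 {
        let subtotal: u32 = self
            .lines
            .iter()
            .map(|l| l.qty * l.unit_price_cents)
            .sum();
        subtotal * (100 - self.discount_pct) / 100
    }
}

fn main() {
    let order = Order {
        lines: vec![OrderLine { sku: "A-1".into(), qty: 2, unit_price_cents: 500 }],
        discount_pct: 10,
    };
    // 2 * 500 = 1000 cents subtotal, minus 10% = 900
    println!("{}", order.total_cents());
}
```

The point isn't the example's domain; it's that in a language with exhaustive type checking, a structural change like this turns "find all the call sites" into a mechanical error-chasing loop, which is exactly the kind of task an LLM can grind through.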
I have fond memories of the engineering manager meetings we'd have at previous companies, where at minimum twice a year we'd attempt to rationalize with execs the need for a "code freeze" so we could spend an entire quarter cleaning up technical debt.
I am of the growing belief that although LLMs can easily create much technical debt, they can also wipe so much of it out in an instant. Much of our technical debt in these prior scenarios was things like "Upgrade to React vx.xx", "Migrate off legacy API", etc. Tasks which were indeed tedious but also well-documented pathways (CHANGELOGs etc). We sometimes solved these by writing code generators to get through the process, although looking back, writing the code migration tool was a fun exercise but likely not any faster than grinding through the changes.
LLMs solve this slice of the technical debt pie, which is nice. But my worry is that the debt will accrue at a more significant rate, at least for some companies.
Absolutely, yeah. It is too easy to use LLMs to produce terrible results.
To draw on an analogy, and acknowledging that analogies are never perfect, we're in the C era of LLM use today -- great power but also many sharp edges. There will probably be the Rust of LLM tooling at some point.
We're in the C era of LLMs in terms of dealing with unending minefields of unforced errors.
We're in the Rust era of LLMs in terms of delusional, religious belief in them as a solution to all problems and the imperative to proselytize.
We're in the Smalltalk era of LLMs in terms of making a ton of money for manifesto-wielding consultants who never seem to ship much of anything but are enthralled at the new tools they're using.
We're in the INTERCAL era of LLMs in terms of bargaining, threatening, and pleading with tools to work correctly.
We're in the Malbolge era of LLMs in terms of my sincere desire to never use them for any reason if I can possibly avoid doing so.
The original project makes no claims about performance that I can see, only about scaling and correctness. None of the frankensqlite things in the post are what I'd classify as bugs in that context. Absolute performance doesn't even seem to be an explicit goal at this point.
There may be lots of things to criticise there, but "this in-progress reimplementation aiming for extended scope doesn't match the performance of the original" doesn't really tell us anything interesting. If some time is spent on performance optimisations and it's still thousands of times slower for trivial things, then that's different.