The Eternal Sloptember
64 points by rcalixte
These parts match my experience:
The agent frontloads all the progress, then gives you a slot machine lever to pull to hope it gets the polish done. It never quite gets there.
[...]
I’m not saying that AI isn’t useful, it clearly is. It’s definitely a better Google for most searches. And whenever you need a quick prototype and don’t care about polish, it is absurdly fast.
The current state-of-the-art LLMs can very quickly produce a low-quality prototype, but using them to produce production-quality code is somewhere between uneconomical and futile. Optimization in particular is something I'm trying and failing to get Claude to do right now[0]. Human-written code is developed in blocks of well-structured logic, whereas LLM code is a huge blob that sort of ... congeals ... in the general direction of quality as it gets poked at.
But I disagree with this conclusion that this property makes LLMs not viable for programming. A lot of code doesn't necessarily need to meet the quality bar that humans hold themselves to. This is most visible in corporate environments, where the engineers want to write good code and management just wants something that meets spec. And anyone who's contributed to major open-source projects can think of places where the code is clearly bad but not bad enough to rewrite -- which definitionally makes it "good enough".
From now on we're going to see a lot more "good enough" LLM output filling in the gaps where humans haven't cared enough to pay attention yet.
[0] If I point it at some slow code and prompt it to make performance improvements then it just sort of churns in an endless loop of profiling -> misinterpret profiler output -> change something irrelevant -> declare improvement/regression arbitrarily based on benchmark noise.
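To make the "benchmark noise" failure mode concrete, here is a minimal sketch of the kind of check that loop skips: only call a change an improvement or regression when the difference between the mean timings is clearly larger than the run-to-run spread. The function name `compare_runs`, the 3x threshold, and the sample timings are all hypothetical, and this is a crude rule of thumb rather than a proper statistical test.

```python
from statistics import mean, stdev


def compare_runs(before, after, threshold=3.0):
    """Classify a change as 'faster', 'slower', or 'noise'.

    `before` and `after` are lists of wall-clock timings (same unit, at
    least two samples each). The change only counts when the difference of
    the means exceeds `threshold` times its standard error.
    """
    # Standard error of the difference between the two sample means.
    se = (stdev(before) ** 2 / len(before) + stdev(after) ** 2 / len(after)) ** 0.5
    delta = mean(after) - mean(before)
    if abs(delta) <= threshold * se:
        return "noise"
    return "faster" if delta < 0 else "slower"


# Hypothetical timings in milliseconds.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8]
tweaked = [10.0, 10.2, 9.9, 10.1, 9.9]    # differs only by jitter
optimized = [8.0, 8.1, 7.9, 8.0, 8.0]     # a real speedup

print(compare_runs(baseline, tweaked))
print(compare_runs(baseline, optimized))
```

With so few samples the threshold is a judgment call; the point is only that a verdict should be tied to the measured spread and not to a single pair of numbers.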
There's a common pattern in a lot of the projects I work on where I have an API that I want to exist, and even a bad implementation is good enough -- for example if I want to verify the API exposes all the error handling paths, or that the convenience wrappers can be built on top of the public API. LLMs work well there.
Or maybe the part of the project I care about is blocked by some missing functionality five layers down the stack[1] and I really don't care whether some particular sub-sub-sub-module is elegant pure functions or a spaghetti of state mutations.
Basically any time I can point an LLM at a file full of // TODO and let it hallucinate until the tests pass I've appreciated having that capability. If that file turns out to be important I can always go back and replace it with hand-crafted code. As a side-benefit the LLM can produce tests that only touch the public API, which are then portable to the human-written implementation.
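A rough illustration of the "tests only touch the public API" idea, sketched in Python for brevity rather than Rust. Everything here is a stand-in: `InflateError`, `inflate_oneshot`, `inflate_streaming`, and `check_public_api` are hypothetical names, and both implementations just wrap the standard library's `zlib` module so the example runs anywhere. The point is that the checks take the implementation as a parameter, so they carry over unchanged when the file behind the API gets rewritten.

```python
import zlib
from typing import Callable


class InflateError(Exception):
    """Hypothetical error type exposed by the public API."""


def inflate_oneshot(data: bytes) -> bytes:
    # Stand-in for the quick "good enough" implementation.
    try:
        return zlib.decompress(data, wbits=-15)  # wbits=-15 selects raw DEFLATE
    except zlib.error as exc:
        raise InflateError(str(exc)) from exc


def inflate_streaming(data: bytes) -> bytes:
    # Stand-in for a later replacement living behind the same API.
    d = zlib.decompressobj(wbits=-15)
    try:
        out = d.decompress(data) + d.flush()
    except zlib.error as exc:
        raise InflateError(str(exc)) from exc
    if not d.eof:
        # The streaming object does not raise on truncation, so check here.
        raise InflateError("truncated stream")
    return out


def raw_deflate(data: bytes) -> bytes:
    # Test helper: produce a raw DEFLATE stream to feed the decoders.
    c = zlib.compressobj(6, zlib.DEFLATED, -15)
    return c.compress(data) + c.flush()


def check_public_api(inflate: Callable[[bytes], bytes]) -> None:
    """Checks that rely only on the public signature and error type."""
    payload = b"hello, deflate " * 100
    assert inflate(raw_deflate(payload)) == payload
    assert inflate(raw_deflate(b"")) == b""
    try:
        inflate(raw_deflate(payload)[:-5])
    except InflateError:
        pass
    else:
        raise AssertionError("truncated input must be rejected")


# The same checks run against either implementation.
for impl in (inflate_oneshot, inflate_streaming):
    check_public_api(impl)
```

Note that the truncation check caught a real behavioral difference between the two stand-ins, which is the kind of thing a public-API test suite is for.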
[1] Rust, for some goddamn reason, has no production-ready DEFLATE implementation. There is deflate (deprecated, requires std), flate2 (just a wrapper, requires std), miniz_oxide (pre-v1.0), zlib-rs (pre-v1.0). So I put together a stub deflate/ module and prompted Claude to fill in the details. The code isn't great but it's good enough to avoid depending on pre-v1.0 dependencies, or an FFI dependency on zlib.
Preferring LLM output over a 0.x Rust crate is taking 1.0 as a magic number to a new extreme. zlib-rs is production-ready enough to use in Firefox if it wasn’t for broken Raptor Lake CPUs.
I am 100% sure the code in zlib-rs is better than LLM output, and 99% sure it's better than what I could hand-write if you locked me in a room with a copy of the RFC and a library full of compression theory textbooks for a year.
The reason I don't want to depend on it is the dependency upgrade treadmill. If I depend on a v0.x library then in a month it might be v0.(x+1) and need code changes to update, another few months it'll be v0.(x+2) and need more code changes. That's a quick path to burnout.
If either zlib-rs or miniz_oxide released v1.0 (thereby promising a stable API) then I would be thrilled to rm -rf the LLM stuff.
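For anyone unfamiliar with why 0.x versus 1.0 matters mechanically: Cargo's default (caret) version requirement treats the left-most non-zero component as the breaking one, so a `0.3` dependency will not pick up `0.4` automatically, while a `1.2` dependency accepts anything below `2.0.0`. Below is a simplified sketch of that rule in Python. The function name `caret_compatible` is made up, and it assumes plain three-part `major.minor.patch` versions with no pre-release or build tags.

```python
def caret_compatible(required: str, candidate: str) -> bool:
    """Return True if `candidate` satisfies a caret requirement on `required`.

    Simplified: both arguments must be full "major.minor.patch" strings.
    """
    req = tuple(int(part) for part in required.split("."))
    cand = tuple(int(part) for part in candidate.split("."))
    if cand < req:
        return False
    major, minor, patch = req
    # The upper bound bumps the left-most non-zero component.
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    return cand < upper


print(caret_compatible("1.2.3", "1.9.0"))  # minor bump after 1.0: accepted
print(caret_compatible("0.3.1", "0.3.9"))  # patch bump on 0.x: accepted
print(caret_compatible("0.3.1", "0.4.0"))  # minor bump on 0.x: treated as breaking
```

So each 0.x to 0.(x+1) step is an explicit opt-in upgrade, which is the treadmill being described; after 1.0 the same opt-in only happens at a major bump.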
Why not just vendor a specific version of zlib-rs or stick to an old version? That sounds more sustainable than LLMing a replacement.
That was the backup plan in case the LLM produced total garbage, but the LLM version is shorter (~1500 lines total for the decoder), has a lot less unsafe, and is faster (a representative criterion run reports zlib 1.4075 GiB/s, zlib-rs 1012.4 MiB/s, llm slop 1.2562 GiB/s).
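Since those figures mix GiB/s and MiB/s, a quick normalization makes them easier to compare. This is only arithmetic on the three numbers quoted above (1 GiB = 1024 MiB); the dictionary keys and the `ratio` helper are just labels for this sketch.

```python
MIB_PER_GIB = 1024

# Throughput figures as quoted above, normalized to GiB/s.
throughput_gib_s = {
    "zlib": 1.4075,
    "zlib-rs": 1012.4 / MIB_PER_GIB,
    "llm": 1.2562,
}


def ratio(a: str, b: str) -> float:
    """Throughput of `a` relative to `b`."""
    return throughput_gib_s[a] / throughput_gib_s[b]


for name, value in throughput_gib_s.items():
    print(f"{name:8s} {value:.3f} GiB/s")
print(f"llm vs zlib-rs: {ratio('llm', 'zlib-rs'):.2f}x")
print(f"llm vs zlib:    {ratio('llm', 'zlib'):.2f}x")
```

On these numbers zlib-rs comes out just under 1 GiB/s, and the LLM decoder sits at roughly 1.27x zlib-rs and about 0.89x zlib. This is a single representative run, so the ratios are indicative at best.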
Also, I just ... don't care. DEFLATE is >30 years old, it's been replaced by LZMA and Zstd everywhere. I don't want to spend time thinking about a format from the DOS era.
Someone will eventually publish a Rust implementation of DEFLATE with a stable API that I can swap in. Both Google and Cloudflare are big pro-Rust shops and have their own zlib forks, maybe they'll do it. I'll check on that in a year or two.
I'm not so sure about "promising a stable API" -- the promise is only to not change the API without moving to 2.0. There is no promise that 2.0 isn't coming tomorrow.
And in before, “you are using it wrong.” I have tried all the different models, different harnesses, different prompts. It’s not this. The people who say this would probably say the same thing about slot machines: you see, you have to bet 5 lines after you get a cherry, no wonder you aren’t winning!
People do say this about slot machines. I don't gamble and won't spend time gambling in a casino, and partly that's because I know I'm susceptible to someone reward-hacking my brain.
But I've had people who are enthusiasts say both:
So yes, the above doesn't mean that all LLM peddlers are slot machine peddlers, but it does mean that even in the most basic stripped-down scenarios of literal slot machines, you'll see people making these kinds of arguments.
That said, I can't speak highly of this blog. I mean, look at what they wrote previously. I guess maybe it's an indicator that some of the very pro-AI folks are becoming more jaded, but I am also suspicious of anyone whose blog is full of references to "plebs" and "high performers" vs "low performers".
Did you read the article you linked? Hardly seems pro-AI.
Ah you're right. Lazy reading on my part. I hit the "pleb" thing and took it sincerely and bounced off immediately. I retract my previous comment.