The Eternal Sloptember

64 points by rcalixte


jmillikin

These parts match my experience:

The agent frontloads all the progress, then gives you a slot machine lever to pull to hope it gets the polish done. It never quite gets there.
[...]
I’m not saying that AI isn’t useful, it clearly is. It’s definitely a better Google for most searches. And whenever you need a quick prototype and don’t care about polish, it is absurdly fast.

The current state-of-the-art LLMs can very quickly produce a low-quality prototype, but using them to produce production-quality code is somewhere between uneconomical and futile. Optimization in particular is something I'm trying and failing to get Claude to do right now[0]. Human-written code is developed in blocks of well-structured logic, whereas LLM code is a huge blob that sort of ... congeals ... in the general direction of quality as it gets poked at.

But I disagree with the conclusion that this property makes LLMs not viable for programming. A lot of code doesn't necessarily need to meet the quality bar that humans hold themselves to. This is most visible in corporate environments, where the engineers want to write good code and management just wants something that meets spec. And anyone who's contributed to major open-source projects can think of places where the code is clearly bad but not bad enough to rewrite -- which definitionally makes it "good enough".

From now on we're going to see a lot more "good enough" LLM output filling in the gaps where humans haven't cared enough to pay attention yet.

[0] If I point it at some slow code and prompt it to make performance improvements then it just sort of churns in an endless loop of profiling -> misinterpret profiler output -> change something irrelevant -> declare improvement/regression arbitrarily based on benchmark noise.
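
For illustration only, here is a rough sketch of the kind of check that keeps benchmark noise from being read as an improvement or regression. Nothing here comes from the thread: the `Verdict` type, the `compare` function, and the threshold `k` are hypothetical names, and the rule (difference of means versus `k` standard errors) is one simple heuristic among many, not a rigorous statistical test.

```rust
/// Outcome of comparing two sets of benchmark timings (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Faster,
    Slower,
    WithinNoise,
}

/// Arithmetic mean of the samples.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Sample variance, using the n - 1 denominator.
fn variance(xs: &[f64]) -> f64 {
    let m = mean(xs);
    xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (xs.len() as f64 - 1.0)
}

/// Compares candidate timings against baseline timings (same unit, lower is
/// better). Reports a change only when the difference of the means exceeds
/// `k` standard errors of that difference; otherwise it is treated as noise.
pub fn compare(baseline: &[f64], candidate: &[f64], k: f64) -> Verdict {
    assert!(
        baseline.len() >= 2 && candidate.len() >= 2,
        "need at least two samples per side"
    );
    let diff = mean(candidate) - mean(baseline);
    let std_err = (variance(baseline) / baseline.len() as f64
        + variance(candidate) / candidate.len() as f64)
        .sqrt();
    if diff.abs() <= k * std_err {
        Verdict::WithinNoise
    } else if diff < 0.0 {
        Verdict::Faster
    } else {
        Verdict::Slower
    }
}
```

With a check like this, two runs whose means differ by less than the run-to-run spread come back as `WithinNoise` instead of being declared a win or a loss.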


There's a common pattern in a lot of the projects I work on where I have an API that I want to exist, and even a bad implementation is good enough -- for example if I want to verify the API exposes all the error handling paths, or that the convenience wrappers can be built on top of the public API. LLMs work well there.

Or maybe the part of the project I care about is blocked by some missing functionality five layers down the stack[1] and I really don't care whether some particular sub-sub-sub-module is elegant pure functions or a spaghetti of state mutations.

Basically any time I can point an LLM at a file full of // TODO and let it hallucinate until the tests pass I've appreciated having that capability. If that file turns out to be important I can always go back and replace it with hand-crafted code. As a side-benefit the LLM can produce tests that only touch the public API, which are then portable to the human-written implementation.
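
A minimal sketch of that workflow, under loudly stated assumptions: the `Deflate` trait, the `DecodeError` enum, the `StoredOnly` placeholder, and `check_public_api` are all hypothetical names invented for this example, not code from any real crate or from the module described in this thread. The placeholder only emits stored (uncompressed) DEFLATE blocks as defined in RFC 1951, which is a "bad but good enough" implementation in the sense above, while the checks go through the trait alone and so could be pointed at a later hand-written implementation.

```rust
/// Hypothetical error type for the public API.
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    Truncated,
    LengthMismatch,
    UnsupportedBlockType,
}

/// Hypothetical public API that any implementation has to satisfy.
pub trait Deflate {
    fn compress(&self, input: &[u8]) -> Vec<u8>;
    fn decompress(&self, data: &[u8]) -> Result<Vec<u8>, DecodeError>;
}

/// Placeholder implementation: stored (BTYPE = 00) blocks only, no compression.
pub struct StoredOnly;

impl Deflate for StoredOnly {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        let mut out = Vec::with_capacity(input.len() + 5);
        // A stored block holds at most 65535 bytes; empty input still needs
        // one (empty) final block.
        let chunks: Vec<&[u8]> = if input.is_empty() {
            vec![input]
        } else {
            input.chunks(0xFFFF).collect()
        };
        for (i, chunk) in chunks.iter().enumerate() {
            let is_final = i + 1 == chunks.len();
            out.push(is_final as u8); // bit 0 = BFINAL, bits 1-2 = BTYPE 00
            let len = chunk.len() as u16;
            out.extend_from_slice(&len.to_le_bytes()); // LEN
            out.extend_from_slice(&(!len).to_le_bytes()); // NLEN = !LEN
            out.extend_from_slice(chunk);
        }
        out
    }

    fn decompress(&self, data: &[u8]) -> Result<Vec<u8>, DecodeError> {
        let mut out = Vec::new();
        let mut pos = 0;
        loop {
            let header = *data.get(pos).ok_or(DecodeError::Truncated)?;
            if (header >> 1) & 0b11 != 0 {
                // This placeholder cannot decode Huffman-coded blocks.
                return Err(DecodeError::UnsupportedBlockType);
            }
            let fields = data.get(pos + 1..pos + 5).ok_or(DecodeError::Truncated)?;
            let len = u16::from_le_bytes([fields[0], fields[1]]);
            let nlen = u16::from_le_bytes([fields[2], fields[3]]);
            if len != !nlen {
                return Err(DecodeError::LengthMismatch);
            }
            let start = pos + 5;
            let end = start + len as usize;
            out.extend_from_slice(data.get(start..end).ok_or(DecodeError::Truncated)?);
            pos = end;
            if header & 1 == 1 {
                return Ok(out);
            }
        }
    }
}

/// Checks that touch only the trait, so they stay usable when the
/// placeholder is swapped for a hand-written implementation.
pub fn check_public_api<D: Deflate>(codec: &D) {
    // Round trips, including empty input and input spanning several blocks.
    for input in [Vec::new(), b"hello, deflate".to_vec(), vec![7u8; 70_000]] {
        let packed = codec.compress(&input);
        assert_eq!(codec.decompress(&packed), Ok(input));
    }
    // Error paths reachable through the public API.
    assert_eq!(codec.decompress(&[]), Err(DecodeError::Truncated));
    let packed = codec.compress(b"abc");
    assert!(codec.decompress(&packed[..packed.len() - 1]).is_err());
}
```

Calling `check_public_api(&StoredOnly)` exercises the placeholder today; a replacement only has to implement the same trait to reuse the checks unchanged.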

[1] Rust, for some goddamn reason, has no production-ready DEFLATE implementation. There is deflate (deprecated, requires std), flate2 (just a wrapper, requires std), miniz_oxide (pre-v1.0), zlib-rs (pre-v1.0). So I put together a stub deflate/ module and prompted Claude to fill in the details. The code isn't great but it's good enough to avoid depending on pre-v1.0 dependencies, or an FFI dependency on zlib.

hsivonen

Rust, for some goddamn reason, has no production-ready DEFLATE implementation. There is deflate (deprecated, requires std), flate2 (just a wrapper, requires std), miniz_oxide (pre-v1.0), zlib-rs (pre-v1.0). So I put together a stub deflate/ module and prompted Claude to fill in the details. The code isn't great but it's good enough to avoid depending on pre-v1.0 dependencies, or an FFI dependency on zlib.

Preferring LLM output over a 0.x Rust crate is taking 1.0 as a magic number to a new extreme. zlib-rs is production-ready enough to use in Firefox if it weren’t for broken Raptor Lake CPUs.