Which Design Doc Did a Human Write?
31 points by mtlynch
To me, "Alternatives Considered" was the obvious giveaway section. The AI can't stay on target.
However, this is an exemplar of what everybody is screaming about with AI. Things are fine ... fine ... fine ... WTF? ... okay ... okay ... WTF?
You have to waste time to figure out that the AI is wasting your time.
I guessed correctly. I just scanned the outline. The AI outlines had what felt like a "sampling of possible sections"; some seemed more thrown in than others. The correct answer had sections that I expected.
That said, if the question had been which, if any, of these was AI, I'm not sure I'd have done as well. Knowing that one had to be human made it a matter of comparison. Detection in isolation is usually the harder problem, though.
The dead giveaway from the first divergent section (User roles) was:
My wife and I will each have Owner permissions. Everyone else we invite will have Subscriber permissions.
An AI could have written that if you had given it the author's persona to write from, but I figured that was fairly unlikely.
The other giveaway was that the AI specs wrote for the reader, with justifications explaining how the choices work. See the Licensing sections in particular for a good example of that.
I guessed the Opus vs 5.4 correctly, being fairly familiar with Codex's style - that could be vibes, or could just be over-indexing on a 50/50 chance.
I also reacted to that, but felt it was sort of unfortunate that’s how the example went.
While that’s a real tell, it doesn’t affect the quality of the design doc, and probably wouldn’t appear in a design doc in a work context.
Still a nice exercise.
I guessed correctly. The shift in the writing from personal descriptions of user needs ("I," "my wife," etc.) to "they" in the "Assumptions" section was one giveaway. Codex produced very stilted text—who among us doesn't refer to our kids as "child subjects" in our family app docs, am I right, fellas? Meanwhile, Claude produced text that was less straightforwardly bullet-point-hell, but stuffed in a bunch of ad-copy buzzwords ("mobile-first responsive design," like we're advertising a React UI framework or something). Turns out the guy behind Refactoring English has a real sense of how to convey a bunch of information with clear intent to a specific audience!
I agree with ~travisgriggs that the framing of "spot the genuine article" made the exercise a lot easier. I think if you had multiple humans of varying competency levels attempting to write the same set of docs, then threw Claude in there, and asked me to identify the LLM, that might have been pretty tricky. I'd be interested in seeing an experiment along those lines.
The giveaway is the Borat bigram, "my wife".
I guessed it, but basically just because I didn't think the LLM was sociopathic enough to use "I" as much as a human.
"I like their simplicity" and "I’ve been using them for the last year, and I like them" was enough for me.
I also got it right; the giveaway for me was the architecture sections (admittedly, I skimmed the docs and focussed only on the architecture sections because I expected the AI versions to be bad at them).
The human version has significantly more useful information than both of the AI ones put together. The parts the Claude version does have are okay (though it still reads as weirdly inhuman), and it's obviously missing a lot; the GPT version is basically just a vague outline.
If this were at work and I got something akin to the Claude version from a junior dev (in some parallel universe where LLMs were never invented) I'd call it a good start and give a bunch of feedback for them to incorporate into a new version. If I got the GPT version I'd reflect that maybe they weren't as ready for this project as I'd thought, and pair with them on writing a new doc.
I'll be the first to say I got this wrong! I was convinced it was A after reviewing the other two - I saw the emojis and other things I wouldn't expect a person to add to the document, whereas A even mentioned a weird particularity about a dependency, which I would expect a person to note. I didn't spend much time viewing them all, maybe a minute total, and more or less took the same steps everyone else did here, with the table of contents being my first cause of suspicion!
I suppose in a real situation you'd read all three thoroughly and probably bias toward the one that best fits your mental models and has the most relevant problem details.