Reliable Signals of Honest Intent
98 points by zanlib
There are far too many lines or paragraphs in here that I enjoyed for me to quote all of them, so I'll have to settle for: hot damn, well said!
Thank you for writing this! I’ve had similar ideas on my mind for a while now and you’ve expressed them much better than I could have. I suspect I will be referring back to it a lot in the future.
One thing that’s concerned me lately is the flood of low quality open source projects with very polished websites and lots of GitHub stars. It’s become a much lower signal of effort than it used to be. I now instantly suspect this “em dash” type of website and look at the commit history and often find it’s all AI.
Great piece. The only thing you have when writing is your own experience and thoughts, in your own voice and words. Putting that through a genericism algorithm to remove any of the edges of it, and you might as well be writing lorem ipsum, because nobody is going to read it.
That's why it's so crazy to me to see people who actually use an LLM for writing; you're literally signaling to everyone that you don't value their time. In order to save a few minutes, you sacrifice any trust you have, or could have built with your audience.
I don't even understand this. How do you instruct an LLM what you want to say? One would expect the input to be more verbose than the output, in order to ensure all your points are made. And if you don't have any points to make, why write anything at all?
Also, a major benefit of writing is not just to convince or inform the reader, but to refine your own understanding of what you're writing about! I don't see how letting an LLM do the job would get you that benefit.
There are many people who do not necessarily see the value in writing as such. Sometimes they might have unfinished notes that they just want to put out, and they see AI as a way to make them more legitimate. I've seen AI used in recruitment feedback or internal competency assessments as a way to "humanise" the bullet points made during the interview, for example. Or the interviewer only wrote down the mistakes, leaving it to the AI to convert them into "actionable insights" and possible ways of improvement. In general I find that kind of usage very objectionable, but it's been justified as giving the developers time to focus on coding tasks rather than spend time on "admin." (The irony of the same engineers also using AI to do the coding tasks is not lost on me.)
What people used to call good writing is unnecessarily verbose and often helps to hide weak points more than elucidate them. Well, that's what LLMs got trained on; if we can make the literal list of points be seen as preferable to all this water-filled prose, good.
Good writing is, by definition, not unnecessarily verbose. It is just verbose enough.
OK, I will say it more explicitly: what people call «good writing» is not good to me. Maybe this low density is necessary to some, and then I have an implicit conflict with them, and LLMs scaring some people away from verbosity towards dense writing would be good for me even if bad for those people.
Really well-written, thanks for sharing. I am struck by how often people fail to understand a basic law of information theory: the output data of a function can never express signal that was not present in the input data.
Putting your own words through an LLM can only ever attenuate the richness and meaning that was present in your prompt, perhaps even diluting it so much that it ceases to be distinguishable among the background noise. Posting AI-generated or AI-assisted text has, at best, the same politeness implications as selling someone watered-down juice: no matter how you paint it, it's doing them a disservice, and it's just downright rude.
the output data of a function can never express signal that was not present in the input data.
The output of an AI is a function of your input and the AI's weights, which themselves are a function of the training set. For example, if I were to prompt "write a blog post about Emmy Noether", the contents of the output would be substantially more informative than the prompt itself.
Information in an information-theory context also isn't isomorphic to information in a colloquial sense; if I tell you that 7d150d2cc773143b613427809b8563db780ac15a45b731168e55bc87b8a51d71 is the sha3 hash of a string less than 64 bytes long, that's as much information as giving you the string because it uniquely identifies it. But it's obviously less human-informative!
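To make that concrete, here's a minimal Python sketch (the input string is a made-up stand-in, not whatever sits behind the digest above): the 32-byte digest pins its input down uniquely for all practical purposes, yet it reads as pure noise to a human.

    import hashlib

    # Illustrative input only; any string under 64 bytes behaves the same way.
    msg = b"a short, perfectly readable sentence"

    # Information-theoretically the digest identifies msg (finding another
    # preimage is computationally infeasible), but it tells a human reader
    # nothing about what msg actually says.
    digest = hashlib.sha3_256(msg).hexdigest()
    print(digest)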
Absolutely correct! Except: there's nothing unique about the model's weights that means they add anything to your writing. If the reader wanted an 'AI attenuated' version of your text, they could get it, simply by asking an LLM to summarise your writing by themselves (and, of course, people do this all the time - so we end up with text that's been 'inflated' by AI posted on the web, only for users to 'deflate' it by passing it through an LLM to generate a summary. What a waste!).
From a data entropy perspective, AI-generated text is not meaningfully different to unzipping a file, posting the contents on the web, and then asking users to re-zip the file on their end to obtain the original. There's no value added that they could not independently reproduce on their end, and claiming that the model weights add some sort of special secret sauce is akin to claiming that gzip's executable adds something meaningful to your zipped file.
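A minimal Python sketch of that last point (the byte string is just a placeholder): the compress/decompress round trip is lossless, so nothing gzip 'adds' shows up as new content on the other side.

    import gzip

    # Placeholder for the author's original, pre-inflation notes.
    original = b"the points the author actually wanted to make"

    # Lossless round trip: the reader reconstructs exactly what went in,
    # so the gzip executable contributed no content of its own.
    shipped = gzip.compress(original)
    recovered = gzip.decompress(shipped)
    assert recovered == original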
Except: there's nothing unique about the model's weights that means they add anything to your writing
Not all uniqueness is good though. I sometimes accidentally produce unique spelling choices. I try to use automated tooling to get rid of them; although I do need to keep those tools away from the places where the rare combination of letters should stay as it is. Fortunately, those are more efficient than LLMs; this is important but not to your argument!
If the reader wanted an 'AI attenuated' version of your text, they could get it, simply by asking an LLM to summarise your writing by themselves
If (which is a non-trivial but also a not-impossible if) the author is right that most readers would benefit from the text being brought closer to the mean of all the writing ever, then, in addition to the loss of control over whether the rewrite botches what the author would prefer unbotched (ever seen machine translation drop a negation? I have), this is going to be a worse efficiency fiasco than replacing one-time generation of a completely static page with client-side JavaScript rendering.
Posting AI-generated or AI-assisted text has, at best, the same politeness implications as selling someone watered-down juice: no matter how you paint it, it's doing them a disservice, and it's just downright rude.
Excellent way of putting it.
Putting your own words through an LLM can only ever attenuate the richness and meaning that was present in your prompt, perhaps even diluting it so much that it ceases to be distinguishable among the background noise.
This is objectively false in my case. I was lazy, wrote a quick sketch of something (no-stakes fun) and sent it to a relevant section of a forum. I got a reply boiling down to «I was not sure what you meant, I asked an LLM to rewrite it; is the result (below) true to your intent?» It was, and a third person then commented that the LLM-rewritten result was easy for them to understand while the original was not.
the output data of a function can never express signal that was not present in the input data.
Don't forget «or in the function itself», though. And sure, like a spellchecker, it can at best bring you halfway closer to the mean along one dimension or another. Which is not much but sometimes an improvement.
It was, and a third person then commented that the LLM-rewritten result was easy for them to understand while the original was not.
This is a perfect use-case for LLMs!
They're brilliant at semantic analysis and summarisation, and can confer a distinct advantage for people that want to use them this way. But if you're performing that action at the point of distribution, you're harming everybody else that does not want to use an LLM to summarise / rephrase your writing for no good reason: you've just turned the existence of LLMs from a net benefit to a net negative.
No.
You need the involvement of the source, because the source can vet the retelling for glaring mismatches with the original intent (and nobody else can, under a strict interpretation of intent). Note that here I was asked if this is what I meant (and confirmed that yes, I have no objections even in the details).
Now, sharing both the pre- and post-rewriting versions can indeed be useful. But then I would also say that for many texts that had professional human editors, the pre-edit version, or maybe even better a side-by-side, would be much more valuable.
loved the bit about unintentional quirks evolving into a style - this is the sort of progression you miss if you have LLMs filter all of your thoughts. i do think there's room in the middle to retain creative control & use LLMs to point out suggestions, versus letting a model rewrite your sentences - relegating LLMs to an "advisory" role, versus a creative one.
honestly, i can understand letting an LLM write your API documentation - rote, boring, machine-text manuals. but for anything remotely creative - like a readme, or a blog post, or... almost any other form of writing - LLMs are anti-creative, they average out creativity into banality. i think this point is often lost.
LLMs are generative, not creative
honestly, i can understand letting an LLM write your API documentation
I don't share your enthusiasm. LLMs are extractive, not creative. They cannot add meaning, only dilute meaning that was already present and hide it among the background noise. I do not want to spend my life gargling on digital noise in search of a signal. I implore you to not do this.
You might argue that the 'value add' of an LLM is to present information in a more accessible format, and to this I would ask that you consider whether you'd be better served by a doc generator (doxygen, rustdoc, whatever).
'Deep Wiki' is my current pet peeve. It's the ultimate in Dunning-Kruger hubris: the output often looks the part, but is remarkably awful. I've taken a look at what it decided to generate for a project that I am an expert on, and it's difficult to express how atrocious, hollow, objectively incorrect, and downright misleading the output is. It is not merely a bad technology, it is a rude one, and we have had new contributors who were actively misled by 'information' it's given them. It has hurt our project.
If users want to run your docs/codebase through an LLM to make it easier for them to parse locally because they've elected to shut down their prefrontal cortex, they will do that. Do not make the rest of us poor souls suffer through it too.
Saved, this is awesome. I've been trying to articulate this before by (poorly) suggesting people use something like https://notbyai.fyi/ (which users here have pointed out several problems with, such as the apparent monetization + carve out for allowing some amount of AI in writing). My point being that in the pre-slop days, my fight-or-flight content alarm was sitting idly for most of what I read --- now it's overactive. Even while reading this piece, I would read a sentence and half-expect to see a gotcha at the end that admits the entire thing was AI written. IMO a "good enough" signal of honest intent would be to just flat out have a little note under the title like "I spent X hours writing this" (if you don't want to bring AI into the conversation) or "Written by hand."
You also articulated well a point I couldn't, which is that there is some futility to using an AI to write on your behalf when you need to write in order to ask it in the first place. (At least until we can hook these things directly up to our brains.) To me, writing is communication! I would imagine an AI is much better as an editor, test reader, or research aide, but these generate more work for you as a writer, even if it is packaged in the "do you want me to do X" format. But this assumes it is used "honestly," and not to "justify my insane belief XYZ using peer-reviewed sources."
pour over every sentence
Did you intend to send this reliability signal? Will LLMs learn to mimic this too?
I didn't know that it was pore, TIL.
Another one I learned recently, just for fun: https://en.wiktionary.org/wiki/champ_at_the_bit#English
Well, I guess my work has to come in gift boxes from now on.
Thank you! :)
I know you joke, but kinda.
A lot of devs struggle to sell themselves. They do the work and assume that's the important part, but then don't spend the time to socialize and market that work (internally or externally).
Small things like keeping a "brag" or "achievements" or "kudos" file to jot down things you're proud of will REALLY help your manager be able to help you. It seems annoying and like it should be their job, but I want them to spend their time advocating for me, not acting as a personal historian. Ideally we both work together so the list is better than either working alone, but the important part is that YOU must work on it, to signal "I'm smart enough, I'm good enough, and dog-gone-it people like me"
As a manager, I can tell you it's hard to remember all of the achievements for all of the people over a typical performance review time period (e.g., 6 months is too long for me). So these documents help a lot
Disclaimer, I don't have a brag document for myself :|
I don't think 'irony' is the right word for it, but it's funny that as I started reading this article I noticed a lot of em-dashes, lists of three, "it's not X, it's Y" constructions, and parallel sentence structure. Things the article itself describes as "AI tells". Some examples:
And the “paranoia” is not paranoia at all, it’s a fundamental survival mechanism that we repurposed for navigating information.
We know that something is wrong, now we have to confirm the suspicion with more reliable evidence. We already laughed—now it’s time to explain the joke.
There are a number of skills that appear to be cases of “irrational knowledge”—such as aircraft identification, bird watching or chicken sexing—that seem to defy conscious justification and introspection...
Is the original piece AI? No clue, I don't know the author well enough to say, and I've seen enough of these "tells" in my own writing as it leaves my fingers that I wouldn't put money on anything either way. But it feels like it undercuts the point that the signals are reliably detectable.
That being said, I agree with the argument that 'writing an entire blog post and not a tweet' has historically been a signal of intent. I just don't agree that detecting AI writing vs. human writing is the sort of thing you can 'just know' in 2026.
I think it's important to remember that the reason LLMs use these things is that they're statistically likely to appear in their training set. And their training set was written by humans. Which means all these things are also things human authors do.
I agree with this article in the main. But the intuitive sense is much more accurate than relying on "tells".
No part of this article was AI-generated. I used Claude Opus to stress-test my original argument outline before writing, but in the end the text turned out structurally much different from the outline.
I just don't agree that detecting AI writing vs. human writing is the sort of thing you can 'just know' in 2026.
That's not exactly my argument. "But once those patterns are identified and pointed out as indicators of low quality, we can acquire novel ones unconsciously, and come to identify AI slop even if it displays none of the standard indicators."
I wanted to pull this thread a bit further, but I couldn't find a good way to express my thoughts in a way that actually fit with the rest of the text, and I also had in mind an analogy to hidden layers in neural networks that I wasn't sure was factually coherent, so I just settled on how it is in the published version. But since this is a comment, and not really subject to the standards I hold my blog articles to, I will attempt to expand on it.
What I mean to say is that just looking at the common indicators will let you flag low-quality content, but when you read, you do not only look at the indicators but there is text in between that gets into your pattern recognition centre. So there are the visible parts - the indicators, the things you find in the bird book - which let you make the first inferences. When you notice an awkward list of three or "and then it hit me," you're likely to categorise the text as AI. But subconsciously you also probably remembered the other distinctive characteristics of that text. So your ability to categorise AI slop now has a novel element that's not present in the bird book. If you iterate on it enough times, you will be able to intuit a lot more than just the telltale cues.
Now, I have no idea if the neural network analogy makes any sense, but even if it doesn't, the diagram of a layered neural network is so distinctive that it might as well help visualise it.
First, your OP post was well written and lucid, thanks for writing it. Intuition is hard to talk about within a rational framework and you managed it beautifully. I particularly enjoyed the way you framed post hoc justification for intuition.
You could think of subconscious pattern matching as a bit like an immune system. After exposure it will iterate in order to prepare for related but different pathogens. So I agree that we can and will be able to get better at recognizing slop, even as models improve.
Up to a point anyway. If it was all down to pre-training I'd say we could detect slop forever, but a lot of the magic happens in RL and other fine tuning these days. I wonder how many of the current generation of tells were actually introduced in fine tuning? Hard to say where AI writing will land.
In any case, I appreciate the optimistic perspective, and the implication that we will adapt. I have no doubt of that part.
I thought it was done deliberately to make a point, or maybe it’s just the author’s style. But either way I would put money on this not being AI generated. I get a distinct feeling that a human put effort into it.
After I made this comment I started writing a separate blog post, and I noticed that I was also being hypersensitive to these "tells" as I was writing them. So that's pushing me towards "probably human-generated", yeah.
Good article. This is why handwritten thank-you notes are important. (Not to mention handwritten notes to your congressperson.)
That said…
What proves to a reader that you deemed this interaction worth his time?
Well, for one thing, not forcing me to squint to read your words in dark mode.
Thanks for the feedback, I'll add it to my todo list.
Thanks! CSS has some features that make this easier.
And I apologize for my tone — I was having kind of a rough day.
I use CSS variables for styling so it was pretty easy to update. It's available in light mode now.
I'm in dark mode (and using the Dark Reader extension on top) and didn't have an issue reading the text.
I meant the site is in dark mode, regardless of my preferences. I find dark mode very hard to read.
I find light mode hard to read, which is why I have the Dark Reader extension. I might recommend finding an extension that allows you to toggle to light mode on sites that don't have one. I can't make any recommendations there, since I don't prefer that.
I opened the link and immediately went back. White text on a black page strains my eyes. I went ahead and created an archive.today link, which is easier on my eyes: https://archive.ph/kKJN7
For the record, I read lobste.rs via RSS, so it shows it to me the way I want.