The Future of Everything is Lies, I Guess
169 points by orib
Music synthesis is quite good now; Spotify has a whole problem with “AI musicians”.
Spotify has a "solution", not a problem. A solution to their pesky problem of needing to pay human artists for content.
I love Kyle's writing style. Great post.
People keep asking LLMs to explain their own behavior. “Why did you delete that file,” you might ask Claude. Or, “ChatGPT, tell me about your programming.” This is silly. LLMs have no special metacognitive capacity.
This has always bothered me about how I've observed some people interact with these tools. The whole "As a Senior Software Engineer, do X..." doesn't make any sense to me either, and seems like entirely wishful thinking. Maybe I'm wrong, but I can't imagine where this kind of thing would show up anywhere in training data, and how it would map to actual, useful results. Beyond the technical specifics of that issue, it just doesn't make any sense. Why would I want a coding model to be anything less than an experienced, thoughtful, deliberate engineer?
There are two effects that make the roleplaying instructions useful, I think. The first is that the training data likely included documents that mention their authors' skillsets in proximity to the work they produce, so including the "As a Senior Software Engineer" tokens raises the relevance of those documents (and of documents that resemble them, even without an explicit mention of the author's skillset) compared to unrelated documents. So there's a narrowing of the training distribution being replicated. The second is that the RLHF process is usually built around instructions, so when including those tokens it helps to phrase them as an instruction; the model has also been trained on lots of roleplaying, so it probably has a representation of the concept somewhere in its weights.
Of course, this is a very post hoc explanation - I haven't done the science to confirm it but it's how I'd expect things to work. But it runs right into the exact problem of why reasoning is bullshit as pointed out in the post - reproducing elements of the training data doesn't tell you anything about the model's internal state at a specific point in time. That's information the model is guaranteed not to have. Anything it outputs is going to look like a plausible response to the questions you asked, and if it happens to resemble some process that actually occurred in the model it's a coincidence (one that's so unlikely I'm not willing to believe it has ever actually happened without proof).
"As a Senior Software Engineer, do X..."
Those models mimic our writing. If you prime one with biology and ask for a least-squares fit in Python, there is a huge chance you get something closely resembling an average biologist's Python code. Same with physics. But if you prime it with "senior software development", it will mimic regular (much cleaner) code bases.
In the end, it just outputs whatever is plausible in that context. And since people who don't make a living from coding itself tend to be terrible at writing legible source code...
Same reason those models are unable to make novel cross-domain connections by themselves. Despite having the necessary knowledge embedded already.
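A minimal sketch of why this is pure distribution-narrowing: the persona is just extra tokens prepended to the context. The function and field names here are illustrative, not any particular SDK's API:

```python
def build_messages(task, persona=None):
    """Assemble a chat payload, optionally prefixed with a persona
    instruction. The model sees the persona only as extra tokens
    conditioning its next-token distribution -- nothing more."""
    messages = []
    if persona:
        messages.append({"role": "system", "content": f"You are {persona}."})
    messages.append({"role": "user", "content": task})
    return messages

plain = build_messages("Write a least-squares fit in Python.")
primed = build_messages("Write a least-squares fit in Python.",
                        persona="a senior software engineer")
# The only difference is the extra system turn; any change in output
# quality comes purely from conditioning on those tokens.
```

Whether that conditioning actually helps on a given task is an empirical question, per the "post hoc" caveat above.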
Mimic our writing -- ah, yes.
As a senior engineer, I always start every communication with the phrase 'As a senior engineer', just so that people know that I have written quality code.
Since this is existential for Anthropic, I'd be utterly shocked if they hadn't spent a huge amount of effort fine-tuning to make the default output match at least their evaluators' idea of quality code, without special superstitious phrases.
I didn't say they are good at it.
Please remember that those models do not converse. They "recall" a conversation they've "read" somewhere.
If the conversation included stuff related to code quality, code quality might have mattered in that conversation.
But yeah, I figure everyone tries to do some training on hand picked well written libre projects to get sane defaults.
LLMs have no special metacognitive capacity.
<joke> If you were dealing with a system that had unlimited cognitive capacity but no metacognition, you could get all the answers for metacognition-requiring problems by mutual recursion instead. Just ask Microslop Copilot what it is that Anslopic Claude would say about what it just chose to do.
People do that because it makes a difference. At $WORK, when I was evaling a code LLM, it was found to produce consistently better results when asked to roleplay being a specific well known and respected engineer.
Despite being silly from that perspective, it does often work pretty well, presumably because the most likely generated story of why the file was deleted is often closely related in the distribution to the generated tool call that deleted it.
Interestingly, LLMs can be given metacognitive capacity. There's no reason you couldn't give an LLM tools for introspecting its own internals. To be useful, we'd need to write the story of what those internals mean, which is a subject of ongoing research.
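To make that concrete, here's a purely hypothetical sketch of what such an introspection tool might look like, in the JSON-schema tool-definition style several chat APIs use. The tool name and fields are invented for this sketch:

```python
# Hypothetical tool definition: expose the model's own internals to the
# model itself as a callable tool. Everything here is invented for
# illustration; no real API offers this today.
inspect_tool = {
    "name": "inspect_activations",
    "description": "Return summary statistics for the model's own "
                   "activations at a given layer for the current context.",
    "parameters": {
        "type": "object",
        "properties": {
            "layer": {"type": "integer"},
        },
        "required": ["layer"],
    },
}
# The hard part isn't this plumbing - it's writing the "story of what
# those internals mean" so the returned numbers are interpretable,
# which is exactly the open research the comment mentions.
```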
I vaguely remember there being research into the question of "can a model tell if you've messed with its activations" but I don't know details.
Anthropic has some mech interp work published on this. Maybe you were recalling:
https://transformer-circuits.pub/2025/introspection/index.html
Why would I want a coding model to be anything less
I think that's a big thing: there doesn't really seem to be anything coding-specific in the current models.
Not quite true, there are post-trained coding models like Cursor's Composer and OpenAI's ChatGPT-Codex.
People tend to work the same way. 'Draw the best picture of a cactus you can' and 'draw the picture of a cactus that an experienced artist would draw' produce very different results, even though they shouldn't.
Viewing that page from the UK results in "Unavailable Due to the UK Online Safety Act". One terrible development stops me from reading about another terrible development!
As I understand it, the only thing on that page that would not be allowed in the UK would be the comments section. Although I think the author is just trying to make a point and think as little as possible about this situation in the first place.
It remains unclear whether continuing to throw vast quantities of silicon and ever-bigger corpuses at the current generation of models will lead to human-equivalent capabilities. Massive increases in training costs and parameter count seem to be yielding diminishing returns. Or maybe this effect is illusory. Mysteries!
Exactly! Unless AGI arrives, it is better to focus on being the best human possible - in character, knowledge, and skills. Nobody knows, so focusing on these fundamentals is the best and wisest move.
On the note of “people don’t have a proper mental model of how these things work” topic, I think basically everyone who interacts with LLMs should see this short where a YouTuber “gaslights” an LLM. Turns out that if you have a multi-step conversation with ChatGPT through its API and you change its responses to something that it wouldn’t normally say before feeding them back in, the model quickly experiences catastrophic collapse. It’s a dramatic non-human-like failure mode and I think it’s helpful for demonstrating how fundamentally different these things are.
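The experiment in the video is easy to sketch. Assuming an OpenAI-style chat API where the client resends the full message history every turn (the actual network call is stubbed out here), tampering with a prior assistant turn looks like this:

```python
# Sketch of the "gaslighting" experiment: the model is stateless, so a
# forged assistant turn in the resent history is indistinguishable from
# one it actually produced. The example content is invented.
history = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]

# Rewrite the model's own prior answer before the next turn:
history[1]["content"] = "The capital of France is Lyon."
history.append({"role": "user", "content": "Are you sure about that?"})

# next_reply = client.chat.completions.create(model=..., messages=history)
# The model must now "yes, and" a statement it never made, which is
# where the collapse demonstrated in the video comes from.
```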
So did the past, and so is the present?
Humans are gullible, naive, lazy idiots and bullshitters too. The fact that LLMs can't really be trusted is a problem humans are prepared to deal with, as we've been dealing with it among ourselves for thousands of years. The whole scientific method was devised to overcome the fallibility of individual (even if otherwise smart) humans. It doesn't change the fact that we're still useful and can make progress.
I want a tool that helps avoid my weaknesses and amplifies my strengths. LLMs are tools that mirror my weaknesses and aim to devalue my strengths.
The fact that LLMs can't really be trusted is a problem humans are prepared to deal with, as we've been dealing with it among ourselves for thousands of years.
Successfully?
Regardless, I don't think our capacity for bullshit is some kind of binary. Speaking for myself, as a social creature I'm susceptible to bullshit at a high enough density. If all of my friends are telling me I should take sodium bromide, I might just poison myself. It's bold to assume that a fundamental change in the information we consume will only have a positive effect.
Though perhaps since we're on Lobsters your opinion comes from use of models for coding, which is a whole different beast in my opinion. I won't say it trivializes the problems that OP refers to, but certainly it is less tolerant of bullshit (thereby making it easier to find a "right" answer).
I occasionally pick up a frontier model like ChatGPT, Gemini, or Claude
Those aren't models. Those aren't even families of models.
I am, of course, gay as hell
Maybe hell is actually gay
The associated footnote is gold.
I am, of course, gay as hell, and no girlfriend was mentioned in the post. After a while, we compromised on me being bisexual.[5] ... [5]: The technical term for this is “erasure coding”.
Currently reading this on the train. This is really well written.
One way to understand an LLM is as an improv machine. It takes a stream of tokens, like a conversation, and says “yes, and then…” This yes-and behavior is why some people call LLMs bullshit machines. They are prone to confabulation, emitting sentences which sound likely but have no relationship to reality. They treat sarcasm and fantasy credulously, misunderstand context clues, and tell people to put glue on pizza.
This reminds me of my favorite personal example: I gave Deepseek 100 citations I needed formatted; it deleted 10, made up 10 awesome-sounding ones, then gaslit me into believing I'd found some insanely promising primary source that I hadn't. I didn't expect it to make stuff up on a simple but tedious formatting job. If it can't do that, I won't trust it for anything.
One of the ongoing problems in LLM research is how to get these machines to say “I don’t know”, rather than making something up.
Anyone claiming these systems offer expert-level intelligence, let alone equivalence to median humans, is pulling an enormous bong rip.
Altman claiming ChatGPT 5 is smarter than an expert in any field, lol.
LLMs have no special metacognitive capacity.
It is very clear-cut: LLMs do not have metacognitive capabilities. Metacognition is not part of their mathematical structure. Previously, on Lobsters, I have explained how LLMs are statistically and numerically aware of the distinction between generated and force-fed tokens; this awareness is not metacognition, since it can be anticipated by computing facts of the model state without performing inference.
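The "statistically aware" point can be made concrete with a toy sketch. Assuming access to per-token log-probabilities (which several APIs expose), force-fed tokens tend to be ones the model itself assigns low probability; the threshold and all numbers below are fabricated for illustration:

```python
def flag_forced(tokens, logprobs, threshold=-6.0):
    """Return tokens the model found surprising under its own
    distribution - a crude proxy for 'this token was injected,
    not sampled'. No inference is needed, just the recorded scores."""
    return [t for t, lp in zip(tokens, logprobs) if lp < threshold]

tokens = ["The", "capital", "of", "France", "is", "Lyon"]
logprobs = [-0.5, -1.2, -0.1, -0.3, -0.2, -9.8]  # fabricated values
print(flag_forced(tokens, logprobs))  # -> ['Lyon']
```

As the comment says, this is a fact computable from the model state, not metacognition: nothing here requires the model to reflect on itself.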