Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models
4 points by yonkeltron
I've ranted about this paper a bit on LinkedIn and Bluesky, but to be a little more polite about it:
The lead author published this between his last year of high school and his first year of college. His website and LinkedIn show he's really into linguistics and entrepreneurship, but there's no indication he knows any computer science. The other author is his father, whom I didn't look into, but who (given several references to the son's personal blog) likely had a small role in this at best.
The core argument of the paper is this:
"the conventional self-attention mechanism processes an input consisting of ๐ tokens, each represented as a ๐-dimensional vector, with a computational time complexity of ๐(๐ยฒ ยท ๐) to produce the resulting sequence. Our intuition in this paper is: if there is an input string that expresses a task with computational complexity is higher than ๐(๐ยฒ. ๐), then an LLM cannot correctly carry out that task."
They provide a "proof" of this later, but the proof makes the same mistake their intuition does: it conflates the cost of an LLM predicting a single next token with the cost of the LLM's total output. It'd be kinda like saying "CPUs do O(1) work per clock cycle, so a CPU can't sort a list." That's true if you're looking at a single operation, but over enough clock cycles (or enough LLM forward passes) you can eventually solve any decidable problem.
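To see why the intuition fails, here's a toy of mine (not from the paper): each call to step() does a fixed, bounded amount of work, like one forward pass, yet looping it to a fixed point sorts the list, a task whose total cost exceeds any single call's budget:

    def step(xs):
        # one bubble-sort pass: bounded work per invocation
        swapped = False
        for i in range(len(xs) - 1):
            if xs[i] > xs[i + 1]:
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
                swapped = True
        return swapped

    def solve(xs):
        while step(xs):   # keep "generating" until nothing changes
            pass
        return xs

    print(solve([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]

This is the whole premise behind chain-of-thought and agentic loops. The paper then compounds the error when it turns to agents verifying other agents: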
These examples and their variants show that attempts by LLM-based agents to verify the correctness of tasks performed by other agents, will in general not work. Suppose A₁ and A₂ are two agents in the agentic AI sense, that is, agents that carry out tasks using an LLM. Let A₁ be tasked with executing a problem P with a computational complexity of O(n³) or higher, where n is included in the input prompt provided to A₁. Let A₂ be tasked with validating, i.e. verifying the correctness, of A₁'s solution for P. Since all of A₂'s operations are limited to O(N² · d) complexity (note once more the difference between N and n), given that the inherent complexity of P, i.e. O(n³) or higher, exceeds A₂'s maximum computational complexity, it follows that, in general, A₂ cannot accurately verify the correctness of A₁'s solution to P. This is because any such verification procedure for P would itself in general require at least O(n³) time complexity in order to function reliably. *
(* Ironically, copying text straight out of the PDF was giving me all sorts of nonsense, so I had to take a screenshot and send it to an LLM.)
Anyway, this is a pretty bog-standard conflation of solving time and verification time. A problem can take exponential time to solve while its solutions take only polynomial time to verify; that's literally our best current understanding of NP-complete problems!
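A quick illustration of that asymmetry with subset-sum (NP-complete; my sketch, not the paper's): brute-force solving is exponential, but checking a proposed certificate is polynomial.

    from collections import Counter
    from itertools import combinations

    def solve(nums, target):
        # O(2^n): try every subset until one sums to target
        for r in range(len(nums) + 1):
            for combo in combinations(nums, r):
                if sum(combo) == target:
                    return list(combo)
        return None

    def verify(cert, nums, target):
        # polynomial: cert must be a sub-multiset of nums and sum to target
        return not (Counter(cert) - Counter(nums)) and sum(cert) == target

    nums, target = [3, 34, 4, 12, 5, 2], 9
    cert = solve(nums, target)           # exponential search -> [4, 5]
    print(verify(cert, nums, target))    # cheap check -> True

A verifier that's cheaper than the solver is the expected case, not a contradiction, so "A₂ is limited to O(N² · d) per pass" tells you nothing about whether A₂ can check A₁'s work.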
tl;dr: the author doesn't understand computer science or how LLMs work.