Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models
4 points by yonkeltron
I've ranted about this paper a bit on LinkedIn and Bluesky, but to be a little more polite about it:
The lead author published this between his last year of high school and his first year of college. His website and LinkedIn show he's really into linguistics and entrepreneurship, but there's no indication he knows any computer science. The other author is his father, whom I didn't look into, but who (given several references to the son's personal blog) likely had a small role in this at best.
The core argument of the paper is this:
"the conventional self-attention mechanism processes an input consisting of ๐ tokens, each represented as a ๐-dimensional vector, with a computational time complexity of ๐(๐ยฒ ยท ๐) to produce the resulting sequence. Our intuition in this paper is: if there is an input string that expresses a task with computational complexity is higher than ๐(๐ยฒ. ๐), then an LLM cannot correctly carry out that task."
They provide a "proof" of this later, but the proof makes the same mistake their intuition does: it conflates the cost of an LLM predicting a single next token with the cost of the LLM's total output. It'd be kinda like saying "CPUs do O(1) work per clock cycle, so a CPU can't sort a list." That's true if you're looking at a single operation, but over enough clock cycles (or enough LLM forward passes) you can eventually solve any decidable problem.
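To see why the intuition fails, here's a toy of mine (not from the paper): each call to step() does a fixed, bounded amount of work, like one forward pass, yet looping it to a fixed point sorts the list, a task whose total cost exceeds any single call's budget:

    def step(xs):
        # one bubble-sort pass: bounded work per invocation
        swapped = False
        for i in range(len(xs) - 1):
            if xs[i] > xs[i + 1]:
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
                swapped = True
        return swapped

    def solve(xs):
        while step(xs):   # keep "generating" until nothing changes
            pass
        return xs

    print(solve([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]

This is the whole premise behind chain-of-thought and agentic loops. The paper then compounds the error when it turns to agents verifying other agents: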
These examples and their variants show that attempts by LLM-based agents to verify the correctness of tasks performed by other agents, will in general not work. Suppose A₁ and A₂ are two agents in the agentic AI sense, that is, agents that carry out tasks using an LLM. Let A₁ be tasked with executing a problem P with a computational complexity of O(n³) or higher, where n is included in the input prompt provided to A₁. Let A₂ be tasked with validating, i.e. verifying the correctness, of A₁'s solution for P. Since all of A₂'s operations are limited to O(N² · d) complexity (note once more the difference between N and n), given that the inherent complexity of P, i.e. O(n³) or higher, exceeds A₂'s maximum computational complexity, it follows that, in general, A₂ cannot accurately verify the correctness of A₁'s solution to P. This is because any such verification procedure for P would itself in general require at least O(n³) time complexity in order to function reliably. *
(* Ironically, copying text straight out of the PDF was giving me all sorts of nonsense, so I had to take a screenshot and send it to an LLM.)
Anyway, this is a pretty bog-standard conflation of solving time and verification time. A problem can take exponential time to solve while its solutions take only polynomial time to verify; that's literally our best current understanding of NP-complete problems!
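A quick illustration of that asymmetry with subset-sum (NP-complete; my sketch, not the paper's): brute-force solving is exponential, but checking a proposed certificate is polynomial.

    from collections import Counter
    from itertools import combinations

    def solve(nums, target):
        # O(2^n): try every subset until one sums to target
        for r in range(len(nums) + 1):
            for combo in combinations(nums, r):
                if sum(combo) == target:
                    return list(combo)
        return None

    def verify(cert, nums, target):
        # polynomial: cert must be a sub-multiset of nums and sum to target
        return not (Counter(cert) - Counter(nums)) and sum(cert) == target

    nums, target = [3, 34, 4, 12, 5, 2], 9
    cert = solve(nums, target)           # exponential search -> [4, 5]
    print(verify(cert, nums, target))    # cheap check -> True

A verifier that's cheaper than the solver is the expected case, not a contradiction, so "A₂ is limited to O(N² · d) per pass" tells you nothing about whether A₂ can check A₁'s work.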
tl;dr: the author doesn't understand computer science or how LLMs work.