LLMs Can Get Brain Rot (after consuming too much social media content)
15 points by 3bodyproblem
So they trained two LLMs, one with a regular dataset and a second one with garbage added in. After instruction tuning, the one with a lower quality pretraining dataset produced lower benchmark scores. I find the result hardly worth the sensational wording ("cognitive decline") they use.
Well... not two.
They trained 4 LLMs on 10 datasets.
I won't say much more, to avoid further straining your reading skills; sorry. At least scroll to the results table?
Thanks for the correction, I confused the "M1" and "M2" metrics with models. The results table seems to show that a higher "Junk Ratio" correlates with worse benchmark scores. Perhaps I still don't understand something, but that is hardly a surprise, is it?
Look, if we take any statement in general, there's a big difference between:
- how plausible the statement sounds, and
- whether the evidence actually supports it.
And these don't necessarily always align. That's why we have science.
I'm aware of the difference between a plausible hypothesis and its supporting evidence.
But here the claim, literally the title of the paper, is LLMs Can Get "Brain Rot"!, including the exclamation mark. This claim is already twice removed via metaphor from the measurements (benchmark scores) they are able to make, but I suppose it's sensational on purpose. Combined with their hypothesis that indirectly confirms a common belief ("social media is bad for the brain"), well, I just have to assume the findings themselves are quite modest because they have to sell them so hard.
Their first "key insight" bullet point states that
[--] models increasingly truncate or skip reasoning chains, explaining most of the error growth.
So the model produces shorter outputs. They describe the two "M1" and "M2" interventions' data collection like this:
For M1, we choose samples with a length < 30 and a popularity of > 500 as junk data, and samples with a length > 100 and a popularity of ≤ 500 as control data.
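Spelled out, that rule is just two threshold filters. A minimal sketch, assuming each sample is a dict with a "text" field and a "likes" count (my guesses, not the paper's actual code):

```python
# Sketch of the quoted M1 selection rule. The field names ("text", "likes")
# and the whitespace token count used for "length" are my assumptions.
def split_m1(samples):
    junk, control = [], []
    for s in samples:
        length = len(s["text"].split())   # crude length proxy
        popularity = s["likes"]
        if length < 30 and popularity > 500:
            junk.append(s)
        elif length > 100 and popularity <= 500:
            control.append(s)
    return junk, control

# e.g. junk, control = split_m1([{"text": "lol same", "likes": 12000}, ...])
```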
The "junk" data has 1/3 of the length of the control group. Could the effect they observe be caused simply by the fact that they trained on short social media posts?
I did the unthinkable and skimmed the paper myself, and indeed, the length of the individual messages in the training data already explains most of the effect. In an ablation experiment, they split the dataset in a different way, by "length" and "popularity":
For the length-only metric, we let samples with length > 100 be the control data and < 30 be the junk data. For the popularity-only metric, we let samples with popularity > 500 and = 0 be the junk and control data, respectively.
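So the ablation simply keeps one of the two criteria and drops the other. Again a sketch with the same guessed field names, not their code:

```python
# The two ablation splits from the quote, each using only one criterion.
def split_length_only(samples):
    junk = [s for s in samples if len(s["text"].split()) < 30]
    control = [s for s in samples if len(s["text"].split()) > 100]
    return junk, control

def split_popularity_only(samples):
    junk = [s for s in samples if s["likes"] > 500]
    control = [s for s in samples if s["likes"] == 0]
    return junk, control
```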
Here's a reconstruction of the relevant table in the paper. See how the effect of "Length" dominates in two of the three benchmarks, and still accounts for 10 of the 17.6 percentage points of the drop on "ARC Challenge", where it doesn't.
Table 3: "Ablation of the junk metrics in M1. ∆ represents the difference between Junk and Control."
Model      ARC Challenge (COT)         RULER                       AdvBench Risk ↓
           Length  Popularity  M1      Length  Popularity  M1      Length  Popularity  M1
Control    75.2    70.7        74.9    90.1    83.9        90.5    61.2    64.8        77.6
Junk       65.2    54.1        57.3    73.2    70.2        71.0    89.8    71.2        88.8
Δ          -10     -16.6       -17.6   -16.9   -13.7       -19.5   -28.6   -6.4        -11.2
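To put a number on "dominates": dividing the length-only drop by the full M1 drop for each benchmark (my own arithmetic, straight from the Δ row above):

```python
# Share of the M1 (junk vs. control) drop that the length-only split
# already reproduces, per benchmark. Δ values copied from the table above.
deltas = {
    "ARC Challenge (COT)": {"length": -10.0, "popularity": -16.6, "m1": -17.6},
    "RULER":               {"length": -16.9, "popularity": -13.7, "m1": -19.5},
    "AdvBench Risk":       {"length": -28.6, "popularity": -6.4,  "m1": -11.2},
}

for bench, d in deltas.items():
    share = d["length"] / d["m1"]
    print(f"{bench}: length-only split reproduces {share:.0%} of the M1 drop")

# ARC Challenge (COT): length-only split reproduces 57% of the M1 drop
# RULER: length-only split reproduces 87% of the M1 drop
# AdvBench Risk: length-only split reproduces 255% of the M1 drop
```

On AdvBench the length-only split even shows a larger risk increase than the full M1 junk split.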
Based on this data it seems like length of the posts is the primary factor driving the drop in benchmark quality. This hints at the possibility that you could pretrain the model on any short texts, say Zen kōans, observe it produces shorter outputs, and declare it's got "Brain Rot" now.
It's mildly interesting that their "popularity" criterion also has an effect, but I still think they overstate their findings.
I agree. Can't deny that their findings are overstated.
I also think there's a fundamental tension between the succinctness and the accuracy of summaries. And that's especially tough for headlines... they nearly always stretch the exact meaning. Sometimes intentionally ("clickbait"), sometimes out of necessity.