LLM Neuroanatomy: How I Topped the AI Leaderboard Without Changing a Single Weight

54 points by knl


thesnarky1

This is a fascinating read about how the models are structured. Even if you are tired of all the vibecoding articles lately, this one is correctly tagged as ai, because it gets much more into how these models work and what structural changes to a model actually do to it.

kornel

It bothers me that the Transformer architecture spends the same amount of compute on a yes/no answer to an arbitrarily complex riddle as it does on "The " in any other message.

It's fascinating that layers can be looped. Perhaps the next step would be to have a model dynamically select the number of loops, or choose to skip groups of layers, MoE-style?
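
To make the idea concrete, here is a toy sketch of looping a span of layers with an optional early exit. It is not the article's method and not a real transformer: `make_toy_layer`, `run_with_loop`, and the `tol` stopping rule are all hypothetical names and choices made up for illustration, and the "layers" are plain residual updates on a list of floats.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_toy_layer(scale: float) -> Layer:
    """Hypothetical stand-in for a transformer block (assumption, not a real one)."""
    def layer(x: Vector) -> Vector:
        # Residual form: output = input + f(input); here f pulls x toward zero.
        return [v + scale * (-v) for v in x]
    return layer


def run_with_loop(
    x: Vector,
    layers: Sequence[Layer],
    start: int,
    end: int,
    max_loops: int,
    tol: Optional[float] = None,
) -> Tuple[Vector, int]:
    """Run layers[:start], repeat layers[start:end], then run layers[end:].

    With tol=None the looped span always runs max_loops times (fixed looping).
    With tol set, looping stops once a full pass changes the state by less
    than tol, so "easy" inputs use fewer passes. Returns (output, loops_used).
    """
    for layer in layers[:start]:
        x = layer(x)

    loops_used = 0
    for _ in range(max_loops):
        before = x
        for layer in layers[start:end]:
            x = layer(x)
        loops_used += 1
        # Largest per-element change caused by this pass over the looped span.
        change = max(abs(a - b) for a, b in zip(x, before))
        if tol is not None and change < tol:
            break

    for layer in layers[end:]:
        x = layer(x)
    return x, loops_used


layers = [make_toy_layer(0.5) for _ in range(4)]
_, fixed = run_with_loop([1.0, -2.0], layers, 1, 3, max_loops=3)
_, hard = run_with_loop([1.0, -2.0], layers, 1, 3, max_loops=10, tol=0.1)
_, easy = run_with_loop([0.01], layers, 1, 3, max_loops=10, tol=0.1)
print(fixed, hard, easy)
```

In this toy the "easy" input exits after one pass while the "hard" one takes three, which is the kind of input-dependent compute the comment asks for. A real model would need a learned halting signal rather than a fixed threshold on the change in hidden state.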