Big LLMs weights are a piece of history

2 points by technetium


tarakiyee

we are living in the stupidest timeline.

altano

“I believe that the obvious approach of trying to preserve everything is going to fail, for practical reasons: a lot of efforts for zero economic gains: the current version of the world is not exactly the best place to make efforts that cost a lot of money and don’t pay money”

Imagine suggesting we need to replace something that’s worked for decades with AI, and having the above be your argument.

This is effectively arguing that the only way to make internet archiving (something that has been done for decades) economically viable is to scrape all the data the same way as before, but then ALSO run it through model training for hundreds of millions of dollars, something that still isn’t making money and barely preserves the source material. What on earth am I missing?

insanitybit

It certainly seems like we need multiple archiving projects. I’m not really sure that LLMs are suited to the task: they feel too lossy, and I’m not sure why they would be better than, say, much more straightforward compression. I suppose the answer is that LLM compression is much more effective in terms of size reduction, but I’m not really sold on that yet. I’m really hoping we can just create multiple independent internet archives instead; the value is so astronomical, and it would be the first time in human history that we can actually preserve information to such a degree.
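
As a rough illustration of what I mean by “straightforward compression”: a minimal Python sketch using zlib from the standard library, where the original bytes come back exactly. The sample text is made up, and real archive tooling would obviously be more involved; the point is just the lossless round trip.

    import zlib

    # "Straightforward compression": lossless, byte-exact recovery,
    # no model training required. The sample text is illustrative only.
    original = ("The crawler stores the page so the exact original bytes "
                "can be retrieved later.\n") * 200

    raw = original.encode("utf-8")
    compressed = zlib.compress(raw, 9)          # level 9 = maximum compression
    restored = zlib.decompress(compressed).decode("utf-8")

    assert restored == original                 # the source material comes back exactly
    print(f"original:   {len(raw)} bytes")
    print(f"compressed: {len(compressed)} bytes ({len(compressed) / len(raw):.1%})")

An LLM, by contrast, can only give you an approximate reconstruction of what it was trained on, which is why it feels lossy to me as an archive format.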

That said, it does seem reasonable to archive models as they do potentially capture interesting properties of their training data, like tone.