Using ‘Slop Forensics’ to Determine Model Ancestry

17 points by emschwartz


n1000

If you dismiss this because you’re not interested in LLMs, you might miss a pretty cool statistical idea!

Combining stylometry with phylogeny to draw trees is clever! Using it to reveal the phylogeny of synthetic training data is pretty neat, but I want to see the same technique used on real literature. Can we draw a phylogeny (maybe a more complex graph) of influential authors? Genres?
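To make the idea concrete, here is a toy sketch of stylometry feeding a tree. It is not the linked repo's method, just an illustration under my own assumptions: word-count profiles as the stylistic fingerprint, cosine distance between them, and plain average-linkage (UPGMA) clustering in place of a real phylogenetics tool. The texts and the names `model_a`, `model_b`, `model_c` are made up.

```python
from collections import Counter
from itertools import combinations
import math


def profile(text, vocab):
    """Word counts over a shared vocabulary (cosine distance ignores scale)."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0  # treat an empty profile as maximally distant
    return 1.0 - dot / (na * nb)


def upgma(names, dist):
    """Average-linkage clustering; returns the tree as nested tuples."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    d = {frozenset(p): dist[p[0]][p[1]]
         for p in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair; ties are broken by cluster id for determinism.
        pair = min(d, key=lambda p: (d[p], sorted(p)))
        i, j = sorted(pair)
        (tree_i, size_i), (tree_j, size_j) = clusters.pop(i), clusters.pop(j)
        for k in clusters:
            d_ik = d.pop(frozenset((i, k)))
            d_jk = d.pop(frozenset((j, k)))
            # Size-weighted average distance from the merged cluster to k.
            d[frozenset((next_id, k))] = (
                (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
            )
        del d[pair]
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    (tree, _), = clusters.values()
    return tree


# Hypothetical toy corpus: two "sloppy" texts sharing pet words, one unrelated.
texts = {
    "model_a": "delve into the rich tapestry of ideas delve tapestry",
    "model_b": "let us delve into a tapestry of vibrant ideas",
    "model_c": "the cat sat on the mat and the dog barked",
}
names = list(texts)
vocab = sorted({w for t in texts.values() for w in t.lower().split()})
vectors = [profile(texts[n], vocab) for n in names]
dist = [[cosine_distance(a, b) for b in vectors] for a in vectors]
tree = upgma(names, dist)
print(tree)
```

The two texts that share vocabulary end up as siblings, with the unrelated one attached at the root. For real literature you would presumably want better features (function words, n-grams) and a method that allows reticulation, since influence between authors is not strictly tree-shaped.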

I’d like to see (or maybe I’ll do it myself) a network analysis of literature. Perhaps statistical methods could quantify the impact of Poe and Arthur Conan Doyle on mystery fiction. There seems to be a lot of potential for novel digital humanities work on the quantitative analysis of literary genres and influences.

Slop forensics repo