A thought on JavaScript "proof of work" anti-scraper systems

20 points by runxiyu


icefox

Repeat after me, everyone: the problem with these scrapers is not that they scrape for LLMs, it’s that they are ill-mannered to the point of being abusive. LLMs have nothing to do with it.

The purpose of PoW anti-scraper systems is not to get rid of the LLMs; it is to slow the scrapers down to a rate that is less abusive. I have no idea why someone would write an abusive scraper in the first place, but I expect it comes down to economics. Most search-engine spiders try pretty hard to be well-mannered, because the website is their product; they need website owners to want their sites spidered. Criminal scrapers try to be hard to notice so they don’t get blocked as often, and/or have fewer resources available at any one time to spread over a wider area. Meanwhile, I expect somewhere in most LLM companies there is a sub-department whose productivity is measured in “new training data acquired, in TB,” and nobody making decisions for that department cares about long-term sustainability as much as they care about a quick buck.
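For anyone who hasn’t seen the mechanics: these systems are basically hashcash. Here’s a minimal sketch of the idea (my own illustration with made-up names and a made-up challenge format, not the code of any particular deployed system; real ones like Anubis differ in the details): the server hands the browser a random challenge and a difficulty, client-side JavaScript grinds nonces until SHA-256(challenge + nonce) has enough leading zero bits, and the server checks the answer with a single hash.

```typescript
import { createHash } from "node:crypto";

// Count leading zero bits of a digest.
function leadingZeroBits(digest: Buffer): number {
  let bits = 0;
  for (const byte of digest) {
    if (byte === 0) { bits += 8; continue; }
    bits += Math.clz32(byte) - 24; // clz32 works on 32 bits; a byte occupies the low 8
    break;
  }
  return bits;
}

// Client side: find a nonce so that SHA-256("challenge:nonce") starts with
// `difficulty` zero bits. Expected cost is ~2^difficulty hashes — trivial for
// one page view, expensive across millions of scraped pages.
function solve(challenge: string, difficulty: number): number {
  for (let nonce = 0; ; nonce++) {
    const digest = createHash("sha256").update(`${challenge}:${nonce}`).digest();
    if (leadingZeroBits(digest) >= difficulty) return nonce;
  }
}

// Server side: verification is one hash, so the cost asymmetry favors the site.
function verify(challenge: string, nonce: number, difficulty: number): boolean {
  const digest = createHash("sha256").update(`${challenge}:${nonce}`).digest();
  return leadingZeroBits(digest) >= difficulty;
}

const nonce = solve("example-challenge", 16); // ~65k hashes on average
console.log(verify("example-challenge", nonce, 16)); // true
```

The whole point is that asymmetry: the client burns ~2^difficulty hashes, the server spends one, and the difficulty can be dialed up for clients that look like bulk scrapers. It doesn’t identify anyone; it just makes abusive request rates cost real CPU.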