Agent Inference - A user-agent / browser quiz
5 points by algernon
It’s so crazy that we have one field that we cram so much machine-parsable stuff into, with so little consistency.
How does this happen? Why don’t bots just copy user-agents of existing browsers? Do they try to randomize some of its parts per host and get it wrong, or what?
Why don’t bots just copy user-agents of existing browsers?
No clue! But I like it this way, makes it easier to detect them.
Do they try to randomize some of its parts per host and get it wrong, or what?
Not even per host. I’ve seen the same IP request multiple resources from the same host, within a single second, using 4 different user agents.
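For anyone who wants to look for this pattern in their own logs, here is a rough sketch of the check. It assumes you have already parsed your access log into (timestamp, ip, host, user_agent) tuples; the function name, the 1-second window and the threshold of 4 agents are my own illustrative choices, not anything standard.

```python
from collections import defaultdict


def flag_rotating_agents(requests, window=1.0, min_agents=4):
    """Return the (ip, host) pairs that sent at least `min_agents` distinct
    User-Agent strings within any span of `window` seconds.

    `requests` is an iterable of (timestamp_seconds, ip, host, user_agent)
    tuples. This input shape is an assumption; adapt it to your log format.
    """
    # Group requests by client IP and target host.
    by_client = defaultdict(list)
    for ts, ip, host, ua in requests:
        by_client[(ip, host)].append((ts, ua))

    flagged = set()
    for key, hits in by_client.items():
        hits.sort()  # order by timestamp
        start = 0
        for end in range(len(hits)):
            # Advance the left edge until the span fits inside the window.
            while hits[end][0] - hits[start][0] > window:
                start += 1
            agents = {ua for _, ua in hits[start:end + 1]}
            if len(agents) >= min_agents:
                flagged.add(key)
                break
    return flagged
```

Recomputing the set for every window is fine for a one-off log analysis. For anything running inline you would probably want to keep a running count per agent instead.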
How does this happen?
I suspect this is what happens when you ask an LLM to vibe code a user agent string (or an entire crawler) for you.
It predates vibecoding. In 2022, a web app that I host slowed to a crawl due to a distributed attack, yet there was only a handful of distinct UA strings. They weren’t particularly impossible; they were just all very old for 2022. 🤷‍♂️
I can see at least 2 reasons.