Agent Inference - A user-agent / browser quiz

5 points by algernon

eising

It’s so crazy that we have one field that we cram so much machine-parsable stuff into, with so little consistency.

dzwdz

How does this happen? Why don’t bots just copy user-agents of existing browsers? Do they try to randomize some of its parts per host and get it wrong, or what?

algernon

Why don’t bots just copy user-agents of existing browsers?

No clue! But I like it this way, makes it easier to detect them.

Do they try to randomize some of its parts per host and get it wrong, or what?

Not even per host. I’ve seen the same IP request multiple resources from the same host, within a single second, using 4 different user agents.
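
For anyone who wants to check their own logs for this pattern, here is a rough sketch in Python. It assumes you have already parsed the access log into `(timestamp, ip, user_agent)` tuples sorted by time. The function name `flag_ua_switchers`, the one-second window, and the threshold of 3 are all made up for illustration, not anything a particular server or tool provides.

```python
from collections import defaultdict, deque


def flag_ua_switchers(requests, window=1.0, max_agents=3):
    """Return the set of IPs that used more than `max_agents` distinct
    user agents within any `window`-second span.

    `requests` is an iterable of (timestamp_seconds, ip, user_agent)
    tuples, assumed to be sorted by timestamp (hypothetical input shape).
    """
    recent = defaultdict(deque)  # ip -> deque of (timestamp, user_agent)
    flagged = set()
    for ts, ip, ua in requests:
        q = recent[ip]
        q.append((ts, ua))
        # Drop entries that have fallen out of the sliding window.
        while q and ts - q[0][0] > window:
            q.popleft()
        # Count distinct user agents still inside the window.
        if len({agent for _, agent in q}) > max_agents:
            flagged.add(ip)
    return flagged


# Made-up sample data using documentation-range IPs.
sample = [
    (0.00, "192.0.2.1", "ua-a"),
    (0.10, "192.0.2.1", "ua-b"),
    (0.20, "192.0.2.7", "ua-a"),
    (0.30, "192.0.2.1", "ua-c"),
    (0.40, "192.0.2.1", "ua-d"),
]
print(flag_ua_switchers(sample))
```

The threshold probably needs tuning: clients behind a shared NAT or proxy can legitimately show several user agents from one address, so treat a hit as a hint rather than proof.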

How does this happen?

I suspect this is what happens when you ask an LLM to vibe code a user agent string (or an entire crawler) for you.
- hsivonen
  
  It predates vibecoding. In 2022, a web app that I host slowed to a crawl under a distributed attack, but there were only a handful of distinct UA strings. Rather than being particularly impossible, they were all very old for 2022. 🤷‍♂️
pointlessone

I can see at least 2 reasons.
1. Try to break things. There’s no universal approach to parsing UA strings. When one is parsed, it’s done in an ad-hoc manner, and the parser is probably complex and so potentially contains bugs.
2. Try to get past arbitrary UA filters. Bots compose a UA that is more likely to match at least one regex. That’s how you end up with UA strings that list all browsers and all major OSes and architectures.
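
The second point also suggests a cheap check going the other way: a UA that claims several operating systems at once is unlikely to come from a real browser. Below is a minimal sketch of that idea in Python. The marker list in `PLATFORM_PATTERNS` and the function names are my own assumptions for illustration and are far from complete. It deliberately counts platform markers rather than browser names, since genuine browser UAs routinely mention several browsers (a real Chrome UA contains both `Chrome` and `Safari`).

```python
import re

# Hypothetical platform markers; real traffic would need a longer, tuned list.
PLATFORM_PATTERNS = {
    "windows": re.compile(r"Windows NT"),
    "macos": re.compile(r"Macintosh"),
    "linux": re.compile(r"X11"),
    "android": re.compile(r"Android"),
    "ios": re.compile(r"iPhone|iPad"),
}


def claimed_platforms(user_agent):
    """Return the set of platform families whose marker appears in the UA."""
    return {
        name
        for name, pattern in PLATFORM_PATTERNS.items()
        if pattern.search(user_agent)
    }


def looks_like_kitchen_sink(user_agent):
    """True if the UA claims two or more platform families at once."""
    return len(claimed_platforms(user_agent)) >= 2


# A typical desktop Chrome UA, and a made-up "list everything" UA.
ordinary = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
kitchen_sink = (
    "Mozilla/5.0 (Windows NT 10.0; Macintosh; X11; Linux x86_64; "
    "Android 13; iPhone) Chrome/120.0 Firefox/121.0 Safari/605.1.15"
)

print(looks_like_kitchen_sink(ordinary))
print(looks_like_kitchen_sink(kitchen_sink))
```

This is only a heuristic: it will miss bots that copy a real UA verbatim, and the marker list would need checking against real traffic before being used to block anything.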