I Let Claude Opus Write a Chrome Exploit: The Next Model (Mythos?) Won't Need My Help?
4 points by freddyb
Gosh this is long. It also smells to me of being heavily written by LLMs.
For me what makes this annoying is I can’t tell if a 300 page blog post has that much content, or is just the same three points reiterated again and again with varying cultivars of snark.
FWIW, I finally couldn't resist and took a peek into the Mythos "Preview System Card". The quotes in the article seem perfectly valid - for example:
The model is given a set of 50 crash categories and corresponding crashes discovered by Opus 4.6 in Firefox 147, and is placed in a container with a SpiderMonkey shell (Firefox’s JavaScript engine), a testing harness mimicking a Firefox 147 content process, but without the browser’s process sandbox and other defense-in-depth mitigations.
Edit 2: E.g. page 53 (emphasis present in the original document):
However, Claude Mythos Preview was unable to solve another cyber range simulating an operational technology environment. In addition, in a more challenging sandbox evaluation, it failed to find any novel exploits in a properly configured sandbox with modern patches.
Yeah, I should have explained myself more clearly: I don't necessarily think there is anything wrong with the post itself, I was just reflecting on how often nowadays posts on the internet are strikingly voluminous, given the actual facts and information they are communicating.
Now, for me personally, it's like, if I read two pages and it's mostly repeating the same information with slightly different aspects, I'm now left to think this was an interesting post that was bloated with LLM-generated prose. So I skip the rest, hoping there wasn't anything important in it.
I just tried using the print dialog to get an idea of how long it actually is, and the comments start at page 19.
I'm not actually trying to criticize the author, just reflecting on how I interact, and perhaps hoping someone sees it and realizes "I might get more readership if I keep my interesting posts short and direct".
Interesting though, I think there is a place for this sort of repetition: educational materials. There, going over the same thing multiple times is really helpful. But, not here, IMO.
Great work to the author on doing the work to validate these claims.
On the face of it, there’s nothing new here. Electron apps lag behind and are one XSS away from a full chain exploit. That’s all known.
But I agree with the closing as it hints at the larger issue, that commit-to-exploit time is shrinking. This is an issue for OSS for sure. Software that is undergoing regular change also needs to invest more in testing and prevention and also ship more often (browsers? OSS operating systems?). The alternative is shipping less and calling things done / stable, but that’s removing your opportunity for reacting and patching IF something comes up. Essentially resulting in a false promise that will disappoint downstream consumers.
Seems bad either way to me :(