Assessing Claude Mythos Preview’s cybersecurity capabilities
72 points by equeue
[Company eyeing IPO says they have secret world-changing mega-powerful technology, but nobody is allowed to see it to verify for themselves for security reasons.]
#1 safety suggestion: Use our software.
Wow, a remote crash bug in OpenBSD.
I wish they had been able to keep the "Only two remote holes in the default install, in a heck of a long time!" line on their homepage, but after the advancements of the past few weeks I'm not so sure anymore.
A crash is probably the best way it could have ended for OpenBSD; they very much have a culture of "if anything goes wrong, halt and catch fire instead of continuing in a potentially compromised state".
At least it's "only" a remote crash/DoS for OpenBSD, as opposed to the RCE they found for FreeBSD?
I'm surprised they couldn't get any working remote exploits for Linux. They attribute that to "the Linux kernel's defense in depth measures," but if you had asked me beforehand which OS had the most defense in depth, out of those three I would have guessed OpenBSD.
A lot more eyeballs on Linux, if nothing else.
This is a presentation by one of the Anthropic researchers at Black Hat. About 9 minutes in, he shows how they found a remotely exploitable buffer overflow in the Linux kernel. The whole thing is worth watching.
Note that this is research from February, done with Claude Code and Opus-4.6, the same model we can all use. It's the same research that found 100+ crashers in Firefox (which the Firefox team decided to fix), and that led to a macOS update with a dozen security bugs attributed to Claude.
No special Mythos model required back then.
The OpenBSD code was 27 years old!
I'm pretty surprised there's any code in a project of that nature that hasn't been touched in that long.
I will say several things. First, extreme skepticism is unwarranted. These models have gotten pretty darn good at finding bugs. Not all bugs, but more than enough for it to really matter. And FWIW, over the past couple of years, I repeatedly made the point that most problems in infosec boil down to text comprehension at scale. I'm not surprised that LLMs can automate many aspects of it, wholly independent of reaching "AGI".
I'm less convinced that this is a fundamental inflection point for software security. The bulk of vulnerability research - I'd not hesitate to say 80%+ - was already automated with fuzzers and related tools. Fuzzers discovered tens of thousands of bugs in critical software and put some visible strain on the OSS ecosystem. We now have a new tool that will uncover even more bugs, but I don't think this is how the world ends. When I came up with afl-fuzz, I could've put out a press release saying that it's too dangerous to share. We survived. So, there's a combination of impressive results and marketing at play.
As a tangent, I worry a lot more about enterprise security. We now have a tool that can be deployed on the cheap to pull on every door handle at any large organization; solving this attack surface is less tractable than finding software bugs, in part because, to stick to my tortured metaphor, the "handles" the bots can pull on include every human working at the company. Worse, with vuln discovery, there's this inherent symmetry: you can use the same tools to beat the bad guys to the punch. Using LLMs to secure an enterprise is considerably harder, in part because you need much higher decision fidelity to automatically stop bad things without getting in the way of normal work.
This Anthropic thing looks to me very much like an expensive fuzzer. (I mean, fine, really it's static code checking, but the results feel fuzzer-ish to me. YMMV.) Is that how it looks to you?
Currently looking for numbers on the cost of running this thing.
Edit: "under" $20k per bug run, apparently.
It can definitely spot some bugs that fuzzers can't because... well, language comprehension! It can also write reports, propose patches, and sometimes write exploits. So I wouldn't describe it just as an expensive fuzzer. It takes automation further than we could've taken it before.
Cost-wise... hard to say. The amount of compute spent on training LLMs is orders of magnitude higher than what we've ever spent not just on fuzzing, but on the totality of infosec automation, ever. But once trained, LLMs also have other uses.
As for inference costs alone... yeah, if you have thousands of bucks to throw at fuzzing, you can probably uncover something interesting in pretty much any program of your choice. I'd guess that only a handful of software packages have ever gotten more than $10 worth of fuzzing (i.e., maybe a month of CPU time). But for an exploit, you'd need to pay a human, and at today's rates, it might be more than $20k.
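To sanity-check that parenthetical: $10 buying roughly a month of CPU time implies a price of about 1.4 cents per core-hour, which is in the right ballpark for preemptible cloud cores (my back-of-the-envelope arithmetic, not a figure from the comment):

```python
# Back-of-the-envelope: what hourly price makes $10 buy a month of fuzzing?
budget_usd = 10.0
hours_per_month = 30 * 24  # 720 core-hours in a month of wall-clock time

implied_rate = budget_usd / hours_per_month
print(f"implied price: ${implied_rate:.4f} per core-hour")  # ~ $0.0139
```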
I've been thinking today about what personal security measures I should start taking to prepare for a world where anyone has this kind of capability on tap. (Even if Anthropic for example creates effective safeguards on its public models, presumably less scrupulous providers will catch up in capabilities.)
Some starting ideas that seem right to me:
I should consider switching to use a hardware security key to replace important passwords and 2FA when possible, since it should be more resilient than my phone or laptop password manager and TOTP software (and the environment that it's running in, which could get keylogged, etc.)
I should try to run third-party software on my laptop under a user account with fewer privileges. Right now I actually have passwordless sudo enabled, which isn't going to cut it, but the user account I use to run most things also has access to a ton of my stuff. That's not good if one of those things is compromised.
I should try to run a stricter firewall. A superhuman exploit finder who doesn't already have code on my box can't hurt me if I don't read their packets. I don't know how to balance this with convenience, but at the least I should deny by default software that doesn't seem like it should need Internet access, or only needs it for trivial-to-me reasons like ads or autoupdaters, and see if I can make it work without that.
I should probably start looking into how to more effectively sandbox third-party software that does need Internet access.
One thing I don't know how to deal with yet is online banking. I don't understand the banking security model well enough to understand whether it's important for me to e.g. try to only do online banking from a more locked down device. If someone gains full control over my laptop, will they be able to steal my money, or will they just be able to initiate some kind of transfer that I can easily reverse? Unfortunately I really have no clue.
> If someone gains full control over my laptop, will they be able to steal my money, or will they just be able to initiate some kind of transfer that I can easily reverse? Unfortunately I really have no clue.
I don't know how this works in the rest of the world, but in Poland, when you lose access to your bank account, the attacker can use Blik to withdraw money from an ATM, and at that point the money is irrecoverable. My aging mother was on the receiving end of a phishing attack recently and it went this way. If your bank lets you withdraw money from an ATM without the use of a physical card in any way, you might have a similar problem in case of an attack.
The easiest way to guard against that sort of thing in the US, other than only using cash, is to use a credit card, not your bank's debit card, as a firewall to your bank account as much as possible. Sometimes you can't, but a lot of the time you can pay for stuff with a card. It works really well because most credit cards have a liability agreement that is in your favor, and credit card companies are diligent about getting their money back, whereas banks are like ¯\_(ツ)_/¯.
Very true, my debit card was skimmed years ago and it took me ~3 months to get the bank to refund the money stolen from my account.
The ridiculous under-investment in OS and language-based security is going to lead to some nasty exploits, followed by a rush to fix things. Even simple supply-chain attacks seem to be happening on a regular basis.
Personally, I'm convinced that 3-letter agencies have had sophisticated capabilities to find and exploit vulnerabilities for a number of years, not necessarily based on LLMs. In fact, program analysis is probably better at finding many classes of bugs and can also guarantee their absence.
It's not acceptable we are still running things by default without some kind of sandboxing or role-based access control. Nor that memory unsafety is common and sophisticated program analyses rare. We are at least two decades late.
I'm angry.
How much money have Anthropic and others spent so far? Had a mere 1% of that been spent on auditing and non-AI tooling, it would probably have yielded results at least as good, years ago already.
Instead we get a new kind of pollution: issues but no solution, unless we count "use [our] AI (and pay)" as a solution. The good solution would be wages and long-term investments, but that does the opposite of concentrating wealth.
I'm angry and I don't think that will improve soon.
Anthropic recently donated $1.5m to the Python Software Foundation to spend on security without any requirement that it be spent on AI (disclaimer: I'm a PSF board member but had nothing to do with that initiative.)
$1.5m is reputation laundry for companies that are burning hundreds of billions of dollars to generate tokens for funny meme pics. That and their models would not really function without these languages in the first place.
Besides what's said in the comment next to mine, I should clarify that I didn't mean that AI companies specifically should have spent 1% of their money on such topics. Instead, I meant that as a whole, no matter where the money comes from.
AI tools are touted as solutions to security problems. I'm arguing that I think they are still not cost-effective and come with downsides (especially that they're creating batches of reports that are larger than what can be dealt with by too few maintainers).
edit: And the amount reminds me of something I saw a few years ago working at a business airport in France: somebody tipped a luggage handler something like 150 EUR for a 30-meter push of a luggage trolley. Adjusted for inflation and exchange rate, that's already 0.015% of these 1.5M USD. Of course, that's far less than the matching plane taxi (a few thousand euros), but I think it shows how little 1.5M USD can actually be.
I agree. All these language-based security niches, including program analysis and abstract interpretation, were grossly under-invested in, despite their ability to turn programming into actual engineering. Quite literally: they can provide mathematical guarantees of correctness, and they scale really well.
A good proprietary abstract interpreter running on a large workstation can evaluate a real-world codebase (200-500 kloc) in a couple of hours, guaranteeing a large class of errors are absent, at the expense of a few false positives. Crucially, analyses can be designed to be sound, with zero false negatives.
As someone who worked on this both in academia and industry during the late 2000s, it was really sad and frustrating to see how the whole field struggled to take off because nobody cared about quality, except of course certain companies building real-time critical systems.
Now we are about to experience a dramatic shift, and lots of major public funders are rushing to start entire programs on formal methods. Models like those from Anthropic have turned the under-investment into a dangerous asymmetry: anyone equipped with an LLM can easily find unsafe code idioms and exploit them.
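To make the "mathematical guarantees" claim concrete, here is a toy interval-domain analysis (my own illustration, not any particular tool): each variable is mapped to a sound over-approximation of its range, so some array accesses can be proven in-bounds on every possible execution, while others are conservatively flagged.

```python
# Minimal interval domain: each variable maps to a sound (lo, hi)
# over-approximation of its runtime values. If the analysis says an index is
# in bounds, that holds on *every* execution; the price of soundness is that
# some actually-safe programs get flagged anyway (false positives).

class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        # Sound abstract addition: the result range covers all concrete sums.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def within(self, lo, hi):
        return lo <= self.lo and self.hi <= hi


# Abstract run of:  i = input in [0, 9];  j = i + 3;  access buf[j]
i = Interval(0, 9)       # from an input-range contract
j = i + Interval(3, 3)   # j is provably in [3, 12]

BUF_LEN = 16
print(j.within(0, BUF_LEN - 1))                     # True: buf[j] always safe
print((j + Interval(8, 8)).within(0, BUF_LEN - 1))  # False: j+8 may reach 20
```

Real abstract interpreters use far richer domains and handle loops via widening, but the soundness argument is exactly this shape.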
I think this is the openbsd bug the model found: https://ftp.openbsd.org/pub/OpenBSD/patches/7.8/common/025_sack.patch.sig
This is a missing bounds verification, and a missing null pointer check - two things that are detectable via local pattern matching, and don't actually require deep reasoning. (I have to grant it to the model that it would need to traverse the way in which this code is exercised and confirm that it is actually exploitable.)
My read on why the bug wasn't caught previously is that developers have a large surface area to cover and can't bring the same level of sustained attention to it. For this sort of thing, the model really has an advantage. But it doesn't strike me as an illustration of superior reasoning skills.
> This is a missing bounds verification, and a missing null pointer check - two things that are detectable via local pattern matching, and don't actually require deep reasoning.
Is this true? In a codebase specifically designed for it, maybe, but production C codebases might have arbitrarily complex arguments for why something is never null (or out of bounds) in practice. So any such analysis would, at best, produce loads of false positives.
> My read on why the bug wasn't caught previously is that developers have a large surface area to cover and can't bring the same level of sustained attention to it.
OpenBSD, according to them, has been conducting file-by-file audits since summer 1996.
Presumably if there was some way to automate in the way that you're talking about automating, they would have done it.
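To make both sides of this disagreement concrete, here is roughly the crudest possible "local pattern matching" check (a sketch of mine, not a real tool): flag any memcpy whose length variable isn't compared in the preceding few lines. It catches the bug shape in question, and it also shows exactly why such checks drown in false positives: they can't see any invariant that isn't spelled out locally.

```python
import re

# Naive local check: flag memcpy(dst, src, n) unless some line shortly before
# it compares the same length variable (a stand-in for a bounds check). Any
# guarantee established elsewhere (caller contracts, earlier functions, config
# invariants) is invisible to this check, hence the false positives.
def flag_unguarded_memcpy(c_source, window=3):
    lines = c_source.splitlines()
    findings = []
    for i, line in enumerate(lines):
        m = re.search(r"memcpy\s*\([^,]+,[^,]+,\s*(\w+)\s*\)", line)
        if not m:
            continue
        length_var = m.group(1)
        context = "\n".join(lines[max(0, i - window):i])
        if not re.search(rf"\b{length_var}\b\s*[<>]=?", context):
            findings.append((i + 1, length_var))
    return findings


unguarded = """
void f(char *dst, char *src, int n) {
    memcpy(dst, src, n);
}
"""

guarded = """
void g(char *dst, char *src, int n) {
    if (n > 64) return;
    memcpy(dst, src, n);
}
"""

print(flag_unguarded_memcpy(unguarded))  # [(3, 'n')] -- flagged
print(flag_unguarded_memcpy(guarded))    # []         -- local guard silences it
```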
I think the likelihood that this is unsubstantiated marketing hype rounds to zero. It would be a very painful lie to have exposed (both bad press and felony securities fraud), it will be fully evaluated over the next few months, and it names a number of world-class security operations as counterparts, any of whom could have tweeted "uh no we have not actually heard anything from Anthropic" in the last two days. Also, experts are acting like it's real.
It's straightforward to automate a coding agent to read the commits (or diff the binary) of all new releases, check each change for an implied security vulnerability, derive an exploit for any potential vulnerability, and then start using that exploit to pop production systems. Even if a low opinion of LLM coding capability were correct and it will notice a vulnerability 1% of the time, an attacker will have a positive ROI running their agent in parallel a thousand times against each change.
We have to prepare for a world where we have 2-5 minutes from the release of patched software to active public exploitation.
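The "read the commits" triage step really is a few lines, and defenders can run the same loop over their own releases before shipping. A sketch of my own heuristic (not anyone's actual tooling): diff two versions of a file and flag changed lines that touch memory- or bounds-related constructs, the hunks a model would then be asked to analyze.

```python
import difflib
import re

# Crude patch triage: diff two versions of a source file and flag changed
# lines touching memory- or bounds-related constructs. A real agent would
# hand the flagged hunks to a model for deeper analysis.
SENSITIVE = re.compile(r"\b(memcpy|strcpy|malloc|free|len|size|bounds?)\b")

def security_relevant_changes(old_src, new_src):
    diff = difflib.unified_diff(old_src.splitlines(), new_src.splitlines(),
                                lineterm="")
    return [
        line for line in diff
        if line.startswith(("+", "-"))
        and not line.startswith(("+++", "---"))  # skip diff headers
        and SENSITIVE.search(line)
    ]


old = "int copy(char *d, const char *s, int len) {\n    memcpy(d, s, len);\n}"
new = ("int copy(char *d, const char *s, int len) {\n"
       "    if (len > 64) return -1;\n"
       "    memcpy(d, s, len);\n}")

print(security_relevant_changes(old, new))  # ['+    if (len > 64) return -1;']
```

A newly added bounds check like this one advertises exactly where the pre-patch vulnerability was, which is the asymmetry the comment above is worried about.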
Did you go through Claude's C Compiler? https://github.com/anthropics/claudes-c-compiler
I am a little bit skeptical of what they say after this.
I'm not mad that Anthropic did the C compiler thing as a marketing demo, and hell, I think it's a great marketing stunt! I like it as such, and thought it was plenty impressive as that. Sure, in practice it's a pretty terrible compiler, cheating every which way it can to make the demo work, but I dunno, that's somewhat what I expect from a marketing stunt. =]
Now, there is definitely some marketing in the OP, but I think even accounting for that, the extent and severity of these issues is staggering... It reads as a completely different level of shocking results than the compiler.
And I think the fact that most of Anthropic's competitors are joining their response effort, rather than any of them arguing back, is, at least to me, a further indicator that even the non-marketing version of this is sufficiently bad to be scary.
Fundamentally, for most software like a C compiler, there are so many challenges to overcome beyond turning a super tricky source base into executable code that "works" for some minimal definition. It doesn't just need to technically check the box of being a C compiler.
But building exploits is a categorically easier challenge. Zero maintenance concerns, readability concerns, code quality concerns, etc. Totally irrelevant anything but "does it do the thing". This is a perfect category of code for these tools to build, and so I find the difference between the compiler and these exploits to track pretty well.
I'm encouraged that they included the SHA-3 hashes of their individual vulnerability reports; it telegraphs a "Please hold us accountable" attitude, at least in this case. Time will tell if they actually publish, of course!
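The scheme behind those hashes is a plain precommitment, easy to sketch with Python's hashlib (my illustration; Anthropic's exact process isn't public): publish the digest now, publish the report later, and anyone can verify the report wasn't altered in between. The report text below is made up for the example.

```python
import hashlib

# Commit now: publish only the SHA-3 digest of the still-private report.
report = b"VULN-2026-001: heap overflow in example_parser(), details..."
commitment = hashlib.sha3_256(report).hexdigest()
print("published today:", commitment)

# Reveal later: anyone can check the disclosed report against the old digest.
def verify(disclosed_report, published_digest):
    return hashlib.sha3_256(disclosed_report).hexdigest() == published_digest

print(verify(report, commitment))                 # True
print(verify(report + b" (edited)", commitment))  # False: any change breaks it
```

The commitment proves the report existed in this exact form at publication time; it says nothing about whether the claimed bugs are real, which is why the parent still wants the reports actually published.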
"AI built a C compiler," "AI built a browser," etc… those all seem more like publicity stunts: stuff that's been built before (and may have been in the training data), not stuff you'd actually want to use. Whereas in this case, it sounds like they actually found real bugs and submitted reports upstream. I hope we get confirmation (or denials) from the affected projects soon.
What exactly are you saying? Because unless the claimed vulnerabilities simply aren't real, this is an important effort.
I think the article on the Claude C Compiler was inadequately clear about the compiler's shortcomings. However, the author made a very real effort to qualify the results. It was not a pure marketing effort, or breathless hype. Perhaps the biggest criticism is that the author didn't make it clear how much the effort had overfitted itself to the existing GCC harness (https://john.regehr.org/writing/claude_c_compiler.html).
For example, near the end of the project, Claude started to frequently break existing functionality each time it implemented a new feature.
> The generated code is not very efficient. Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.
> The resulting compiler has nearly reached the limits of Opus’s abilities. I tried (hard!) to fix several of the above limitations but wasn’t fully successful. New features and bugfixes frequently broke existing functionality.
> The Rust code quality is reasonable, but is nowhere near the quality of what an expert Rust programmer might produce.
Once again proposing technical solutions to human social problems. Will we ever learn?
That's a very generic criticism; care to elaborate? Promoting smaller, more localised islands of protection is a social solution, since it means that communities can self-regulate rather than depend on centralised services. As my post notes, we've had the technical solutions in place for over 15 years...
It's obvious this technology has the potential to be dangerous, the same way that a chainsaw, car or airplane can be dangerous. Chainsaws are generally only dangerous to the person holding them, horror movies aside, so keep them away from unsupervised children and you're probably fine. But cars or airplanes can be dangerous to lots of people besides the operator. So how do we deal with that? We regulate them. We require licenses to operate them, so that people can usually go outside without getting run over. We require that they be well maintained, so people can usually go outside without loose bolts from a passing plane falling on their heads.
"But if we do that then criminals will have access to this technology anyway!" Yes, criminals running billion-dollar R&D efforts in billion-dollar data centers. I'd hope law enforcement can figure out how to deal with that. Optimistic, I know.
Not blaming the author; they're a researcher, it's their job to think of New Things. But "this technology is dangerous" is not exactly a new thing, and we already have mechanisms to deal with it. I rather doubt "Anthropic's tool is a massive force multiplier for bad actors" is going to be most effectively countered by "use Anthropic's tool to deliberately obfuscate the world for good actors". Just look at all the auto-immune disorders in the world, for one. Maybe "hold Anthropic responsible for how their tool is used" is the better option.
It certainly won't need an entire new technology for computer programming with self-modifying code making it impossible to tell what the hell program anyone is actually running at any given time. Our immune system is a compelling analogy and an incredibly powerful tool but it is also one designed to counter constant, low-level threats that change incrementally without any planning or intelligence. I like a lot of the ideas discussed. But if the concern is human bad actors, your immune system won't do jack shit against an intelligent person who says "let's just fill their tea cup with anthrax".
People keep fucking saying "AI is inevitable". Is leaded gasoline inevitable? Is asbestos inevitable? Or have we just spent our lives being trained to let ourselves be pushed around? Maybe the best way to survive smallpox is to get rid of it.
Last August I started seeing the claim that open models are 9-12 months behind frontier. I don't have a more recent link handy, but I've seen general sentiment that it holds true. This may be an issue we can handle as a handful of well-known companies now, but it won't be in a year and it really won't be in two.
The difference between AI and leaded gasoline or asbestos is that the latter require specialized physical production plants. Turing completeness means AI requires general purpose computers and the cost of compute continues to fall.
I mean, my experience with AI tools thus far is that most systems can't do the things that Anthropic is doing, but I don't know whether that's just because "everyone else is a year behind" or because they put a lot of non-transferrable work into it that is difficult to reproduce.
"The cost of compute continues to fall" has certainly been a long-term trend, but not a short-term one, and these models seem to exemplify the accompanying trend of being able to eat up as much compute as you can throw at them. I don't know the industry well, but my attempts to price out a system for running open models for $WORK suggest that the hardware tends to start at 100k USD, which is more than you'd pay for an old light aircraft and lessons to fly it.
Maybe I'm just being optimistic about the genie being able to be put back into the bottle. I guess we'll find out the hard way, in real time.
I deliberately said the cost of compute rather than the price of compute. Overwhelming demand driving up prices makes this more concerning to me, not less.
And it's important not to look at costs from the perspective of an individual purchase, but as a business investment. If it costs $100k to set up a private rig, can you use it to compromise a system to extract more than $100k via ransomware, credential theft, phishing, trade secrets, market manipulation, etc? For useful context, there is a resilient international black market that extracts value from credit cards $20-$200 at a time.
This is the first frontier model release that has made me feel a little anxious about the future of SWE in general.