A Horrible Conclusion
38 points by freddyb
Should be tagged with vibecoding not ai.
The post is about research ethics in both the development and application of LLMs. I suppose it might make sense to mark it as vibecoding so that folks who have that tag muted can avoid the topic.
On Lobsters "vibecoding" covers "research ethics in both the development and application of LLMs". I've argued fruitlessly against this in the past.
Let's suppose again that Anthropic has indeed headlessly gotten Claude to find hundreds of security-relevant bugs (which they don't explicitly claim, but heavily imply). First, why on earth would they think it appropriate to release this to the public without considerable lead time? There were only two months between the release of Opus 4.5 and 4.6, and their claim is that 4.6 was spontaneously capable of far more vulnerability discovery, meaning they've likely spent less than three months experimenting with this. That's shorter than standard disclosure windows. Even supposing that the bugs they were finding were truly "high-severity" and "indicating an inflection point in cybersecurity", this is not enough time to do due diligence in applying these tools to open-source repositories, and their financial incentives overcame their duty to security.
What's the alternative? Should Anthropic conduct their own audits of every open source project in existence before releasing Opus 4.6? While other frontier labs are improving and publishing their own models which will soon have similar capabilities? As soon as any model capable of doing this is published, attackers will begin using it at a massive scale. If indeed the latest models are capable of cheaply finding large numbers of security-relevant vulnerabilities, that is a complete disaster for security everywhere and disclosure of this as soon as possible seems like the only thing that could possibly give people the chance to prepare.
Many people have, of course, been saying for years that soon LLMs would be able to do this kind of automated vulnerability research. So this should not be much of a surprise to anyone who was paying attention to the rapid improvement in LLMs.
Of course, ideally frontier labs would not be racing to publish improved models, given the immense disruption these models can cause in every area, not just cybersecurity. The only way to achieve that is through regulation, and Anthropic (alone among frontier AI labs) is heavily lobbying for regulation and slowing LLM development. And this public disclosure of the security applications of Opus 4.6 should be seen as part of this lobbying. So I really don't see how Anthropic has done anything wrong here at all.
Should Anthropic conduct their own audits of every open source project in existence before releasing Opus 4.6?
No, but they should certainly (1) assess which projects would be most targeted, (2) give them enough time to handle reported issues, with work on Anthropic's part to make sure that they aren't shovelling slop into already overworked maintainers' plates, and (3) dedicate significant effort and resources to resolving these issues before releasing such tools.
While other frontier labs are improving and publishing their own models which will soon have similar capabilities?
Quick, everyone else is making money by ignoring ethical obligations, we gotta beat them to it!
If indeed the latest models are capable of cheaply finding large numbers of security-relevant vulnerabilities, that is a complete disaster for security everywhere and disclosure of this as soon as possible seems like the only thing that could possibly give people the chance to prepare.
(1) They aren't, and (2) fully disagree; there is an opportunity to mitigate harms before throwing it into the hands of the public. Immediate public disclosure is rarely the first, best choice, and we know this from other vulnerability research applications.
this public disclosure of the security applications of Opus 4.6 should be seen as part of this lobbying.
Once upon a time, OpenAI was a non-profit company working towards the development of AI for the benefit of humanity. Then there was sufficient money involved, and now they are for-profit. Anthropic is a for-profit company which positions itself as pro-humanity. Forgive my cynicism, but this company is, and will be, for the maximisation of capital. Anything suggesting otherwise is primarily marketing; just look at their investors for the last fundraising round! I doubt they will choose to act ethically when it comes to vulnerability research because I doubt they will choose to act ethically in any capacity, except when it serves to increase capital.
Furthermore, look at the bugs they listed: bugs not recognized as vulnerabilities, or bugs in small open-source projects. We're not shattering the foundations of the earth here. They frame it as "meaningful 0-day vulnerabilities in well-tested codebases" and capabilities that require "new cyber-specific probes to better track and understand the potential misuse". But the listed bugs don't tell me that, and they have some mysterious 500 other zero-days tucked away? This is marketing material to frame them as exclusive salespeople to a new age of vulnerability research, not lobbying for regulation. So, yes, what they're doing is wrong, because they are doing what we have chastised security vendors for doing for years: scare tactics wrapped up in technical jargon and misrepresentation of capabilities.