How I used o3 to find CVE-2025-37899, a remote zeroday vulnerability in the Linux kernel’s SMB implementation

26 points by ohrv


taras

One of the few healthy perspectives on AI; I wish there were more AI tooling for careful experiments like this. Here he sets up a controlled benchmark, comes up with an algorithm to feed context into an LLM, and then manually verifies the results. It would be nice to have a CLI or UI tool to quickly set up such pipelines and efficiently review the results.

A nice breath of fresh air amid all the “I let agents loose on my code” posts. I think it’s a good bet that AGI isn’t going to happen on LLM tech, and that significant value will instead be carefully extracted via thoughtful human-in-the-loop pipelines like this one.

grayhatter

o3 finds the kerberos authentication vulnerability in the benchmark in 8 of the 100 runs. In another 66 of the runs o3 concludes there is no bug present in the code (false negatives), and the remaining 28 reports are false positives.

An 8% success rate, where the 92 failures are split roughly 1/3 false positives (28) and 2/3 false negatives (66).

I see we’re using the word “find” very loosely.

[when providing larger input file] o3 finds the kerberos authentication vulnerability in 1 out of 100 runs with this larger number of input tokens, so a clear drop in performance, but it does still find it.

Again, “find” is doing a lot of heavy lifting here.

I don’t know what I was expecting when I started reading this… but it wasn’t clickbait, which is how I’ve now categorized this post (and its author).

river

I notice many people discussing this post without realizing that the author had the LLM rediscover a vulnerability that he had previously found himself, without an LLM.

This is similar to how you can get an LLM to give the right answer to a difficult question you already know the answer to, by tweaking the prompt a few times until it says what you want.