The Coming AI Revolution in Distributed Systems
13 points by hwayne
Interesting because the author was able to give o3 a codebase and it derived an accurate TLA+ specification, which found a design bug.
You just need a TLA+ expert to oversee everything, then an endless chain of checkers checking the previous checker, plus some other expert-systems or formal-methods stuff, and then you’ll have a system where you still can’t be sure whether it has formally encoded a spec that misses requirements.
You can never be sure that the spec you’ve encoded doesn’t miss requirements. That’s why, for me, there’s no escaping the human in the loop for spec creation, even if the spec is partially AI-generated.
That doesn’t lessen the value of specs for me, by the way. It’s just a fact of life that at some point you need a link in the chain that is trusted, and that link is the source of truth, with all of its tradeoffs.
Wow! This is impressive. I want to know how the author actually made the AI traverse the code. Did it somehow run a Python program that would in turn inspect the code? They never really describe that. In fact, most of these write-ups never describe how they actually used the AI; they just say it did really well. I wish someone would explain what they did to get the code analyzed.
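The article doesn’t say, but the usual pattern these days is an agent loop where the model is handed read-only file-system tools and calls them until it’s ready to emit the spec. Purely a guess at their setup; the `llm` client and reply shape below are hypothetical placeholders, not anything from the article:

```python
import json
import pathlib

# Hypothetical stand-in for whatever model API was actually used;
# plug in a real client here. The reply shape is invented for the sketch.
def llm(messages, tools):
    raise NotImplementedError("plug in your model client here")

# Two read-only tools are enough for code traversal.
def list_files(root: str) -> str:
    return "\n".join(str(p) for p in pathlib.Path(root).rglob("*"))

def read_file(path: str) -> str:
    return pathlib.Path(path).read_text()

TOOLS = {"list_files": list_files, "read_file": read_file}

def derive_spec(root: str) -> str:
    messages = [{"role": "user",
                 "content": f"Read the code under {root} and write a TLA+ spec."}]
    while True:
        reply = llm(messages, tools=list(TOOLS))
        if reply["type"] == "tool_call":   # model asks to inspect the code
            result = TOOLS[reply["name"]](**json.loads(reply["arguments"]))
            messages.append({"role": "tool", "content": result})
        else:                              # model emits the final spec
            return reply["content"]
```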
AI can now autonomously generate accurate formal specifications directly from very large production codebases
I’m very open to this, but I want to see it done on an open-source codebase! I would like to learn about it, and I even have a modestly sized codebase to offer, at under 60K lines.
Similar to - https://lobste.rs/s/gkpmli/if_ai_is_so_good_at_coding_where_are_open
The AI autonomously produced precise TLA+ specifications from Azure Storage’s production source code, uncovering a subtle race condition that had evaded traditional code reviews and extensive testing.
Yeah, this is really cool. The specification-implementation gap is a real challenge for design-time verification of distributed systems (“does the implementation work as specified”). Runtime monitoring, safeguarding, and other techniques exist to close this gap, but take significant extra effort. This approach flips the problem over: by making the spec from the implementation, we can know (modulo AI reliability) it matches the implementation!
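To make the class of bug concrete (an invented example, not the actual Azure Storage bug): model checking is essentially an exhaustive search over interleavings, and even a tiny hand-rolled explicit-state search, TLC-style, finds the classic lost-update race between two non-atomic increments:

```python
from collections import deque

# Each thread does a non-atomic increment: read x into a temp, then write temp+1.
# State: (x, pc_a, pc_b, temp_a, temp_b); pcs go "read" -> "write" -> "done".
def successors(state):
    x, pca, pcb, ta, tb = state
    if pca == "read":
        yield (x, "write", pcb, x, tb)
    if pca == "write":
        yield (ta + 1, "done", pcb, ta, tb)
    if pcb == "read":
        yield (x, pca, "write", ta, x)
    if pcb == "write":
        yield (tb + 1, pca, "done", ta, tb)

def check():
    init = (0, "read", "read", 0, 0)
    seen, frontier = {init: None}, deque([init])
    while frontier:
        s = frontier.popleft()
        x, pca, pcb, *_ = s
        if pca == pcb == "done" and x != 2:   # invariant: no lost update
            trace = []
            while s is not None:              # walk parents to rebuild the trace
                trace.append(s)
                s = seen[s]
            return list(reversed(trace))      # counterexample, like TLC reports
        for nxt in successors(s):
            if nxt not in seen:
                seen[nxt] = s
                frontier.append(nxt)
    return None

print(check())
```

The returned trace is the interleaving where both threads read 0 before either writes, so x ends at 1 instead of 2, which is exactly the shape of counterexample a model checker hands you.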
It also makes me more optimistic about using AI to do trace-checking and test case generation. Both of those are pretty effective for connecting specs to code but need a lot of custom translation code written for the project, which seems like the kind of thing an LLM could handle.
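As a sketch of what that custom translation code looks like: trace-checking mostly means parsing each production log event into a spec action and checking that it’s enabled in the current spec state. The log format and the lock spec here are invented for illustration:

```python
import json

# Invented toy spec: a lock where acquire is only enabled when it's free.
def step(state, action):
    holder = state["holder"]
    if action["op"] == "acquire" and holder is None:
        return {"holder": action["node"]}
    if action["op"] == "release" and holder == action["node"]:
        return {"holder": None}
    raise ValueError(f"action {action} not enabled in state {state}")

# The tedious per-project part an LLM could write:
# translating one log line into a spec action.
def parse(line):
    rec = json.loads(line)
    return {"op": rec["event"], "node": rec["node_id"]}

def check_trace(lines):
    state = {"holder": None}
    for i, line in enumerate(lines, 1):
        try:
            state = step(state, parse(line))
        except ValueError as e:
            return f"trace diverges from spec at line {i}: {e}"
    return "trace conforms to spec"

log = [
    '{"event": "acquire", "node_id": "n1"}',
    '{"event": "acquire", "node_id": "n2"}',  # double acquire: spec violation
]
print(check_trace(log))
```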
I think AI and formal methods are going to become heavily intertwined in the coming years. But deriving the spec from an implementation seems like the backwards approach.
I could be wrong of course, and it may be way more practical to extract specs rather than write them ahead of time given the fact that everyone already has an implementation. But since it’s the source of truth, I really want the spec to be dependable and hand-inspected.
We should definitely keep experimenting like this though. Maybe some use cases like this end up having counterintuitive value.
But deriving the spec from an implementation seems like the backwards approach.
I think you’re right, but on the other hand, having a codebase that’s usually working, but where nobody really knows exactly how, is a more likely scenario than starting with a rigorous spec, unfortunately.
One of the hard problems in review is whether I am reading the literal meaning or the intended meaning. Being bad at reading the intended meaning when it contradicts the literal one sometimes helps here (even as it impedes navigating sufficiently broken social situations).
I guess spec extraction might benefit from this issue, in a sense? A language-model AI tries to summarise the intended spec of the code while missing the subtle issues, which can then be found by a symbolic what-we-called-AI-last-time.
funny that Azure uses Claude to do the actual work ^^