When AI Writes the World's Software, Who Verifies It?

20 points by regulator


hwayne

This feels AI generated, in part because it makes a mistake I'd expect from an AI but not from someone as accomplished in FM as Leonardo de Moura:

Once verified components are cheap, you compose them with confidence. Today, integration is where most bugs live: the boundaries between components, the assumptions that do not quite match. Integration testing exists precisely because component tests do not compose: passing unit tests on two components tells you nothing about whether they work together. Verified interfaces are fundamentally different. When each component carries a proof against a shared specification, the composition is guaranteed correct by construction. You can verify each piece independently and know the whole is sound.

This is actually one of the most common ways formally verified code fails in production! You can have two proven-correct components that each fully satisfy their own local properties, yet interact in a way that violates a global property. Pamela Zave has a bunch of examples here.
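To make that concrete, here is a minimal sketch in the spirit of Zave's feature-interaction work (a hypothetical example, not taken from her papers): two call-forwarding rules, each of which satisfies its local spec ("a call to X is redirected to forward[X]") in isolation, but whose composition violates the global property that every call eventually terminates at a phone.

```python
def route(callee, forward, max_hops=10):
    """Follow forwarding rules; return the final callee, or None if
    routing never terminates (a violation of the global property)."""
    hops = 0
    while callee in forward:
        callee = forward[callee]
        hops += 1
        if hops > max_hops:
            return None  # loop detected: the call never reaches a phone
    return callee

# Each feature, verified in isolation, satisfies its local spec:
assert route("alice", {"alice": "bob"}) == "bob"
assert route("bob", {"bob": "alice"}) == "alice"

# Compose both verified features: every local spec still holds,
# but the global property "every call terminates" is now false.
assert route("alice", {"alice": "bob", "bob": "alice"}) is None
```

Neither rule is wrong against its own specification; the bug only exists in the composition, which is exactly why "verify each piece independently and know the whole is sound" doesn't hold in general.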

lorddimwit

“See, that’s the clever part. Come winter, the gorillas freeze to death!”

regulator

I've regularly seen people talk about formally verifying the output of LLMs, and I've seen a few people in my circles start learning Lean for this purpose, but I don't have a good perspective on this. Do any of you Lobsters have insight or experience with this?

nextos

I love Lean, and maybe it's just me, but I find contract-based verification (like Dafny) or hybrids with refinement types (like F*) better suited for large-scale formal development. I like the split between code and contracts.

informal

I know some people don't like AI.

But code written by humans contains bugs as well; there would probably be fewer bugs if we used AI.

I think the solution for AI-generated code is a stronger testing process. It cannot solve all problems, but it eliminates a lot of bugs.