claimcheck: Narrowing the Gap between Proof and Intent
4 points by heyyfernanda
4 points by heyyfernanda
Is this significantly better than giving a second LLM the proposition and the intent and asking "does this proposition correctly formalize the intent?"? I imagine this is something the authors tried, but couldn't see any mention of it.
That’s a good question! We had a single prompt variant but it was still giving two-pass instructions. I tried a “naive” prompt and it does better than the single prompt, but a bit worse than the separated two-pass. I updated the results here: https://github.com/metareflection/claimcheck/blob/main/reports/CLAIMCHECK_RESULTS.md In short, with Opus 4.6 the gap closes a bit, but the two-pass approach (with haiku and sonnet) is still ahead. There’s a cofounding factor that the two-pass approach works per file rather than per lemma.
Cool - thank you for that.
Also great work - a reliable natural-language-to-proposition is pretty crucial for the proof-based LLM future a lot of us envision.