claimcheck: Narrowing the Gap between Proof and Intent

4 points by heyyfernanda

jkarni

Is this significantly better than giving a second LLM the proposition and the intent and asking "does this proposition correctly formalize the intent?"? I imagine this is something the authors tried, but couldn't see any mention of it.

namin

That’s a good question! We had a single prompt variant but it was still giving two-pass instructions. I tried a “naive” prompt and it does better than the single prompt, but a bit worse than the separated two-pass. I updated the results here: https://github.com/metareflection/claimcheck/blob/main/reports/CLAIMCHECK_RESULTS.md In short, with Opus 4.6 the gap closes a bit, but the two-pass approach (with haiku and sonnet) is still ahead. There’s a cofounding factor that the two-pass approach works per file rather than per lemma.
- jkarni
  
  Cool - thank you for that.
  
  Also great work - a reliable natural-language-to-proposition is pretty crucial for the proof-based LLM future a lot of us envision.