Giving LLMs a Formal Reasoning Engine for Code Analysis

16 points by Yogthos


typesanitizer

Have you actually run evals on the efficiency claim? FWIW, the kind of program you've written matches exactly what I've envisioned before, but I've mellowed on these approaches because:

  1. Experience tells me that LLMs struggle much more with using MCPs and skills correctly than with writing complex grep commands and one-off Python scripts (seemingly what they've been RL-ed to do).
  2. The efficiency point was made previously with Aider natively supporting repository maps. If it were an obvious win, then given the copious amount of feature copying that goes on, it's plausible the feature would've been widely adopted across agents by now (it hasn't been, AFAICT).
  3. Re: logic-based analysis, one case where that might perform worse than an LLM today is dynamic dispatch. When you have test code and non-test code, the presence of mocks means the reachability characteristics of the code are different, and there's basically no reliable way to discover test roots across languages using tree-sitter (you could probably hack something together that works well for common cases in newer languages).
  4. From a pitching POV, IDK what your target audience is, but in work contexts as well as product development, I've seen people be more sensitive to latency than to price (at least at current token prices, in liberal-token-budget environments), since the tokens are paid for from a pool but the latency is felt every time you make a query.
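To make point 3 concrete: even within a single language, "test root" discovery is a pile of conventions, not a grammar-level fact. Here's a minimal Python-only sketch of the kind of per-language heuristic you'd end up hacking together (the function names and the `test` prefix / `mock` attribute heuristics are my own illustrative assumptions, not any standard API); a tree-sitter-based version would need an equivalent hand-written query per grammar.

```python
import ast

SOURCE = """
from unittest import mock

def test_checkout():          # a "test root" only by naming convention
    with mock.patch("billing.charge"):
        checkout()

def checkout():               # production code
    charge()
"""

def find_test_roots(source: str) -> list[str]:
    # Heuristic: treat any function whose name starts with "test" as a
    # test entry point. Conventions differ across frameworks/languages.
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test")
    ]

def uses_mocks(source: str) -> bool:
    # Crude signal: any attribute access on a name literally called
    # "mock". Misses aliased imports, monkeypatching, DI containers, etc.
    tree = ast.parse(source)
    return any(
        isinstance(n, ast.Attribute)
        and isinstance(n.value, ast.Name)
        and n.value.id == "mock"
        for n in ast.walk(tree)
    )

print(find_test_roots(SOURCE))  # ['test_checkout']
print(uses_mocks(SOURCE))       # True
```

The mock here is exactly what breaks a naive reachability analysis: a call graph built over production code says `checkout` reaches `charge`, but under the test root it doesn't.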

I'm not saying you're wrong; I'd love it if you were right. I'm just not certain that's the case.