Cutting LLM token usage by ~80% using REPL-driven document analysis
28 points by Yogthos
This does seem interesting and is not very different from what a person would do when exploring a code base (e.g. grep, TAGS files, LSP, etc.): tools to search and narrow the focus, and then load/read the specific files.
I wonder if someone has made an interface between LSPs and MCP?
A quick Google and it looks like Sourcegraph has done this for their product.
Using LSPs specifically for code exploration is actually a great idea. That could work in a complementary way with the Matryoshka approach too: it could use LSP plugins to explore the code, and then cache the results for the agent to use.
This looks very interesting and I’ll be trying it today.
Do share what you find, I've been dogfooding it myself for the past few days, but always interesting to see how it works out for others.
Is it language-agnostic or does it require customisation for different programming languages?
Really interesting! I like the idea of an agent-focused language, especially the idea of caching/storing results, which should speed up tool use.
However, I have a couple of questions:
First what's the token overhead of the Matryoshka MCP itself? At least in Claude, MCPs consume a lot of tokens, which could wipe out all savings (and then some).
More broadly, how does this compare to using subagents and ad hoc shell scripts? I frequently use subagents to distill answers and protect the parent's context by only bubbling up the final answer. Custom extractors are interesting, but agents can do similar things already with plain bash, iiuc.
There's no meaningful per-query token overhead to using Matryoshka. MCP tool definitions do carry a fixed token cost in the system prompt, but that's amortized across the session. The savings come from per-query efficiency, not zero overhead: for a single small file, the tool definitions might exceed the savings; for repeated queries on large files, the savings compound.
Think of it as a database that Claude can run queries against. The reduction comes from the fact that Claude isn't looking at the documents directly, or, much of the time, even at the results. What happens is that Claude tells the MCP to load a file, runs queries against it, and the results are bound to variables. Claude then just needs a pointer to the data instead of seeing the data and round-tripping it constantly.
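To make the handle mechanism concrete, here's a minimal sketch of the idea in Python. The names (Session, load, query, expand, $resN) are hypothetical, assumed for illustration; the real Matryoshka/Lattice interface may differ. The point is that a query returns a tiny stub instead of the rows, and the rows stay server-side in SQLite until explicitly expanded.

```python
import sqlite3

class Session:
    """Toy model of a handle-based query REPL (hypothetical API)."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE lines (n INTEGER, text TEXT)")
        self.handles = {}  # handle name -> materialized rows (server-side)
        self.counter = 0

    def load(self, text):
        # Load a document into SQLite, one row per line.
        rows = [(i + 1, line) for i, line in enumerate(text.splitlines())]
        self.db.executemany("INSERT INTO lines VALUES (?, ?)", rows)

    def query(self, pattern):
        # Bind matching rows to a handle; return only a stub, not the rows.
        self.counter += 1
        handle = f"$res{self.counter}"
        rows = self.db.execute(
            "SELECT n, text FROM lines WHERE text LIKE ?", (f"%{pattern}%",)
        ).fetchall()
        self.handles[handle] = rows
        return {"handle": handle, "count": len(rows)}  # a few tokens, not the data

    def expand(self, handle, limit=10):
        # On-demand visibility: materialize rows only when actually needed.
        return self.handles[handle][:limit]
```

In this toy version, `query("ERROR")` over a large log costs the agent only a stub like `{"handle": "$res1", "count": 2}`, and `expand` surfaces a bounded slice only when the agent decides it needs to look.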
Claude can also expand handles when needed (lattice_expand $res1 limit=10), but only does so when actually required for decision-making. It's not blind; it has on-demand visibility.
The key advantage here is in having persistent session state in the REPL. Say Claude does a search, finds all the failures, and then sums them up. With this approach Claude doesn't even need to see the actual lines themselves.
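The aggregation point can be sketched with plain sqlite3 (a toy stand-in, not the actual Lattice query syntax): the REPL runs the SUM server-side, and only the final scalar would cross into the model's context.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE log (level TEXT, count INTEGER)")
db.executemany(
    "INSERT INTO log VALUES (?, ?)",
    [("ERROR", 3), ("OK", 120), ("ERROR", 7)],
)

# The agent issues one query; the REPL returns a single number.
(total,) = db.execute(
    "SELECT SUM(count) FROM log WHERE level = 'ERROR'"
).fetchone()
print(total)  # the scalar is all the agent ever sees
```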
Subagents still read files into their context and do full text processing, whereas Lattice keeps data server-side in SQLite; even the subagent equivalent would need to materialize results to reason about them.
With a subagent you're doing File → Agent context → Process → Return summary. With Lattice you do File → SQLite → Query → Handle stub → Expand only what's needed.
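The difference between the two flows can be illustrated with toy numbers (this is not a benchmark, just a sketch of why the handle flow scales better): in the subagent flow the whole file enters some context window before filtering, while in the handle flow only a stub plus an expanded slice does.

```python
# A 1000-line synthetic log with an occasional FAIL line.
doc = "\n".join(
    f"line {i}: {'FAIL' if i % 50 == 0 else 'pass'}" for i in range(1, 1001)
)

# Subagent flow: File -> Agent context -> Process -> Return summary.
# The full file is read into some agent's context before any filtering.
subagent_context_chars = len(doc)

# Handle flow: File -> SQLite -> Query -> Handle stub -> Expand only what's needed.
matches = [line for line in doc.splitlines() if "FAIL" in line]
stub = f"$res1 ({len(matches)} rows)"
expanded = "\n".join(matches[:5])  # expand only a small slice on demand
handle_context_chars = len(stub) + len(expanded)

print(subagent_context_chars, handle_context_chars)
```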
Finally, the synthesis/regex features are orthogonal to the token savings from handle-based queries. That's just the internal rules engine Matryoshka implements.
Does Matryoshka ever run into issues with the quality of text selected? This sounds a bit like RAG, and I know the Claude Code team chose plain search over RAG because, surprisingly, the RAG results weren't as good.
Matryoshka has nothing to do with RAG. It's a programming environment the agent uses to make queries directly; this is exactly the same process as when Claude Code runs shell commands.
Is it possible to work with this without MCP? I'm curious if I can configure this as a skill instead: maybe start the MCP server separately and then provide the agent with instructions on how to use the CLI to query the server.
There are a couple of options, lattice-http and lattice-pipe, that can be used instead of MCP.