llama-conductor is a router + memory store + RAG harness to force models to behave like predictable components
3 points by Yogthos
It's RAG and prompt engineering; it cannot force models to behave in certain ways. The author has been spamming Lemmy with their LLM-derived rants about how great the tool is, e.g. here, but I already concluded a few years ago that this fundamentally can't prevent confabulations. Previously on Lobsters, and previously on Lobsters, we discussed how confabulation arises from the epistemic constraints structurally built into modeling language.
You're right that you can't prevent models from hallucinating, the same way you can't prevent people from doing so; in both cases you're dealing with a stochastic engine that can make mistakes. That said, there are valid techniques for mitigating it, this paper being a prominent example. The approach there is to run a quorum of agents and only keep results that several of them agree on.
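To make the quorum idea concrete, here's a minimal toy sketch (mine, not the paper's actual method): collect answers from several independent agents as plain strings, and only accept one if at least k agents agree on it after light normalisation. In a real pipeline the candidate list would come from separate model calls.

```haskell
import Data.Char (toLower)
import Data.List (maximumBy)
import Data.Ord (comparing)
import qualified Data.Map.Strict as Map

-- Normalise answers so trivially different phrasings still count as agreement.
normalise :: String -> String
normalise = map toLower . unwords . words

-- Count how often each (normalised) answer occurs.
tally :: [String] -> Map.Map String Int
tally = foldr (\a m -> Map.insertWith (+) (normalise a) 1 m) Map.empty

-- Accept an answer only if at least `quorum` of the candidates agree on it;
-- otherwise report failure instead of guessing.
quorumAnswer :: Int -> [String] -> Maybe String
quorumAnswer quorum answers =
  case Map.toList (tally answers) of
    [] -> Nothing
    xs ->
      let (best, votes) = maximumBy (comparing snd) xs
      in if votes >= quorum then Just best else Nothing

main :: IO ()
main = do
  -- In a real setup these would come from several independent model calls.
  let candidates = ["42", " 42 ", "41", "42"]
  print (quorumAnswer 3 candidates)  -- Just "42"
```

Raising the quorum trades coverage for confidence: you refuse to answer more often, but the answers you do give have independent agreement behind them.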
Another approach is to create contracts. Languages like Haskell already get you most of the way there, since you can encode complex constraints in the type system, and the model can't cheat because its output has to meet the contract (sketched below). If the llama-conductor technique helps reduce errors, it makes the whole system more efficient, since the model can arrive at a correct solution faster.
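A minimal sketch of what that can look like (my own toy example, nothing from llama-conductor): the constrained type only has a checked constructor, so raw model output has to parse and satisfy the invariant before any downstream code can touch it.

```haskell
import Text.Read (readMaybe)

-- In practice the constructor would not be exported, so the only way
-- downstream code gets a SortedAsc is through the checked constructor.
newtype SortedAsc = SortedAsc { getSorted :: [Int] }
  deriving Show

-- Contract: non-empty and strictly ascending.
mkSortedAsc :: [Int] -> Either String SortedAsc
mkSortedAsc [] = Left "empty list"
mkSortedAsc xs
  | and (zipWith (<) xs (tail xs)) = Right (SortedAsc xs)
  | otherwise = Left ("not strictly ascending: " ++ show xs)

-- Parse raw model output and enforce the contract before anyone uses it.
fromModelOutput :: String -> Either String SortedAsc
fromModelOutput raw =
  case readMaybe raw of
    Nothing -> Left ("unparsable output: " ++ raw)
    Just xs -> mkSortedAsc xs

main :: IO ()
main = do
  print (fromModelOutput "[1,3,7]")  -- Right (SortedAsc {getSorted = [1,3,7]})
  print (fromModelOutput "[3,1,7]")  -- Left "not strictly ascending: [3,1,7]"
```

Downstream functions then take SortedAsc rather than [Int], so the invariant is enforced once at the boundary by the compiler instead of being re-checked ad hoc.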
It is not just that; there is also post-processing to verify that the references do indeed go somewhere (and to surface the targets of those references).
A post-filter can force the overall toolchain to either fail or give a reference to a numbered allowed non-LLM-sourced claim. Guess-and-verify is older than LLMs, after all.
Whether the picked claim is relevant/useful, and whether the failure rate makes the entire dance pointless, are questions whose answers probably depend on personal preference and the kinds of queries. But you can block outright lies if you accept having a closed list of eligible truths.
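A toy sketch of that kind of post-filter (again mine, not llama-conductor's actual implementation): the answer must cite claims by number like [2], every cited number has to exist in a closed, non-LLM-sourced claim table, and the filter either rejects the whole answer or returns it together with the cited claim texts.

```haskell
import Data.Char (isDigit)
import qualified Data.Map.Strict as Map

-- The closed list of eligible truths: claim id -> claim text.
-- In a real setup this comes from a curated, non-LLM source.
allowedClaims :: Map.Map Int String
allowedClaims = Map.fromList
  [ (1, "The service listens on port 8080 by default.")
  , (2, "Config lives in /etc/example/conf.yaml.")
  ]

-- Pull out citation markers of the form [n] from the model's answer.
citedIds :: String -> [Int]
citedIds ('[' : rest) =
  case span isDigit rest of
    (ds@(_:_), ']' : rest') -> read ds : citedIds rest'
    _                       -> citedIds rest
citedIds (_ : rest) = citedIds rest
citedIds []         = []

-- Post-filter: fail unless every cited id is in the allowed list,
-- otherwise return the answer together with the claims it cites.
postFilter :: String -> Either String (String, [(Int, String)])
postFilter answer =
  case citedIds answer of
    []  -> Left "no citations at all; rejecting the answer"
    ids ->
      case traverse lookupClaim ids of
        Left bad    -> Left ("cites unknown claim " ++ show bad)
        Right texts -> Right (answer, zip ids texts)
  where
    lookupClaim i = maybe (Left i) Right (Map.lookup i allowedClaims)

main :: IO ()
main = do
  print (postFilter "Edit the config [2] and restart on port 8080 [1].")
  print (postFilter "It also supports hot reload [7].")  -- rejected
```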
Now, the mode with an AI summary of the picked facts… yeah, there is a risk there. But at least translation / summarisation tasks are better grounded than question answering; you can get outright novel inventions down to a pretty low rate (and then output the original claim in full nearby). Not to zero, but in my personal experience paid human sworn translations don't get novel inventions to zero either…
Now, would all the annoying limitations needed to get any of the promised stuff out of this tool be extremely frustrating to you? Most probably. The author gets frustrated by different things though, so maybe there is a match there.