Find bugs in YOUR code using OpenCode, Llama.cpp and Qwen3.6

28 points by vbernat

vbernat

I know many at Lobsters don't like this topic. In this case, the models are run locally and are used to find bugs. I think it shows how to use LLMs more responsibly.

addison

There are still concerns about e.g. the origin of training data and the cost of training, but yes, this sidesteps many of the issues raised. Thanks for sharing.

AlbertoGP

Thank you for posting this. I do use haproxy and Willy Tarreau has earned my respect over all these years.

A few weeks ago I started making very limited use of LLMs for programming in this way, looking for defects in my own programs, and I am pleased with the result. Used like this, LLMs are just another static analyzer.

Each static analyzer such as gcc -fanalyzer or clang --analyze picks up different things, and has different false positives. I always have to scrutinize their reports anyway. The LLMs found a handful of actual issues in total over several programs that had escaped the other analyzers and my own tests and reviews. While fixing those manually, I cleaned up some small things, and clarified the code.

I also learned some things I did not know about the C standard library. Those were unknown unknowns: had I thought that there could be a problem, I would have looked up information about it, but it never occurred to me to look there. The LLM pointed out the connection and then it was easy to find the relevant documentation.

bitshift

However [Qwen3.6 MoE] was able to spot many bugs in HAProxy and to propose mostly valid patches.

For a moment I was thinking, "How does he know they're real bugs? Is this like all those folks who reported Curl vulnerabilities?" I didn't realize the author of the article is also the author of HAProxy.

erock

I have never had an issue with an LLM/harness running without a sandbox. They seem pretty good and not doing anything dangerous — at least in my experience.

Qwen3.6 27b is indeed awesome and replaces most of my agent needs. I love the idea that an LLM designed for code is able to run on consumer hardware. It really is a terminal enthusiast's and self-hoster's dream. It can answer a huge swath of what I would otherwise ask a search engine, without consulting the internet.

kornel

They seem pretty good and not doing anything dangerous

The autoregressive fuzzy logic works until it doesn't.

I've seen a model say "this won't work, let me start over" and try to git reset --hard, which wasn't allowed by the config. That confused the model so much that it adopted a new goal of working around the restriction and started inventing novel ways to delete everything, e.g. it tried moving files to target and running cargo clean.

opencode's permission error shows the permissions config to the agent, so models smarter than a roomba learn what they can do without being stopped. I had blocked git log; the model noticed that git rev-list wasn't blocked and casually used that instead.

opencode's security is symbolic. A coding agent can run arbitrary code one way or another.
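
To illustrate why a command denylist is easy to sidestep, here is a minimal sketch. It is not opencode's actual permission code: the `DENIED_PREFIXES` table and the `is_denied` helper are hypothetical, and the matching rule (compare the leading tokens of the command) is an assumption chosen for simplicity.

```python
import shlex

# Hypothetical denylist, loosely modelled on the commands mentioned above.
DENIED_PREFIXES = [
    ("git", "log"),
    ("git", "reset", "--hard"),
]


def is_denied(command: str) -> bool:
    """Return True if the command's leading tokens match a denied prefix."""
    argv = tuple(shlex.split(command))
    return any(argv[: len(prefix)] == prefix for prefix in DENIED_PREFIXES)


if __name__ == "__main__":
    for cmd in (
        "git log --oneline",                 # blocked by the first rule
        "git rev-list HEAD",                 # walks the same history, no rule matches
        "git reset --hard",                  # blocked by the second rule
        "mv src target/src && cargo clean",  # destructive, no rule matches
    ):
        print(f"{'DENY ' if is_denied(cmd) else 'ALLOW'} {cmd}")
```

The denylist only recognises the spellings someone thought of in advance, so an equivalent command under another name passes straight through.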

quasi_qua_quasi

The mental model I have is that you have OS-level enforced sandboxing (microVMs, containers, bwrap, whatever), and those are like safety bollards. Then you layer your harness-level security mechanism on top, which is like painting the bollard bright orange and putting a NO ENTRY sign on it. Only the bollard is an actual mechanism; everything else is just an affordance.
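
As a rough sketch of the "bollard" layer, the helper below builds a bwrap (bubblewrap) command line that leaves only the project directory writable. The function name `bwrap_argv`, its parameters, and the choice of mounts are assumptions for illustration; the flags themselves are standard bwrap options, but a real setup would need tuning for the machine it runs on.

```python
from pathlib import Path


def bwrap_argv(project_dir: str, agent_cmd: list[str],
               allow_network: bool = False) -> list[str]:
    """Build a bwrap invocation where only project_dir is writable."""
    project = str(Path(project_dir).resolve())
    argv = [
        "bwrap",
        "--ro-bind", "/", "/",       # whole filesystem visible, but read-only
        "--dev", "/dev",             # minimal /dev
        "--proc", "/proc",           # fresh /proc for the new PID namespace
        "--tmpfs", "/tmp",           # private, throwaway /tmp
        "--bind", project, project,  # the only writable host path
        "--unshare-all",             # new namespaces, network included
        "--die-with-parent",         # kill the sandbox if the launcher exits
        "--chdir", project,
    ]
    if allow_network:
        # Must follow --unshare-all, which it partially undoes.
        argv.append("--share-net")
    return argv + agent_cmd


if __name__ == "__main__":
    # "opencode" as the agent binary name is an assumption here.
    print(" ".join(bwrap_argv(".", ["opencode"])))
```

Note that this limits what the agent can write, not what it can read: with a read-only bind of `/`, files such as SSH keys are still visible unless they are masked separately.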

k749gtnc9l3w

They seem pretty good and not doing anything dangerous — at least in my experience.

That might be what some people in speculative trading call «picking up pennies in front of a steamroller».

You spare yourself a minor inconvenience by not using any hard blast-radius limiters at all (sure, hardness is relative, 0-days are everywhere, but they take time to find and weaponise), but the risk of the model misclassifying a «this can be easily fixed by rm -rf /home/» joke in the training data is probably a steamroller you wouldn't enjoy being rolled over by even once every five years.