Local Qwen isn't a worse Opus, it's a different tool

16 points by jmillikin


symgryph

On my Strix Halo, the MTP-enabled model at a six- or eight-bit quant gives me about 135 tokens per second. I also found that writing a custom MCP to interface with my favorite language, in my case Nim, made quite a difference. It's basically a read loop, a judge loop, and a code loop. The judge is, of course, the compiler, and with that feedback the model seems to write pretty good code relatively quickly.
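
For anyone wanting to try the compiler-as-judge idea, a rough Python sketch of the loop might look like the following. It is a guess at the general shape, not the actual MCP setup described above: `code_loop`, `nim_judge`, and the `generate` callback are hypothetical names, the read step (gathering context) is assumed to happen inside `generate`, and `nim_judge` assumes a `nim` binary on the PATH.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple

# Hypothetical signatures: generate(task, feedback) -> source code,
# judge(source) -> (ok, diagnostics).
Generate = Callable[[str, str], str]
Judge = Callable[[str], Tuple[bool, str]]


def nim_judge(source: str) -> Tuple[bool, str]:
    """Judge step: let the Nim compiler decide whether the candidate is valid.

    Assumes `nim` is installed and on the PATH; `nim check` only performs
    checking and does not produce a binary.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.nim"
        path.write_text(source, encoding="utf-8")
        proc = subprocess.run(
            ["nim", "check", str(path)],
            capture_output=True,
            text=True,
        )
        return proc.returncode == 0, proc.stdout + proc.stderr


def code_loop(
    task: str,
    generate: Generate,
    judge: Judge,
    max_rounds: int = 5,
) -> Optional[str]:
    """Ask the model for code, judge it, and feed diagnostics back until it passes."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = generate(task, feedback)   # code step (model call)
        ok, diagnostics = judge(candidate)     # judge step (compiler)
        if ok:
            return candidate
        feedback = diagnostics                 # next round sees the errors
    return None  # gave up after max_rounds attempts
```

Here `generate` would wrap whatever local model endpoint is in use, and `code_loop(task, generate, nim_judge)` returns the first candidate the compiler accepts, or `None` if none passes within the round limit. Keeping the judge as a plain callback means the same loop works with a test runner or linter in place of the compiler.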

w

I'm running Qwen 3.6 27B locally too, on 2x3090, together with some smaller LLMs. It's indeed not competitive with the likes of Claude, as you'd expect. However, for privacy-sensitive tasks and for use within other applications, local LLMs shine. Having MCP plugged into logs, code, and other platforms, being able to ask "why is my app currently erroring?", and having it find the issue and open a focused MR with a fix is pretty great.