Local Qwen isn't a worse Opus, it's a different tool
16 points by jmillikin
On my Strix Halo, the MTP-enabled model at a six- or eight-bit quant gives me about 135 tokens per second. Writing a custom MCP server for my favorite language, in my case Nim, also made quite a difference. It is basically three loops: a read loop, a judge loop, and a code loop. The judge is, of course, the compiler, and the setup writes pretty good code relatively quickly.
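For anyone wondering what a compiler-as-judge loop looks like, here is a minimal sketch in Python rather than Nim. It is not the actual MCP server described above: `judge`, `code_loop`, and `fake_model` are hypothetical names, Python's built-in `compile()` stands in for the Nim compiler, and the stub model stands in for a call to the local LLM. The read step is left out.

```python
from typing import Callable, Optional, Tuple


def judge(source: str) -> Optional[str]:
    """Judge step: return None if the source compiles, else an error message.

    Assumption: Python's built-in compile() stands in for a real compiler.
    """
    try:
        compile(source, "<candidate>", "exec")
        return None
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"


def code_loop(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Code step: ask the model for code, feeding judge errors back each round."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        candidate = generate(task, feedback)
        feedback = judge(candidate)
        if feedback is None:
            return candidate, round_no
    raise RuntimeError(f"no compiling candidate after {max_rounds} rounds")


def fake_model(task: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for the LLM: broken first, fixed once it sees an error."""
    if feedback is None:
        return "def add(a, b)\n    return a + b\n"  # missing colon on purpose
    return "def add(a, b):\n    return a + b\n"


source, rounds = code_loop("write add(a, b)", fake_model)
```

With the stub model the loop converges on the second round, once the first compile error has been fed back. A real setup would replace `fake_model` with a model call and `judge` with an invocation of the actual compiler.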
I'm running Qwen 3.6 27B locally too, on 2x3090, alongside some smaller LLMs. As you'd expect, it's not competitive with the likes of Claude. However, for privacy-sensitive tasks and for use inside other applications, local LLMs shine. With MCP plugged into logs, code, and other platforms, being able to ask "why is my app currently erroring?" and have it find the issue and open a focused MR with a fix is pretty great.
I'm curious which specific MCPs you consider worthwhile. In the era of the vibecoding explosion, personal recommendations with even the slightest hint of trustworthiness are becoming more valuable and harder to find than before.