My week with opencode
22 points by lproven
One thing that bugged me is that the article seems to conflate the tool (opencode) and the model. With a frontier model (e.g. Opus 4.5) the experience would have been quite different. It's good, though, to have such detailed notes on the experience, especially as things change quickly in this field.
Seems fair, considering that open-weight models are a hedge against the OpenAI and Anthropic bubbles bursting if VC funding can no longer prop up their business models for some reason. As an outsider I would rather rely on these permissively licensed models than on proprietary offerings.
I would rather rely on them too, but the frontier models are so much better at this task in particular that it seems unfortunate for a skeptic not to have tried a modern one. I do think many of the observations in this article are good and directionally correct, but right now there is an enormous functional difference between open-weight models and proprietary ones.
Regardless, why GLM-4.6? GLM-4.7 is a significantly better coding model. I think a lot of the complaints here are outdated.
I’m currently using opencode with devstral-2 and Claude Code side by side to compare them.
Claude Code is consistently better, and opencode seems to eat tokens for breakfast, but I still find opencode good enough that I try it first.
I’ve also had more luck using the LLMs to search for certain things in the code.
Just today I needed to trigger an exception on our server to check whether stack traces were properly sent to our observability platform, and opencode+devstral gave me a curl command to do just that.
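The exact command depends on the server, but it was something along these lines (the endpoint and payload here are hypothetical, just to illustrate the idea):

```
# Hypothetical sketch: the real endpoint and payload depend on the server.
# Posting malformed JSON to an endpoint that expects a valid body is one
# easy way to provoke a server-side exception:
curl -X POST https://staging.example.com/api/orders \
  -H 'Content-Type: application/json' \
  -d '{"broken":'
```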
Otherwise my experience aligns with the author’s. There is good stuff here, but it’s not… like… life-changing or anything.
It's important not to conflate the harness (Claude Code vs. opencode) with the model being used. The latter is where the differentiation is.
Is that the only differentiation, though?
I hear Claude Sonnet did worse in opencode than in the official CLI.
On Reddit I see people mention that the same model performs differently across Roo, Cline, opencode, etc.
Note, this blog post has a follow-up, which I posted here:
https://lobste.rs/s/ey9mdc/problem_is_culture
My take is this:
In this one, part 1, she evaluates the tools and finds them wanting. In part 2, she instead tries to evaluate the people promoting these tools, and what it is about them that drives them to promote flawed tools with flawed output.
There is insight into the tools in this part, but the more important analysis is that of the people in the following part.
I have been using opencode and I personally like it. Sure, it has some quirks, but it might win the CLI agent wars hands down.