Running local models is good now
35 points by Yogthos
On the one hand, I don't particularly have much desire to run these tools for my current workflows, but on the other hand the largest of the issues I have with them stem from the centralization aspects (which bleed into other areas too: environmental, privacy, power distribution, etc). So I am glad to see that things really do seem to be getting better on the locally hostable models front.
I strongly suspect that's where things will go in the future. Nobody really wants to send all their data to some service provider, and on top of that, you're completely at their whim in terms of price hikes and model availability. As we just saw with the whole Fable fiasco from Anthropic, there's a real danger in allowing yourself to become a digital serf.
As local models and coding harnesses keep improving, there are going to be fewer reasons to rent models from a provider, and I'd argue that's true even if local models are less capable overall. For example, a lot of people use DeepSeek instead of Claude simply because it's good enough while being far cheaper. The same argument applies to local: at some point it doesn't really matter if you can rent a better model when a local one does the job.
There could also be a lot of possibilities with customizing and tuning these tools as well. For example, I haven't really seen anybody make a LoRA for a specific language, and that might make the model far more effective in a restricted domain. At that point the model could even perform better than a huge general purpose version.
I'm not sure why people keep saying that "Nobody really wants to send all their data to some service provider, and on top of that, you're completely at their whim in terms of price hikes and model availability."
Isn't everyone doing just that? From billions of people having their everything digital held at FAANG, to companies trusting everything to Microsoft and a few others... It's just us techies and a minor percentage of companies who don't give all our digital stuff away.
When talking about individuals - maybe. But organizations tend to want assurances. That does not necessarily mean they won't outsource, but at the very least an attempt will be made to have some contractual assurances and a plan to change vendors if needed.
(Not to say that organizations act rationally - just that an average organization does more due diligence than an average individual)
The context here is software devs, so it is precisely the demographic that cares about this stuff. Also, I don't think regular people really want to send their data to these companies. It's just that it happens to be convenient and accepted, and the barrier to not doing it makes it not worth it. Like say you're a regular person who has a phone and takes pictures with it. You don't even get a say in the matter. Apple and Google will just upload whatever you snap to their servers. To actually have a bit of privacy you'd have to install GrapheneOS, which is far beyond what most people can do.
"Regular people" simply don't care. I promise you that if I went to the gym (or wherever) and asked people how concerned they were that their photos were uploaded to Apple photos, the average person would say "not at all."
For the most part even I don't care; my primary concern with giving my data to Google is "what if they suddenly unperson me and cut off access to it." But if I didn't store it in the cloud then I would just have to worry about losing access to it in different ways. (cf the hard drive full of MP3s that sits in a computer that hasn't been booted in like 10 years)
I feel like commoditization is coming for the frontier labs; once the quality of locally-runnable models begins to approach, say, a three month old frontier model, the entire economics of the industry will shift, and fast. Labs will have to have other axes of discrimination than "our model does best on benchmark X" when people fail to see the difference between first and third on the bench.
For sure, and there is a point where models are just good enough for what you're doing so it doesn't really matter that there's a more powerful model you can rent. For example, a lot of people are already switching from Claude to DeepSeek even though Claude is better in absolute terms. We're just getting to the point now where local models are starting to become a viable option, and I expect that within a year or so we'll have local models that are good enough for most tasks.
That might be what pops the whole AI bubble. The models themselves are basically a general-purpose commodity, and traditionally the margins on making these kinds of general tools are low. The value lives in customization and finding a niche domain. What's likely to happen is that we'll see companies popping up that specialize in tuning on-prem models for businesses to fit their specific needs. And the whole market for renting out general-purpose models will collapse.
I think about Cohere or Mistral here — there might be a niche for “less capable model but with a more amenable governance structure” even setting aside local models, right?
For sure, I'd really love to see models developed in a community governed fashion as actual open source projects rather than just being graciously handed down to us by corps.
"once the quality of locally-runnable models begins to approach, say, a three month old frontier model, the entire economics of the industry will shift, and fast."
This seems unlikely as stated; so long as throwing more CPU/RAM at an LLM makes it more effective, there will be a gap between what can run on a $500 laptop and a $500,000 rack-mounted server stuffed full of custom silicon.
What does seem possible is that in 5 years a local model could be 3 years behind the frontier, which would mean it's 2 years ahead of where the frontier is right now.
I mean, this is a good point! But ISTM that there is a threshold, and ignoring it is a mistake given what's been good enough in other technical domains.
Local models are different in interesting ways, some of which may be an advantage:
Fable is the model that really convinced me we're screwed. It really can just crap out entire projects. The "McMansions" even look nice. But the roofs leak, the foundations are shaky, and the craft is just good enough, just long enough, to sell. This will, of course, likely be wildly successful in the market. I mean, even Fable's worst day is still better than plenty of enterprise SaaS (except for, you know, compliance and security).
So while I find that local models are interesting tools, I am really not looking forward to the messes created by the next generation of frontier models.
Also, as long as the pre-answer-generation blabbering does not switch to neuralese, local models will unavoidably let it be inspected in real time, while I have heard that frontier models now hide it as an anti-distillation measure. This sometimes reveals interesting «confidence/doubt» information, and sometimes lets me cancel the request and write a better version based on how the model rewords my question.
If there are academics here, what do you use local models for? I have found qwen3-coder:30b reasonable for LaTeX editing, and for querying OCRed papers about their results. Any other uses?
First drafts of translations. Proofreading those translations helped fix quite a few mistakes in the teaching materials we could have noticed without translating but never did… (This is mostly relevant for teaching in an environment that is not single-language-only)
One-shotting first drafts of general quality-of-life small personalised scripts/mini-tools, including a harness for the translation to exclude e.g. TikZ from translation requests. These need debugging afterwards, but debugging is much more interesting than writing the slog part that slop does get right. Validation strategy obviously matters even more than for handwritten things; ideally it is «any remaining bugs will be pretty obvious when running the tool»…
Honestly, Qwen3.6 surprised me by being not that bad in drafting example solutions to rather standard-ish proof-writing exercises. Although editing to match the desired style might make this somewhat axe-porridge-ish/stone-soup-ish, but some formulas probably stay through the process… depends on tediousness, I guess.
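For the curious, the TikZ-exclusion idea can be sketched roughly as below. This is only a minimal illustration, not the actual harness: the placeholder format and the function names `mask_tikz` / `unmask_tikz` are made up for the example, and it assumes `tikzpicture` environments are not nested.

```python
import re

# Hypothetical placeholder format; any token the model is unlikely to alter would do.
PLACEHOLDER = "@@TIKZ{}@@"

# Non-greedy match over one tikzpicture environment (assumes no nesting).
TIKZ_RE = re.compile(r"\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}", re.DOTALL)


def mask_tikz(source):
    """Swap each tikzpicture environment for a numbered placeholder.

    Returns the masked text and the list of original blocks.
    """
    blocks = []

    def stash(match):
        blocks.append(match.group(0))
        return PLACEHOLDER.format(len(blocks) - 1)

    return TIKZ_RE.sub(stash, source), blocks


def unmask_tikz(translated, blocks):
    """Restore the original blocks; fail loudly if a placeholder was lost or copied."""
    for i, block in enumerate(blocks):
        token = PLACEHOLDER.format(i)
        if translated.count(token) != 1:
            raise ValueError(f"placeholder {token} missing or duplicated")
        translated = translated.replace(token, block)
    return translated
```

The masked text is what would go into the translation request, and `unmask_tikz` runs on the model's reply. The loud failure on a missing placeholder fits the «remaining bugs will be pretty obvious» validation strategy.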
Academic here. I don't use "agentic coding"; I don't use LLMs for writing at all (it's even forbidden by most editors, isn't it?). I have been extremely underwhelmed every time I tried, not to mention the hassle and fragility of setting up a local inference pipeline (it requires using our shared computing cluster, since my laptop's GPU is very tiny). I do use ollama/qwen3-coder or duck.ai occasionally when I don't have the right keywords to search for how to do something in a language or with a lib I am not very used to, or for very specific stuff I am not an expert in at all (regex, SQL, ...).
Proof reading that goes beyond spell-check/grammar-check basically. Or writing quick scripts for data analysis, but only pilot experiment type stuff, not final analysis, so exploration.
Reformed academic, run in fairly academic circles.
Don't see a lot of local LLM usage, outside of ML people, or schools who for whatever reason provide an endpoint on local clusters.
Frontier model usage is incredibly high. Mathematicians are using them extensively, from small lemmas to typesetting/picture generation/one-off code projects. What was once an undergraduate summer project is now a prompt away; everyone seems to recognize it's bad for training, but the temptation for instant gratification is just too strong.
An important point to note is the only real angle for local LLMs is privacy/ethics, since academics are universally putting these tools on grants (though unclear how tedious this is to drag through European bureaucracy). My academic friends in math/physics are largely unconcerned about digital privacy & ethics, despite being some of the earliest computer adopters (digital "natives" in the 80s-90s). Exception there might be French friends who have an independent culture of open source/digital sovereignty.