Richard Feldman on new language adoption in the LLM age
17 points by jerodsanto
Hm interesting!
On Zig being well represented in Claude – I haven’t tried it, but yeah, the models keep getting better, so it doesn’t surprise me if it’s good now, or if it gets even better.
I saw someone say “LLMs aren’t good at Rust”, but I’d expect that to get better too.
On porting code from one language to another – There’s an interesting tradeoff between reuse and duplication.
which is mentioned here too: https://crawshaw.io/blog/programming-with-llms
There was a programming movement some 25 years ago focused around the principle “don’t repeat yourself.” As is so often the case with short snappy principles taught to undergrads, it got taken too far.
….
The past 10-15 years has seen a far more tempered approach to writing code, with many programmers understanding it is better to reimplement a concept if the cost of sharing the implementation is higher than the cost of implementing and maintaining separate code.
In other words, LLMs make bespoke code cheap. I think this is right, but it will ALSO be taken too far!
As with most things, I view it as a combinatorial / scaling problem. The cost of ~100K or ~1M lines of LLM-translated code in an application is not zero … in terms of the humans who have to work with it
A future where, say, the Go, Zig, Rust, Swift, and Roc ecosystems are constantly LLM-porting each other’s code back and forth seems a little silly. We’re going to take it too far, and probably swing back to plain-old API design / language design / library reuse, which are hard things …
On the larger question about new programming languages …
One way to view programming language design is as a form of compression … [1] You’re trying to express some class of programs in a syntax and semantics that is easy for both a human and computer to understand
And I think that is going to survive, for fundamental reasons. LLMs don’t change the “human” part – if anything, they amplify its importance!
A thought experiment: Do you expect LLMs to generate raw assembly code, making programming languages obsolete?
I claim that if you do, then you have misunderstood what LLMs are … they are language wizards, and they LIKE using different programming languages.
So my intuition is that these three things are basically the same: programming language design, LLMs, and compression.
So if you believe that, then programming language design is still valuable, and there will be future progress in the field. (I also suspect that LLMs will “like” to develop their OWN DSLs / languages to solve problems, for the same fundamental reason of compression)
I’m not sure if anyone has written about this, but if so, I’d like to see it! (Or an argument against it.)
[1] Similar to LLMs themselves being a form of compression: https://bellard.org/ts_zip/
FWIW, https://oils.pub/ is a bit hedged – LLMs are extremely good at writing OSH, because it’s the most bash-compatible shell - https://pages.oils.pub/spec-compat/2025-06-26/renamed-tmp/spec/compat/TOP.html
On the other hand, you can say that “YSH is for humans” … it has more power than bash, but it’s smaller and easier to remember (or will be, once we finish disallowing the bash features that have been generalized / replaced)
I just got word that one of our contributors replaced a ~2500 line bash script with a ~2000 line YSH script at work, with HUMAN coworkers who didn’t know YSH beforehand. So that is a good sign!
I saw someone say “LLMs aren’t good at Rust”, but I’d expect that to get better too.
This is not my experience, FWIW, though I do see people saying similar things.
Since I started being a regular Cursor user, I’ve only worked on one Rust program regularly, and it’s embedded Rust using Embassy, which is an even smaller niche. In my case I had to turn off autocomplete because it was just making up irrelevant nonsense (whereas in TypeScript CRUD code it’s like a mind reader). However, when given a small feature task, it did surprisingly well.
What was funny was that it’s just as bad at fixing borrow checker issues as I used to be, and goes in the same circles based on the same misconceptions I used to have! But when this happens, I can step in and tell it the right approach, and then it typically does OK. Like earlier me, it knows the syntax but it has trouble arranging the data structures the way the reference semantics need them to be. And then it makes the mistake of putting too much credence in the compiler error hints, like I used to.
E.g., the thing where the compiler tells you you’ve “held a reference across a loop iteration” when what it means is “the compiler is overestimating the lifetime of your reference because you have a branch in your loop”. That confuses Claude just as much as me.
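Here’s a minimal sketch of the loop-plus-branch pattern I mean (my own illustration, not generated code): because one branch returns the reference, the borrow checker extends the borrow to the whole function on every path, and then rejects the mutation after the loop even though that mutation only runs when nothing escaped.

    // Fails to compile under the current borrow checker (intentionally):
    // the `return x` branch forces the borrow taken by `iter_mut()` to
    // live for the whole function, so the later `push` looks like a
    // second, overlapping mutable borrow.
    fn first_positive(v: &mut Vec<i32>) -> &mut i32 {
        for x in v.iter_mut() {
            if *x > 0 {
                return x;
            }
        }
        v.push(1); // error: cannot borrow `*v` as mutable more than once at a time
        v.last_mut().unwrap()
    }

The usual fix is restructuring (e.g. finding the index first, then reborrowing), which is exactly the “arranging the data structures the way the reference semantics need them” skill the compiler hint doesn’t teach you.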
I did not have a good time with Cursor at all, personally.
Can you expand on why?
I tried its agentic features, and it struggled to even set up a new project for me. Even though it was using Claude Sonnet 3.7, it was much worse than the same task with Claude Code, even with identical prompting.
Yeah, maybe I shouldn’t have said that, since ironically the first time I used an LLM to successfully write any code was for Rust, back in March 2024!
That was precisely because I don’t know Rust, and it could fill in the equivalent of int main(), traversing directories, etc. for me.
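Something like this, say (my own reconstruction of that kind of boilerplate, not the actual generated code), using only std::fs from the standard library:

    use std::fs;
    use std::io;
    use std::path::Path;

    // Recursively print every file under a directory.
    fn visit(dir: &Path) -> io::Result<()> {
        for entry in fs::read_dir(dir)? {
            let path = entry?.path();
            if path.is_dir() {
                visit(&path)?; // recurse into subdirectories
            } else {
                println!("{}", path.display());
            }
        }
        Ok(())
    }

    fn main() -> io::Result<()> {
        visit(Path::new("."))
    }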
It works well for demo code, though I still haven’t generated “production” code with LLMs (mainly because design is the bottleneck, not actually writing code)
I have also heard people say similar, and I also have had a very different experience.
I wonder if it’s an outdated perception that used to be more true in the past—or are people using LLMs in vastly different ways? Personally, I’ve had a lot of success with LLMs as brainstorming buddies: sometimes they hallucinate the Rust API, but it’s great for higher level questions about types and lifetimes, and for coming up with alternate approaches that might work better with Rust’s type system. Once I have the eureka moment, I go off and write the solution largely on my own.
Contrast with “true” vibecoding, where the LLM writes all the code. Disclaimer: I’ve never done that in Rust before. But maybe this particular use case trips up LLMs? Maybe writing in Rust requires more in-depth thinking ahead (fine details about types have implications for the overall project), and so an LLM that writes an outline first, then tries to individually solve each step, is going to have a hard time? Or maybe Rust library methods are more likely to be hallucinated, and that slows down an LLM that’s writing the code itself?
Again, I’ve never vibecoded Rust, so my guesses could be completely wrong. But something has to explain why people are reporting such different experiences.
My experience prior to Claude 4 was that AI agents in particular were totally fine at writing syntactically valid Rust, but they would get stuck on borrow checker errors on a regular basis (as in, get into a state where they gave up, deleted everything they’d attempted to write, and replaced it with “// A real implementation would …”). Since Claude 4, that failure mode has pretty much disappeared in my experience.
So at least in the Zed code base (which, granted, is over 600K LoC of Rust), “LLMs are good at writing Rust” has only been true in my experience since Claude 4 came out.
Thanks for sharing your experience!
It’s curious that LLMs seem to have a decent conceptual understanding of lifetimes (at least in my experience), but that the older LLMs fail to transfer that understanding to code. It’s like working with the concepts in English exercises a completely different part of the model.
I have basically given up on the idea of figuring out why people have divergent experiences. I’ve never found anything that explains it.
I’ve done both short and longer tasks, it just hasn’t been an issue.
People have divergent experiences when it comes to basically everything. I don’t see why LLMs would be any different! I can’t think of a single concept in software engineering that is well agreed upon by all; you’re always going to be able to find a counterexample, or someone who has a reason to disagree.
So, I think that giving up on the idea of figuring out why is probably the best approach. It doesn’t really matter why, after all. There are likely things that we can learn from those different experiences though, so we probably shouldn’t lose sight of the fact that they exist (and are valid!).
I feel like it’s different this time. I can more easily explain different experiences by things like preferences, or tradeoffs, but with LLMs, it seems all over the map, all the time.
The difference lies in the nature of LLMs as tools. They’re nondeterministic and fuzzy, with a near infinity of applications that other tools can’t come close to touching in breadth. A minuscule difference in perception can manifest in a massive difference in someone’s evaluation of the tools in general.
That’s one of the key hallmarks of LLMs, though. They pick up on subtle cues in your question that you might not even realize you’re giving, and can easily end up simulating an emotional or knee-jerk response if that’s the most plausible kind of response a human would have to your question, for example because it betrays subtle signs of ignorance.
I doubt you are betraying subtle signs of ignorance, so you are more likely able to get the answers you want because the machine is not imitating human irritation with your ignorance.
I have a very rigidly patterned codebase I wrote in Rust. Claude does Adequately Well for writing new bits and pieces.
I would not do it in an untyped language.
I partially agree. There’s a bit of a problem that comes from the Rust ecosystem being so quick at improving things, which leads to LLMs always being a bit behind the current state of things.
For pure Rust code LLMs are pretty solid, though. Often much better than what my amateur self can write.
Also, on compression:
Then the algorithm for language design becomes: look at a program and ask, is there any way to write this that’s shorter?
https://paulgraham.com/hundred.html
(This 2003 essay was cited by both Larry Wall as inspiration for Perl 6, and Rich Hickey as inspiration for Clojure)
If I look at most “real programs”, I think they are too long, simply due to lack of collective understanding. New contributors often only know how to add features in a suboptimal way …
As a rule, the quality of most software projects falls when the original authors leave – i.e. when the brains are changed :-)
As a rule, the quality of most software projects falls when the original authors leave
I agree, every generation of programmers passing through a project creates new grain boundaries, so to speak, but if a new generation stays long enough, the codebase can also heal slowly over time as the new brains get a chance to refactor the old code bit by bit and expand their grain.
The worst is when two radically different mindsets are forced to work on the same codebase simultaneously without reconciliation, then the grains look more like a fractal.
Is Claude better at Zig than other LLMs? From my admittedly very limited testing, they’re all trained on older syntax and language features: stuff like the old cast syntax (the two-argument @intCast(T, x), which became @intCast(x) with the result type inferred in 0.11.0, 2023-08) that won’t compile anymore.
It helps to say explicitly “use Zig 0.14.0”, but it’s still a pain trying to distinguish fake code from code that can be modernised.
Ironically, the LLMs seem to do best on brand new languages where the entire training corpus comes from the core dev team of the language itself. Get an LLM to spit out C++ code and it’ll be a mix of 1980s/1990s tragicomedy code, modern C++, and everything in between.