rsync and outrage
126 points by projectgus
"This is all a huge amount of work. I’m retired (though my wife may dispute that!) and I’d rather be out sailing than working on rsync security issues[..]"
To me it reads as the author feels compelled / pressured in maintaining his project although he'd rather go sailing and saw a possible solution in using LLM's to be able to do both.
It is perfectly fine to enjoy retirement and sailing instead of fixing bugs. It's also perfectly fine to not fix any bugs in your opensource project (but be open & transparent about this!). As the saying used to be: patches welcome. Especially if your project is being used by companies that have plenty of resources to contribute and depend on it in some way or another.
I'd rather see more maintainers & developers enjoy retirement & go sailing without feeling pressured to resort to LLM's to "help" them in maintaining open source software. Still, even if he wants to explore LLM's in the rsync project then that's his choice (afaik he is the sole maintainer?). Even if others disagree (including me).
People harassing opensource software developers for whatever reasons seem to forget that free opensource software is not a product. It's a gift.
But you know what’s also fine? When people use their time to maintain open source software. Good open source software in this case even.
The peanut gallery can decide to fork it if they disagree with them, but Andrew can do what and how he wants on that project and does not need our commentary and opinions.
Of course, it's fine if people like to use their time to maintain opensource software. The more the merrier :-) I'm not sure what you mean by "peanut gallery"?
"Peanut gallery" originally referred to the cheapest seating section in a vaudeville show (c. 1870s), because the similarly cheapest snacks, which would sometimes be thrown at performers by hecklers, were peanuts. That section was generally thought to be rowdier than the rest of the theater, which led to American cultural idioms such as "comments from the peanut gallery", meaning comments from uninformed and/or impolite people whose opinions you can discard. The term was adopted by the Howdy Doody radio show, which had a live studio audience of children, leading to its re-popularization in the 20th century.
It is also a term with a significant racial character; in many theaters, the peanut gallery referred to the segregated section reserved for non-white attendees. See, for instance, its first recorded print appearance in 1867. This interacts poorly, of course, with the stereotype that it was full of mean, uncultured, and rowdy attendees. For that reason, many, especially in the south, consider it quite offensive. I presume Mr. Ronacher was unaware of this.
For that reason, many, especially in the south, consider it quite offensive. I presume Mr. Ronacher was unaware of this.
I've spent a lot of time in the southeastern part of the US, I'm a native US English speaker, and I've heard the term "peanut gallery" used since the early 1980s. I've always understood it to be a reference to the Howdy Doody show.
Your comment is the first I've heard of any racial component. Though you didn't cite sources for most of your comment (cool link to the first recorded appearance, even if it's paywalled!) I don't doubt that it is considered racially charged in certain contexts. This reasonably cited article backs you up. I do believe that most speakers who use it are referencing the live audience of children on various iterations of Howdy Doody during the mid-20th century. Which is, to be sure, an attempt to cause a very different kind of offense. But almost certainly not intended as a racial slur in most cases.
Yeah, sorry for the paywall. newspapers.com is a great source if you're ever looking for etymological information about words that came about post-1850 or so; I recommend getting yourself access through your local public library if you can.
It is also a term with a significant racial character;
I've been alive in these here United States for 50 years. I have more than a passing interest in the English language and idiom and a strong interest in the history of racial injustice in this country. I have never heard anyone blanch at this phrase. One of the zillion little nuisances on the Internet is when people bring up now-obscure racial aspects of something in bad faith. I assume you were unaware of this.
It all reminds me of Metafilter tearing itself apart for a week about Bugs Bunny's use of "maroon" as a noun. It also seems to me, on the basis of 0 evidence, of a renaming of the concept of a "groundling", more classicist than racist. Regardless, it survived past Vaudeville thanks to Howdy Doody and those were all kids, so ...
One of the zillion little nuisances on the Internet is when people bring up now-obscure racial aspects of something in bad faith.
I am from the American Southeast and have heard people use this term derogatorily towards Black people. I didn't bring this up in bad faith; I brought it up because it feels strange to me to convey the term's meaning without also conveying that it might piss some people off. (At least, I don't like it when people do that to me! It's led to more than one faux pas on my part, though thankfully no more than that.)
I assume you were unaware of this.
Is this sarcasm? I didn't intend sarcasm in my post, but reading this in context, I fear I came across that way.
It also seems to me, on the basis of 0 evidence, of a renaming of the concept of a "groundling"
When my eighth grade English teacher referred to a group of us who were making snide comments during her class (I'll age myself: this happened while she was teaching us to diagram sentences... circa 1990) as "the peanut gallery" she explained it to us exactly that way. She talked about the Howdy Doody show, groundlings during Shakespeare's era, and how that was the modern American version of "groundling." She supplied 0 evidence as well, but your comment made me remember her explanation, and I chuckled.
Your comment really irks me. I interpret it as a bad-faith escalation attempt of an ordinary idiom.
“Peanut gallery” means hecklers or unsolicited commentary from the cheap seats. That’s both the normal contemporary usage, and it is the neutral account given by Wikipedia as well. The fact that some historical theaters had segregated seating does not turn the use of the phrase into a racial reference.
Nothing in my post had anything to do with race. Pulling race into this reads less like useful historical context and more like an attempt at veiled defamation from where I sit.
That was entirely unnecessary.
(I also find it quite frankly bizarre that you chose to address me with my surname here)
Apologies. I really didn't intend to cause a problem or escalation. I can't edit my post; would you like me to delete it?
To me it reads as the author feels compelled / pressured in maintaining his project although he'd rather go sailing and saw a possible solution in using LLM's to be able to do both.
I interpreted it as the author bidding for empathy, reminding the audience that he is a person with competing needs, and that security issues in particular incur a large burden on OSS maintainers: they're high pressure, high visibility, and often away from the core of the project that might be what's motivating for them.
Maintainership has many responsibilities, and not all are enjoyable, but I feel grateful when FOSS maintainers are willing to do those. Few seem to.
FWIW, Tridge returned to rsync after the previous maintainer had sort of burned out. So you're not off-base. The silver lining might be that someone else steps up to maintain rsync long term. Tridge has already turned it over to a new maintainer before.
I attended a talk and wrote it up for LWN (publishing today) about some case studies with the OpenJS Foundation stepping in and helping some projects with funding and transition from single maintainer to team maintenance. We need so much more of that for things like rsync.
Thanks for sharing, I didn't know that Tridge had turned it over & now was getting back into it.
Good to hear that foundations like OpenJS are stepping in and lending a hand, we do need more of this indeed.
I'm curious whether, and how, the European Cyber Resilience Act will affect the support that developers & maintainers of open source software get from companies currently using it. I'm hoping it might "nudge" them into taking a more active role in the maintenance and support of open source software like rsync.
I'd rather see more maintainers & developers enjoy retirement & go sailing without feeling pressured to resort to LLM's to "help" them in maintaining open source software.
This seems to imply that "resorting to LLMs" ends up negatively affecting the end product, which need not be the case.
It also makes complete sense that the cost of achieving a specific goal could be deemed too high without LLM help, while it could be deemed reasonable with LLM help.
The positive spin on this is: an open-source maintainer that wants to keep a healthy work-life balance is now more easily able to do so as LLMs can help reduce the workload, while still achieving the desired goals.
this blew up precisely because it was the case, though.
Do we know that? I haven't seen any discussion on the causes, link?
I'd be curious what the base rate for regressions in rsync releases is. Is it known for not having this scope of regression?
Even if it was established to be the cause, I'd also be curious on whether the security patch timeline would have been delayed if they hadn't used an LLM.
Also, the author mentions some of the changes were to improve the security hardening of rsync, so to really evaluate the end result on the product, we'd have to evaluate which vulnerabilities this prevented from ever becoming CVEs.
Finally, to discount LLMs, we'd have to show that the above held for most projects that use LLMs, rather than just one.
This all sounds like a lot of work, but I think that's because there's a large gap between rsync having a regression and concluding software is better off without LLMs.
I don't think that's a fair assessment -- it conflates why this got attention with what caused the bug.
It blew up because there's a "Co-Authored-By: Claude" line on a CVE fix, not because anyone has shown the defect was LLM-specific or worse than a human's would have been.
A technical breakdown of the regression:
The defect is a subtle, domain-specific path-handling error that an "expert" C programmer could very realistically produce.
If the "Claude" coauthor field was omitted, this would have been just another unremarkable human mistake that nobody would write a blog post about.
It is totally reasonable that something like this could happen, LLM involvement or not. Perhaps we would have had three regressions if the CVEs were fixed entirely by hand. Perhaps we would have had zero.
Point being: singling out LLM usage as the cause of this one isn't fair.
To me it reads as the author feels compelled / pressured in maintaining his project although he'd rather go sailing and saw a possible solution in using LLM's to be able to do both.
To me this reads as you already having your conclusion ready before reading the rest of the post. What you cherry-picked to reply to is very much part of an introduction. This article, which is clearly written with consideration and nuance in mind, isn't about basic OSS maintenance at all.
What I find even more odd is that you are leaving out the context that comes right before what you quote:
As this flood started to get more intense I realised I needed to raise the defences on rsync a lot — we needed much more thorough test suites, code coverage analysis, CI testing on a lot more platforms, deliberate and thorough scanning for possible security issues (so I find at least some of them before other people!) and the addition of a whole lot of defence-in-depth hardening techniques. This is all a huge amount of work.
In this case the author is retired. In the case of other OSS projects people will have a day job or something else and find themselves flooded with a similar increase in work.
To be honest, I am baffled that people are upvoting your comment so happily, as it almost feels like you didn't write it in good faith. At the very least it feels a bit like lazy drive-by commenting to me, barely better than just responding to the headline.
I'd kindly ask you to retract your statement about not writing in good faith. This is simply not true. Accusing someone of this without any evidence, is not okay and prevents us from having a constructive conversation. We may disagree in the interpretation of the blog post and that's fine. We both build upon assumptions. The original author is the only one who could shed more light on it.
It would behoove you to have a bit more faith in the good intentions of people engaging on Lobste.rs.
I'd kindly ask you to retract your statement about not writing in good faith.
I am not stating you wrote it in bad faith, I am describing that it almost feels like it. This in order to highlight how far off I find your conclusion to be from the content and context of the article. I also followed it up by the fact that at the very least it feels like a drive by comment. To be frank, this is a good example of your interpretation of the article as well. Reading half a sentence and seemingly jumping to a conclusion without internalizing what it really says.
If I were to retract that observation I feel like your top level comment also deserves a similar retraction in what picture you are painting about the article author.
I won't excuse or approve of the harassment. However, I can't ignore that there's a missing reason in this defense. The given explanation is that the author wrote the design for the vibecode, reviewed the vibecode, is good at code, is good at chatbots, is cautious with vibecode, and tried to balance security against feature regression; that all sounds plausible. Nonetheless, there were regressions, and the author never actually reaches the reason for them.
I used AI tools to do the grunt work because they are good at that.
Nope. In fact, the chatbots are bad at writing. This is the central issue and the author cannot even perceive it.
Nonetheless, there were regressions,
It'd be interesting if someone actually did a timechart of regressions after each release (if at all possible) to see if the number actually went up recently or not. Because regressions are not unheard of and I suspect people are just looking for an excuse to pile on Andrew.
I would also love to see such a chart. It wouldn't be completely informative: not all regressions are the same, and we might not have enough data yet to show a statistically meaningful difference. But at least it would be something objective we could measure.
It is harder to get good data on the density of urgent bugfixes. Which is a pity, because it would be useful to evaluate urgency as predictive or not of unfortunate downstream effects.
Why is the problem coding agents? Why isn't the problem a missing / underspecified testsuite? Or even a maintainer whose understanding of their codebase has atrophied more than they thought?
It isn't purely coding agents, but if the first major LLM-involving release is also the first regression, in a software that – I would say – is usually considered very stable (especially since it runs headless in a lot of places), there's at least a correlation. A correlation that can potentially be explained causally with the drawbacks LLM use has or is suspected of having.
My takeaway from the post was that the trigger for the increased volume of changes (and hence increased number of regressions) was the influx of (mostly) LLM-enabled security issues.
i.e. the causal chain was: LLMs -> more known security issues -> more changes needed than usual -> more regressions than usual.
which may be sufficient to explain the correlation you observe (but equally, your explanation is also possible).
No matter the correct explanation: I'm in no position to answer this question for anyone but myself, and to be honest I don't know any details about the security issues (especially exploitability), but in the scenario I use rsync the most in (backups over ssh), I would prefer a stable upgrade path to the quickest security fixes.
My fear is that LLM use, simply by the nature of the interaction with the agents, leads to the opposite ("just one more fix crammed into this release, bro").
My fear is that LLM use, simply by the nature of the interaction with the agents, leads to the opposite ("just one more fix crammed into this release, bro").
The 3.4.3 and 3.4.2 changelogs seem to show primarily security and bug fixes (and the aforementioned test suite restructuring)?
https://github.com/RsyncProject/rsync/releases
I think I understand your fear and I was initially very demoralised by the report (and associated "AI pilled" / "AI psychosis" narratives) but I find this post by tridge to be quite encouraging.
I get that some people don't want anything to do with LLM-produced output, for a variety of reasons. But I'm quite keen to be precise on this, as it is helping me navigate my own understanding of this important/complex/divisive development.
Yes, the post gives understandable motivation for the actions taken. I still weep for every quality-minded developer accepting the outsourcing of their brainpower into billionaires' hands. I understand many of the pros discussed, but not one proponent of LLM use has managed to present me with reasons for how that leads to a world I would rather live in.
I'm starting to believe that fewer people should re-read ZAMM but rather look for a modern version of the CIA Simple Sabotage Field Manual.
Keep in mind that these are big-ish releases anyway, with a new test framework and a high number of security fixes. It's not that surprising that some regressions slipped in, whether LLMs were used or not. IOW: even if LLM commits should not be trusted like human ones, correlation isn't causation.
Whether we like it or not, LLMs are exposing a lot of vulnerabilities right now, which maintainers have to deal with. Rsync's maintainers have a lot of work on their hands, whether you consider it "stable" or not.
Yes, sure. I'm generally annoyed to see so much FOSS adopting LLM use and, from my perspective, having nothing positive to show for it. It's making me cranky.
Well, it’s not pretty or fun but fixing vulnerabilities is progress. The vulnerability was a pre-existing bug however you look at it.
Yes, but the timetable was self-chosen. Just as a hypothetical, if the regressions could be pinned to LLM use, it would be interesting to see if the time saved by the generated code is more than the time wasted by all the people having their systems partially stop working.
That's the thing: without a reason to believe that reviewing a patch from a first-time contributor has a better effort/risk curve than reviewing a patch written by an LLM, the maintainer cannot get anything at all out of the latter pool of time… and not burning out is more or less about ignoring one-sided moral arguments.
Or even: that committing any change to a codebase incurs risks, and in an age where OSS projects are getting drowned in accurate security reports (generated by LLMs!), what is the option other than to increase velocity?
If there's one or two bug reports, you can fix them individually.
If there are too many to count, the only option is to slow down and carefully and methodically solve the problem holistically.
The worst thing you could do in such a situation is throwing an LLM at it, because they will do lots of individual fixes, with countless unrelated changes, that cause lots of unpredictable regressions.
I feel like this was the clearest lesson from the Mozilla changes vis-a-vis Mythos-informed security reports. They made a lot of noise about the number of issues identified and the number of bugs fixed. If you dug into them it was pretty clear that many of the bugs were specific variants of a class that was amenable to a single fix.
This does not match my understanding of what happened here, he did methodically approach it and didn't just mindlessly throw an LLM at solving as much as possible at once.
He rewrote the test suite and improved CI, then tackled the vulnerabilities one by one. One of those fixes led to a regression, that made a lot of people angry, and that's where we are now.
This is pretty much the case. Two of the "regressions" reported are to do with 5+yr old LTS releases and an Android system which was never officially tested. This really is a thing I'd expect to see in a project you come back to after a long time. (Also IMO, if you're stuck on a 5+yr old LTS and critically need those fixes, you should be paying someone to do them)
On the other hand people were complaining about improving the test suite with LLMs. Someone's got to do it.
The test suite is always going to be underspecified. Traditionally we had a hedge against that: that someone actually wrote and understood the code and was more likely to spot issues due to their familiarity. Now that people are offloading that part, the blame can't fall on the tests. Tests have never been adequate for quality control by themselves.
Totally. Reading about the Swiss Cheese Model freed me from thinking about root causes. That model applies here.
Nonetheless, there were regressions, and the author never actually reaches the reason for them.
Sorry, but did we read the same article?
yes, there were regressions in some use cases of rsync in the 3.4.3 release. I quite deliberately tried to err on the side of fixing security issues for that release, and there were some valid (but unusual) use cases that got caught up in the changes. None of those cases were covered by the existing rsync test suite or by all the manual testing I did (yes, I use rsync, I don’t just develop it). I am working through those regressions, and I appreciate all the people who have reported them on the github repo as issues or PRs. I do read them even if I don’t respond quickly to all of them. I apologise if your use case of rsync was hit by these regressions. If you don’t mind the security risk then you can of course use an older release.
Like I said, the author "tried to balance security against feature regression." I don't dispute that he tried. I merely dispute that the chatbots are good at writing code; in fact, they are bad at writing code. If the author had approached these security bugs by hand with a mental model (a Naur theory!) which preserves their desired features and functionality then they would have caused fewer regressions; the author purports to be good with code and I don't have any evidence against their claim.
If you're struggling with this then I suggest close reading of the paragraph which you quoted. Sentence one: "yes, there were regressions." Sentence two: they tried to avoid regressions. Sentence three: the regressions were not covered by existing tests. Sentence four: the author is fixing regressions. Sentence five: the author is reading every bug report. Sentence six: the author apologizes for any inconvenience. Sentence seven: you, the user, can choose to use an older release with the standard caveats. Which of those sentences gives the reason for the regressions?
Which of those sentences gives the reason for the regressions?
I would argue there is not one reason for the regressions, and it is a category mistake to look for such a thing.
As Tridge said:
I quite deliberately tried to err on the side of fixing security issues for that release, and there were some valid (but unusual) use cases that got caught up in the changes.
This is a reason for the regressions. Specifically, "err on the side of fixing security issues" -- the priority was fixing the CVE, at the deliberate (potential) expense of correctness. This is a tradeoff I think it is reasonable to make, though other choices could also be reasonable.
Tridge also said:
None of those cases were covered by the existing rsync test suite or by all the manual testing I did
This is another reason for the regressions. The tests did not cover the feature that regressed. The reasons for this go back decades, and could be an interesting chapter in an after-action review.
More from Tridge:
As this flood started to get more intense I realised I needed to raise the defences on rsync a lot — we needed much more thorough test suites, code coverage analysis, CI testing on a lot more platforms, deliberate and thorough scanning for possible security issues
This is another reason for the regressions: there was a response to external influences, namely the "flood" of new security issues. This changed the dynamics of the system*, and changed the focus of the work.
(* system = sociotechnical system producing code artifacts)
And of course, there's the trailer in the commit:
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
and Tridge's comments about that, though in a paragraph really dedicated to the test suite:
I used AI tools to do the grunt work because they are good at that. I reviewed every part of it myself and ran through a huge amount of CI time getting it right [...] What you see in the commit history with co-authored by claude is the tip of the proverbial software engineering iceberg.
So this may indeed have contributed to the regressions. We don't know, because we don't know what process Tridge used for the relevant commit, or how that process compared to the process he would otherwise have used.
Certainly what we have from Tridge indicates an awareness of the slop risk of LLM use, and of specific actions taken to mitigate that risk (writing the design first, including a validation plan, reviewing the code, running tests).
In general we should be looking not at the individual (or tool) that was the proximate cause, but the aspects of the system that produced the error. The technical failure of missing a subtle bug when fixing a CVE is the least important piece of it. That is the result of the pressure, the lack of other maintainers, the lack of tests, the deliberate decisions to prioritize security over correctness. Those are known to be contributors, and would be contributors whether or not an LLM was used.
And yes, poor quality tool output is a possible contributor to the regression, but that must be viewed in the context of the whole system. Would Tridge be using an LLM if there were more maintainers? If the security bugs were coming in slower? Would the risk of lower quality code be better mitigated by better tests? Is the risk of the LLM producing poor quality output offset by the benefit of writing a more comprehensive and better test suite? Of getting CVEs fixed faster?
Perhaps if Tridge had written those 21 lines by hand he wouldn't have missed the bug. We can't really know a counterfactual. I know that plenty of CVE fixes have introduced regressions in plenty of projects. I've certainly accidentally broken features fixing security issues. But it is deeply counterproductive to give so much focus to the LLM as the tool, rather than all the contributing factors that set up the system for this failure.
In general we should be looking not at the individual (or tool) that was the proximate cause, but the aspects of the system that produced the error.
Blameless postmortem culture does not indemnify tools. We should blame tools which are proximate causes, especially when those tools are specified by the chosen process or enshrined within the system's structure. In this situation, the tool of choice (a chatbot) is known to be incapable of completing the task as expected, and the choice of tool is worth critiquing. This is not to blame the author, but to blame the process chosen by the author.
Honestly, not sure how many more times I can repeat my belief that the author is competent in general. When a system is specified to use a footgun for its process then blame is not required to understand why somebody has lost a leg.
the tool of choice (a chatbot) is known to be incapable of completing the task as expected
In this case, the author, who you stipulate is competent, reviewed the code and decided it did complete the task as expected. I think that is a more reliable signal than your blanket blind dismissal of all LLM output.
Double-check your logic and spell out what you claim. Given a codebase that has been recently changed by LLMs, you have two signals. My initial signal was that the codebase may have regressed merely because of LLM usage. The author's signal was that the codebase did not regress, as demonstrated by tests. The LLM changes caused regressions. Why is their signal "more reliable"?
Your signal: Somebody used an LLM.
My signal: The developer who’s been working on the code for decades thought the code was OK.
the tool of choice (a chatbot) is known to be incapable of completing the task as expected, and the choice of tool is worth critiquing.
I don't think this is true. The tool has completed similar tasks successfully thousands of times. That's not to say it is risk-free or reliable or a substitute for human judgement. But the claim that LLMs are incapable of performing bug fix tasks is plainly wrong.
When a system is specified to use a footgun for its process then blame is not required to understand why somebody has lost a leg.
You haven't really engaged with the system though. In this thread you've been single-mindedly focused on the LLM, to the exclusion of thinking about the other parts of the system that contributed, and whether they contributed more than the choice of tool.
I quite deliberately tried to err on the side of fixing security issues for that release, and there were some valid (but unusual) use cases that got caught up in the changes.
There we go, reason given. Security fixes caused regression. It might not be enough of a reason for you, but it is also entirely missing from your numbered list of sentences for some reason.
The world of software engineering has changed dramatically in the last few months. The world of IT security and maintaining software in the face of the flood of reports has completely and utterly changed just in the last few weeks. Anything you learned about this stuff last year might as well be from another planet.
So they keep saying. Over and over, for the last half year or so. It's exhausting to hear this.
Given that LLMs are the cause of this flood, it feels like looking to LLMs as the solution is misguided beyond belief.
But yes, I also immediately believe it sucks to be a maintainer of anything popular right now. Maybe for him the best solution would be to walk away and really enjoy retirement instead of trying to cram more into his limited computing time?
I, too, am exhausted by people's defense of vibecoding. It reminds me of a cult. To me, a tool even half as useful as what people claim these are would speak for itself in its usefulness.
However, I do think a person like Tridgell is worth listening to. And the "flood" of security reports does need to be taken seriously – a security issue is a security issue no matter what (or who) found it. This post, therefore, to me, isn't like the exhausting barrage that we usually see.
As someone who is fine with vibecoding, I feel people who are against it are like a cult. It's a tool that's useful for some things and really bad at others, and unfortunately has a toxic discourse around it because of what it represents to either side.
That's fair. But do you see how some (vocal) proponents of vibecoding (probably not yourself) can be seen by some as cultish? And that since they are so loud, and are aligned with such massive forces, they can trigger a (perhaps too vocal, perhaps too cultish-sounding) reaction on the part of those of us who do not agree?
These very vocal proponents of vibecoding do not seem to see it as "a tool that's useful for something and really bad at others". They seem to see it as a revolution that must be defended against any and all criticism at all times.
To add to that, very vocal proponents of vibecoding tend to dismiss complete categories of valid criticism out-of-hand, which also sounds very cultish to me.
The very vocal opponents of using LLMs in any part of software development also tend to dismiss valid criticism and ignore actual capabilities of models we have right now.
I think it's fairer to say that there's people who fell too much into the marketing and hype on one end (some even with financial interests), and people who got too carried away by their skepticism or hatred in the other.
Yes for sure, I agree with that. The most extreme proponents are pretty insufferable and I don't think have ever actually achieved anything of value yet with LLMs. Interestingly about 18 months ago I was completely AI pilled, but since working on and delivering a few projects with "vibecoding" (I'm still a software engineer so I'd say not 100% vibecoding as some people say it), my expectations of AI have tempered significantly. I still think it's world changing, but it's still really hard to do stuff. I think many of the people you mention haven't internalised that yet.
some (vocal) proponents of vibecoding (probably not yourself) can be seen by some as cultish?
This applies to any X, especially if X affected many people. See religion, politics, Linux, Mac, git, rust, erlang, Wayland, entrepreneurs, audio enthusiasts, guitar gear people, etc. We'll see this repeating, there's no way around.
To be entirely honest: This issue has existed long before vibe coding. AI has just made it more visible.
We've always had people who only care about the results, even if that means people hired on fiverr copy-pasting code from stackoverflow they don't understand.
They've always argued that all you need are tests, there's no reason to refactor, or plan ahead, no need for type systems or well-paid engineers, you can outsource and offshore everything, as code doesn't matter, only results do.
On the other hand, there were always people who believe that the digital foundations of our civilization need to be rock solid. To build ever higher into the sky, you need to be able to trust that the layers below will hold.
And even today the only way to achieve that is with code that was architected & handwritten by engineers that genuinely care.
You can't build on shifting sand. And if the entire foundation is eaten away by rot, soon you won't know which rafters and beams you can stand on, and which will have you fall to your demise.
Unless you're making every stone, every nail, every dependency from scratch yourself, you now have to watch as your own projects slowly break apart, fail in ways you could never have foreseen, due to forces outside of your control.
I can't imagine how one could experience this without losing themselves in unending, fiery rage.
I feel that the discussion around agentic coding these days is pretty much like discussing if child labor is profitable or not: on one hand we can pay kids very little; on the other, they're not as productive as a full grown adult. I guess they're cheaper for some jobs but not worth the price for others?
The point is that regardless of being more profitable or not, we shouldn't have child labor. And regardless of AI increasing productivity or not, these models were trained on stolen copyrighted material: books, websites, GPLed code, and they reuse this material in a way that a human does not.
The immorality of child labor is in a different universe from the immorality of piracy (making an unauthorized copy of something, I might remind people, is not theft, and therefore "stolen" is not the right term for it).
There are also ethical considerations to not using AI. As professionals we have an ethical duty to not have major defects in our work output. Refusing to use an LLM even to the extent that it can help find defects is ethically dubious in itself.
That's a classic straw man argument you present. I never said using AI was as bad as child labor, but instead of rebutting my point you prefer to attack a position I never held and the definition of the words I use.
I didn't attack you personally at all, nor did I "attack" your position beyond critiquing it. You set up a comparison, and I pointed out both the flaw in your comparison and (what I believe is) the inaccurate premise that piracy is theft (so I absolutely "rebutted [your] point").
My point is: when discussing something bad (for different levels of "bad") we should question if we should be doing it at all, instead of discussing in which ways it can be useful or not.
The way most AI models are trained is illegal and unethical, regardless of whether it's "stealing" or just "copyright infringement", and regardless of whether it's as bad as child labor or not. It's still bad. Give me an ethically trained model that I can run locally and I'll do it.
Correct — there are ethical concerns with using AI.
There are also ethical concerns with not using AI.
There is simply no option that is completely ethical here. There is an ethical dilemma that different people are going to come to different answers on.
Are there truly ethical concerns with not using LLMs? Or is it just that manually writing all of the code necessary to achieve current professional standards of rigor requires more time and/or money than just about anyone is willing to invest, so we're pressured to take the shortcut of using LLMs?
Yes, there are truly ethical concerns with not using LLMs. As professionals we have an ethical duty to do the best job possible given the resources and tools available to us.
(At least that's how I've lived my professional life, aiming to build things of lasting value and successfully doing so to at least some extent.)
As professionals we have an ethical duty to do the best job possible given the resources and tools available to us.
I certainly agree with that. So then the question is whether LLMs make it practical to do things that would be impractical any other way, or whether the problem is that our employers want us to move too fast, and aren't willing to invest in hiring more programmers, so using LLMs is the only practical option for, say, achieving a certain level of test coverage or finding more bugs. I suppose those who say that using LLMs is completely unethical would say that the solution is to hire more programmers and/or move slower, not use LLMs.
Hiring and training more people can help, but I don't see a path to them entirely replacing LLMs. No matter how many humans you hire, there will always be an incremental improvement in quality you can achieve with LLM assistance.
Now that's an interesting position that I don't think I've heard before. I've always been working with the premise that using an LLM is always a shortcut, a quicker, cheaper way to do what would be better done by a skilled human.
I've mostly been using LLMs to increase the quality of my work output beyond what I can produce alone. I can be so incredibly thorough with them. A simple example is not just reasoning about various options, but building prototypes for all of them so I can make more informed decisions.
FWIW I've really appreciated reading & listening to various Oxide folks' takes about using LLMs in service of building robust systems, including yours.
I've avoided LLMs for personal and general ethical reasons so far, and as an open source maintainer I've also been exposed to the absolute worst of them (giant PRs with one-shotted bad features, enormous dot point lists with almost-correct almost-relevant analysis, etc, etc.)
So it's been interesting and challenging to see examples of what skilled engineers can do with them when they care about the work. (I am afraid to say that I haven't personally seen that many other detailed examples of this out there in the wild, outside of Oxide employees, but maybe I'm not looking hard enough.)
Thank you. I'm very very aware, from both online and offline conversations, that the vast majority of people using these tools are doing so incredibly irresponsibly (at best). I've gotten my own share of slop, only some of which was clearly marked as a prototype to chat about direction with.
But there genuinely are a minority of people using them well, and that's not just limited to us at Oxide. I've gotten a PR that was LLM-assisted in the sense that it clearly went through several rounds of review iteration before being put up, and displayed an unusual degree of rigor and thoroughness. It brought me so much joy to review it, I was riding that high for days! It was the best PR I've ever received from a first-time open source contributor.
Oh yeah, being able to build three prototypes (with zero effort and therefore zero emotional investment) in the time it'd take me to build one before is really nice.
Just wanted to reply here to let you two know this has been the most useful pro/con debate I’ve seen since LLMs came on the scene. Thank you both!
My working definition of "cult", in terms of social dynamics, is something like "group of people which consume members of the group." Chatbots consume people via chatbot psychosis; previously, on Awful, we have examined Spiralism (which I still want to call "Cyclone Emoji Cult"), a brand-new leaderless cult formed around the 🌀 emoji and focused on chatbot-induced manic states and delusions. By what metric or evidence would you say that anti-vibecoding is a cult? I'm not engaging in behavior/information/thought/emotion control or harming people; I'm merely pointing out that vibecoding is hopeless because chatbots are bad at writing.
That's a pretty specific definition so you're likely to find my explanation insufficient simply because we don't share a definition on what a cult is. But I will try!
When I say I find it cultish, I feel some folks (definitely not all) vehemently against vibecoding/LLMs are arguing from an emotional place, driven mostly by identity. I'd say that's cultish in the sense that it's somewhat closed-minded as to where these tools might have a place. The other cultish behaviour I see is people sharing anecdotes in group chats and the like about how LLMs failed so badly at this task or that task as if that justifies outright rejection of them. It feels cultish because the purpose is to justify and defend a pre-existing worldview, similar to how cults develop habits of just reinforcing the beliefs rather than being open to new information.
Don't get me wrong, there's lots of things I hate about LLMs and what they're doing to society, but "vibecoding can be useful" and "I hate people sending me slop" can both be true at the same time.
One problem is that "vibecoding can be useful" already depends on your precious saved minutes being worth more than all the externalities you accept (or more likely ignore, since we still don't have a full idea).
The "cultish" behavior you describe comes indeed from an emotional belief.
I believe that the most important attribute for software is to be reliable and predictable.
In fact, I believe you cannot in good conscience call yourself a software engineer if you disagree with that.
And that's how I built my software, and what users, clients and employers have come to expect. This is a core part of the professional reputation I've built over the years.
AI generated code, no matter how good, cannot match these expectations.
But even if I don't use AI, there's enough of a marketing and social push to use AI that others will. And now libraries and tools that I've used for years, which have worked reliably for decades in some cases, suddenly become unreliable. And with them, my projects also start to fail.
AI generated code, no matter how good, cannot match these expectations.
What makes you so sure about this? You say it as if it's unambiguously true.
See my many many comments in this thread, but generally, whether I'm maintaining patches for other people's AI generated code, or trying AI myself, the result is always so bad, if a junior dev did this more than once, they'd be let go.
AI fans always argue it's a skill issue and they have no issues with AI, but if I look at their AI generated code, it's just as bad if not even worse.
Of course you can get the same issues without AI, especially when you outsource development.
But that's not what I'm measuring AI against. Instead, I'm comparing AI with the same standards I'm setting for myself. I'm someone who reads the specs (and has contributed to several protocol specs), who reads & contributes to upstream libraries.
And that's just not possible with AI. By default, it is just plain wrong about many details in many languages and libraries.
If I provide full docs for all the languages and libraries I want to use, AI won't read them, no matter how much I prompt it, until I disrupt the prompt and yell at it to read a specific document RIGHT NOW.
Getting AI to actually read the docs, make a plan, write a prototype, re-read the docs and the library sources, adjust the plan, rewrite the implementation...
...it's like pulling teeth. Absolutely exhausting, much more work than doing it yourself, and it still won't do it reliably.
And if you look at AI generated code online, no matter in which project, it's always the same. It's all full of mistakes only a beginner would make.
And the goal of a code review is not to redo the thinking of the author of the changelist. You're supposed to be able to trust that the author has done their work properly.
You should be checking if the general concept is sound based on the description, which should explain how the author arrived at the concept.
Maybe ask whether the author has considered certain edge cases that aren't listed in the description, and automated tools should be checking for smaller mistakes.
But with LLMs that's just not the case. You can't trust them in the same way. And as most of the work of a change is not writing the code, but actually thinking about the issue, you'll have to redo most of the actual work yourself.
This is the complete opposite to my own experience. Both Claude Code (Opus 4.7/4.8) and ChatGPT Codex (GPT 5.5) have been excellent.
E.g. I've contributed bug fixes to mature libraries such as SDL for niche Emscripten issues. The agents went deep into the SDL codebase, the Emscripten codebase, figured out the issue, wrote self-contained minimal tests to reproduce it, and then I authored the fix by-hand as requested by the SDL repo policy.
The same kind of investigation would have taken me hours or days, and the agents managed to do it in minutes.
I've also had extremely good experiences with my own C++ personal projects. With my guidance, LLMs were able to write high-quality C++ -- of course after some iteration, and some manual tweaking was required.
I am deeply impressed by the jump in quality of frontier models compared to -- say -- 6 months ago. They went from being a "somewhat useful" tool to a massive boon, and I am excited to see how they'll be in 6 more months from now!
The differences between the scenario you describe, and my scenario, are why we have different experiences.
The results get worse if the task requires knowledge that the AI doesn't ship with, or where the AI's builtin knowledge is wrong.
In your case you're fixing an issue in an external library. And it's libraries that AI is well trained on.
As a thought experiment: What if you wanted AI to use a programming language and library ecosystem it hadn't been trained on at all? How would you accomplish that? What if AI was (wrongly) absolutely convinced things should be done in a certain way, even though you know that is clearly wrong?
The reason AI prefers as much code duplication as it does (see Claude Code leaks) is because AI today heavily struggles with cross-referencing data across different sources.
It'll try to extrapolate from names, instead of actually going through dozens of files to understand how something is supposed to work.
It's always taking the lazy route instead of being thorough and thoughtful. And what use is a piece of software if it's lazier than me?
As a thought experiment: What if you wanted AI to use a programming language and library ecosystem it hadn't been trained on at all? How would you accomplish that? What if AI was (wrongly) absolutely convinced things should be done in a certain way, even though you know that is clearly wrong?
I had very good results using Claude to port over tests from a pretty heavy Rust macro DSL to a builder interface, as well as doing bulk edits to tests in our homegrown integration test suite DSL. In both cases I told it to look at existing tests (including other test-framework port commits for the former case). In particular, in the former case it would infer the intent of the test and actually comment it saying "test the case where foo, test the case where bar", and I was able to use it to replace synthetic test fixtures with more "natural" ones. There were some weird mistakes but overall it was a huge time saver.
In both cases having a short feedback loop was important: good parse errors, good compile errors, good runtime errors.
I also today got feedback from our PR review bot that some code I had written had forgotten to consider a specific case, in a way that's very much not obvious just by 'local' analysis; the semantics of the code under question is not something the model would have been trained on since it's pretty unique to my employer.
It is entirely possible that our use cases are very different, and thus we'll always have very different opinions regarding the usefulness of AI.
Regarding your own thought experiment: I would use Claude Code to perform a deep review of that programming language / library and produce one or more detailed documents. I would then review the documents both myself and with a competing LLM, such as Codex. I would then iterate until those documents are correct and comprehensive.
I would then prime the LLM's context with those documents, and give agents full access to explore the codebase, documentation, and related materials.
This is pretty much what I've done with my own C++ game development framework (https://github.com/vittorioromeo/VRSFML/), which is very unconventional -- I avoid any use of the Standard Library whatsoever, and I wrap compiler builtins into portable macros. My goal is to achieve very fast compilation speed and avoid run-time overhead in unoptimized debug builds.
Both Claude and ChatGPT were able to pick up the patterns and idioms of my codebase quickly. In fact, I believe that Claude autonomously wrote up a list of quirks/idioms of the codebase that it refers to, ensuring that suggested changes match the philosophy and style of the framework.
This has worked really well for me so far. It's not perfect, but impressively effective.
Perhaps you have a more specific example in mind? I'd like to try it out with my own workflows to see if I can replicate your negative experiences.
EDIT: On this point specifically:
It'll try to extrapolate from names, instead of actually going through dozens of files to understand how something is supposed to work.
Claude Code consistently browses multiple files before coming up with an answer. Whenever I use it, it seems to do exactly the opposite of what you're claiming here.
Previously, on Lobsters (1, 2, conclusion), I put several months into seeing whether vibecoding did anything, followed by a challenge to the community to see for themselves. Sorry, but the bots don't generate good code.
I feel like the complete opposite is happening.
I am not a fan of "vibecoding", but I am a big fan of human-driven LLM-assisted development, where the human is responsible for the design/architecture of the codebase and for deeply reviewing and tweaking any LLM-generated code. This of course implies that the human is capable of deeply understanding the code, and that LLMs are used as accelerators rather than as substitutes.
However, any sort of project that uses AI gets immediately demonized and criticized by the masses, being accused of being "AI slop" without any sort of actual investigation.
You worked on a codebase by hand for 10 years and used Claude to generate some unit tests? AI slop.
You used AI to polish some hand-drawn assets in a game? AI slop.
You used an LLM to perform a large-scale mechanical refactor instead of doing it by hand? AI slop.
Very, very exhausting and unfair. And I see this all over the place.
I'm much more of an LLM skeptic (I use it only in a search and adversarial role), but I agree with you here. It's embarrassing that people are throwing insults like "lazy slop" at Andrew Tridgell! He clearly isn't lazy, and clearly produces better code than most of us, as he's demonstrated time and time again. Such insults are an embarrassment to actual vibecoding criticism. There is lots out there for which the insults are appropriate, but this is not it.
I think there's some hyperbole here that glosses over valid criticisms of AI use.
You worked on a codebase by hand for 10 years and used Claude to generate some unit tests? AI slop.
I don't think anyone will actually get their pitchforks out solely over unit tests. The concern is over outsourcing implementation to the LLM, which seems to have happened in this case. There might be more or less careful ways to do that but I think it's pretty uncontroversial to say that you understand something better when you write it vs. just reading it.
You used AI to polish some hand-drawn assets in a game? AI slop.
If you have artists working on your game, then those artists can provide polish with intent. If you use AI to save costs or cover a skill gap, this is taking work from artists and leads to lower quality. This is a valid and widely held concern.
You used an LLM to perform a large-scale mechanical refactor instead of doing it by hand? AI slop.
This is a matter of degree. "Convert my project into 1M lines of Rust that no one will read" leads to an outcome that I think it's fair to call slop.
I don't think anyone will actually get their pitchforks out solely over unit tests.
The open-slopware list, which was intended as a list of projects to avoid, boycott, downgrade and fork, included projects because the maintainer had spoken favorably about LLMs, even without any LLM code entering the codebase. There do exist some people who will get out pitchforks over any perceived sense of contamination, ignoring the implications (is it tests or is it a complete autonomous rewrite in rust?) and code quality.
First thing I mentioned happened directly to me. ~10 years is accurate if you include a few years of burnout-induced pause.
The second thing I mentioned is something that I personally spotted in multiple Steam indie game reviews. In particular, one reviewer changed their positive review to negative after reading that AI was involved, writing stuff like "if I ever knew AI was involved I would have never purchased it". Ironic how they didn't notice AI was involved at all in the first place.
The third thing I mentioned is also something that I've experienced. Large-scale refactor (as in moving and renaming a bunch of components across a codebase), co-authored by Claude, got called out for it. That's when I disabled the "co-authored" commit message altogether.
Also the "convert my project to Rust" happened with Bun and the result seems surprisingly good so far.
None of these were hyperbolic statements, they're all grounded in reality and personal experiences.
Large-scale refactor (as in moving and renaming a bunch of components across a codebase), co-authored by Claude, got called out for it
And rightly so! A refactor is very different from a refactor done with AI. Try doing a refactor with traditional tools vs with Claude Code and look at the git diff. There will be many small changes AI introduces.
I'm maintaining a patchset over a bunch of popular software, and it's absolutely eye-opening.
In one project I'm reporting the same issues in the same functions again and again, after every single refactoring they do, because Claude Code is "smoothing out" the code, which includes removing some "oddities", even if there's a well-commented reason why they're necessary and why the obvious solution won't work.
The fundamental disconnect is the question whether you trust AI.
I refuse to let AI make any changes unless I've manually derived them myself and come to the same conclusion.
Every single time, AI has missed some nuance and chosen the seemingly obvious, but wrong, solution. And if I'm just reviewing the AI output, I won't be able to notice these mistakes.
The only way AI provides a benefit is if you don't care about these small bugs and edge cases. But if you do, AI provides no benefit, because you have to do everything by hand anyway just to validate the AI output.
You certainly have a point (about the lossy refactorings LLMs do) and I don't let these things loose on my codebase for the same reason (they just can't help themselves at tweaking things when they copy them, be it removing comments, changing types slightly, whatever),
but would you feel the same way if the LLM was instead wired up to LSP-based deterministic tools that were auditable, replayable and didn't offer the chance for an LLM to actually get involved in the text itself, i.e. could not really do harm? A constrained toolbox, of sorts. IMO there might be room for the latter kind of tool.
I have lightly played around with a CLI wrapper for LSP operations for this reason.
What you describe is IMO much closer to how we'll end up using LLMs in the long term.
Similar to the expert systems of years gone by, I see the task of an LLM more in turning a human sentence into a pipeline of predictable tools that operate on a structural, not plaintext, representation of the data.
A previous attempt was using an LLM to rename a set of variables across a codebase. The LLM was given read-only access to the codebase and had the task of generating a CSV of before/after mappings. Regular refactoring tools were then used to apply these mappings.
Still, even that was lacking, as the LLM couldn't identify the structures underlying the code, and as a result the variable names it suggested were each generated individually, instead of trying to find a general pattern that could be used for all of them. (And even then, the variable names still had some mistakes in them).
(That test was with Gemini 3.5 Pro Preview)
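The split described above (the model only proposes names, a deterministic step applies them) can be sketched roughly like this. It is a minimal illustration, not the setup that was actually used: the CSV column names `before`/`after`, the function names and the sample input are all assumptions, and the whole-word regex is only a stand-in for a real, scope-aware refactoring tool.

```python
import csv
import io
import re


def load_mapping(csv_text):
    """Parse a before/after CSV (hypothetical column names) into a dict."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        before, after = row["before"].strip(), row["after"].strip()
        # Accept plain identifiers only, so the model's output cannot
        # smuggle arbitrary text into the codebase.
        if not (before.isidentifier() and after.isidentifier()):
            raise ValueError(f"not an identifier: {before!r} -> {after!r}")
        mapping[before] = after
    return mapping


def apply_mapping(source, mapping):
    """Rename whole-word occurrences; every other character is left untouched."""
    if not mapping:
        return source
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], source)


# The CSV stands in for what the read-only LLM would hand back.
llm_csv = "before,after\ncnt,retry_count\ntmp,pending_job\n"
code = "cnt = cnt + 1  # bump cnt\ntmp = queue.pop()\ncntr = 0\n"
renamed = apply_mapping(code, load_mapping(llm_csv))
print(renamed)
```

Note that the regex also renames matches inside comments and strings and knows nothing about scopes, which is exactly why a real refactoring tool would do the applying in practice. The point of the sketch is only that the model never gets to touch the text itself.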
Another test was using AI for code review.
All AIs I've tried will flag usage of $ in helm's define named templates and suggest using . instead.
But in named templates $ refers to the parameter, not the global scope, and is absolutely necessary when used in conjunction with with.
Every single model got this wrong. When I explained it, none would actually believe me, and only after I provided the helm docs did they stop flagging it.
And the moment another lint issue showed up, they immediately decided to "fix" that scope issue again, ignoring all the comments, docs, AGENTS.md and previous context after just a minute.
Ironic how they didn't notice AI was involved at all in the first place.
It's not really ironic if you either are ethically opposed to LLMs, or think that games are an art form and artistic expression is about more than the superficial product. Whether they could tell is not the point. It's like saying "they were mad when they found out that their clothing was made by slaves - ironic how they couldn't tell the difference."
the result seems surprisingly good so far.
Here too - I think in many cases it's not about the results on their face. The fact that Bun threw away a mature codebase for an AI-driven rewrite, with no pretense of ever looking at the code, and thinks that test cases are good enough to establish parity, tells you a lot about their approach to the project. Their approach concerns a lot of people who have now stopped using Bun - regardless of whether the rewrite actually resulted in errors. It's really the flippancy that's the problem. The AI use was a signal.
Given that LLMs are the cause of this flood, it feels like looking to LLMs as the solution is misguided beyond belief.
This doesn't sound right. If LLMs can find bugs and create exploits, they can find those same bugs in a maintainer's hands. If only attackers used them, we'd be in for an even rougher time.
Given that LLMs are the cause of this flood, it feels like looking to LLMs as the solution is misguided beyond belief.
You might almost start to suspect that there are people who profit from convincing us the answer to systemic problems caused by LLM, is for us all to buy even more LLM. And end up in a downward spiral of increasing LLM dependence.
Given that LLMs are the cause of this flood, it feels like looking to LLMs as the solution is misguided beyond belief.
Similarly, "web search is useless now because the web is flooded with clickfarming AI-generated web pages, so just use an LLM directly".
Given that LLMs are the cause of this flood, it feels like looking to LLMs as the solution is misguided beyond belief.
Can you explain a bit more about this? I interpret him as saying that LLMs are useful for helping him address reported security issues (which also happen to be a thing LLMs seem to be useful for).
What's the interpretation for it being misguided? Is it that we'd be better off if no one had LLMs?
It’s misguided because LLMs are well known to produce buggy code that “looks correct”. But also more generally, LLMs are what got us into this mess. So yeah, maybe we (and the planet) would be better off if nobody had LLMs.
Humans are also well-known to produce buggy code that “looks correct” :)
That's why we've developed so many techniques to work around that.
e.g., when I do a refactoring, I will make a separate commit, in which I use JetBrains refactoring tools to move things around, and then another, different commit with my own actual changes.
This way the first commit may have hundreds of changes, but they're all deterministic and known-good, and the second commit may contain mistakes, but it's only a few lines changed.
But when you e.g. refactor with AI, it'll always be doing both at the same time. And that's impossible to review.
it'll always be doing both at the same time
This is definitely not true. In fact you can ask an LLM to make a nice patch series out of your hand-written code which does five things at the same time. I do this all the time -- split changes up into refactoring and functional ones -- enough that I'm notorious at work for making stacks of 20+ PRs.
This is definitely not true.
And
I do this all the time -- split changes up into refactoring and functional ones
Have you actually diff'ed the result of using an LLM for a refactoring vs using traditional tools?
Because while it looks the same and would pass review, it certainly isn't the same and introduces tiny, hard to catch changes. Some typos get fixed, some comments adjusted, some code reformatted, and some tiny hard to catch heisenbugs are introduced.
As others have also confirmed elsewhere in this thread.
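One way to run that comparison, as a rough sketch: apply the intended rename mechanically, then compare the result line by line with what the LLM produced. Whatever still differs was not part of the requested refactor. The function names and the sample snippets here are made up for illustration, and the whole-word regex is a simplified stand-in for a traditional refactoring tool.

```python
import difflib
import re


def mechanical_rename(source, old, new):
    """Deterministic baseline: whole-word rename and nothing else."""
    return re.sub(r"\b" + re.escape(old) + r"\b", new, source)


def unrequested_changes(original, llm_output, old, new):
    """Return (expected_lines, actual_lines) pairs where the LLM output
    differs from the pure mechanical rename."""
    expected = mechanical_rename(original, old, new).splitlines()
    actual = llm_output.splitlines()
    matcher = difflib.SequenceMatcher(a=expected, b=actual, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((expected[i1:i2], actual[j1:j2]))
    return changes


original = (
    "total = 0\n"
    "for x in items:\n"
    "    total += x  # must stay integer math\n"
    "print(total)\n"
)
# Hypothetical LLM result: the rename is right, but a comment quietly vanished.
llm_output = (
    "item_sum = 0\n"
    "for x in items:\n"
    "    item_sum += x\n"
    "print(item_sum)\n"
)
changes = unrequested_changes(original, llm_output, "total", "item_sum")
for expected_lines, actual_lines in changes:
    print("expected:", expected_lines)
    print("got:     ", actual_lines)
```

An empty result means the LLM's output is identical to the mechanical rename. Anything else is a change nobody asked for and needs its own review.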
Yes, I spend a lot of time doing that kind of thing. I move quite slowly when I'm using LLMs! I'm not sure why you think I don't. I've been on record about this several times before (previously, previously).
For code formatting I use rustfmt with or without LLMs.
So they keep saying. Over and over, for the last half year or so.
I mean, even if you don't agree with LLMs being useful tools, developing OSS certainly has changed, for better or worse, thanks to LLMs. That is certainly true for projects like rsync and curl, where it seems like everyone and their grandmother is trying to find security vulnerabilities using LLMs.
Whether LLMs are then the right tool to look to as a solution is also a valid question to ask. Although I do think it overlooks the fact that many OSS projects have been chronically understaffed, and have been for quite some while. This xkcd meme isn't actually a meme but true for a worryingly large number of projects. Speaking from some experience (though not nearly on this scale), finding people to actually stick around and do more than incidental contributions can be next to impossible. Even with that in mind, I think I have only seen some gradual OSS adoption of LLMs in the past year, possibly even in the last 6 months. In fact I can back this up, as I made this comment 9 months ago.
So even if you don't agree with the use of LLMs, something has been changing in the last couple of months.
It's equally exhausting to hear LLM usage being demonized. It's exhausting to hear the term "vibecoding" for human-driven LLM-assisted development. It's exhausting to have hundreds of people attack a project or a developer for disclosing any kind of AI usage.
LLM-assisted development doesn't have to be "slop".
Ugh, Medium and Cloudflare. Two horrible tastes that do not go together.
Open-source maintainership is a thankless task. Tridge was trying to fix tech debt in the test suite and deal responsibly with the flood of LLM-detected CVEs, but he was hit by Hyrum's Law. I guess his plans for 3.4.4 are the least bad option and he will have to just barrel through.
Appreciate this being written and shared. Parts that stood out to me:
I’m trying to decide at the moment between a 3.4.4 release that softens some of the regressions and going for the 3.5.0 that I had planned with much larger changes.
If the author is reading this, going for 3.4.4 looks like the right approach here. Going straight for the major 3.5.0 would not land well and by many be seen as reckless, considering the regressions in the last release. Making it easier for people to grok the diffs would alleviate concerns.
I thought it would be a good idea to do the core structure for the new test suite in public on master first though given all the rage that has generated maybe that was a bad idea.
I don't think less transparency would improve optics and reception. At best it could postpone an even larger backlash.
I’d suggest you try the new rsync test suite on openrsync if you can stomach something that an AI has helped write. I tried it today and openrsync currently fails 85 of 98 tests, so I’m sure it won’t take you long to get it up to speed
This is not really fair, considering samba rsync is protocol 32 and openrsync protocol 27. It's not advertised as feature-complete.
This is not really fair, considering samba rsync is protocol 32 and openrsync protocol 27. It's not advertised as feature-complete.
I believe that was the meaning. Basically: it's that far behind, good luck.
Going straight for the major 3.5.0 would not land well and by many be seen as reckless.
An idea: work on 3.5, do alpha releases, invite scrutiny and testers.
Yes, after the 3.4.4.
(Or, ideally, in parallel. But that might be asking too much of project maintenance right now. This is something others can do independently.)
It's a bit odd going "I don't have time to do this safely and incrementally but FLOSS allows you to use or fork an older version" instead of "I don't have time to release fixes for all the reported issues right now but FLOSS allows you to pull in or write your own patches".
Many times less is more. The urgency to RELEASE FIXES FOR ALL THIS RIGHT NOW looks artificial and imposed.
They say the game has changed, and I read between the lines that relying on obscurity and non-disclosure no longer being an option is a big part of that (was it ever, though?). An alternative compromise would be radical transparency: make confirmed vulns and CVEs public right away and let the community figure out the rest. Not sure if I recommend this, but it would be one way to offload much of the pressure from the author to the community.
It's a bit odd going "I don't have time to do this safely and incrementally but FLOSS allows you to use or fork an older version" instead of "I don't have time to release fixes for all the reported issues right now but FLOSS allows you to pull in or write your own patches".
What you say is strictly worse for anyone who manages their system with tools of-this-decade enough to let them just roll back, no? Releases with random breaking changes, then fixes in follow-up releases: this has been the modus operandi of many, many things before LLMs anyway, so it's not like you won't need this tooling in a world where all datacenters of companies from some list are bulldozed.
I'm torn about this. On one hand, I think security can only be ensured by humans writing code themselves, because you're reflecting on the code as you're writing it and will catch errors early. I write code much better than I review it. A lot of things go past me during review, because I didn't carefully think about every line being written.
On the other hand, ignoring the basic fact that harassment is unacceptable, I also think Andrew is allowed to run his project on his free time however he wishes. If he wants to use LLMs, I don't approve but that's his project, that's his prerogative. If I'm not happy about it, I should be the one moving my backups to restic or borgbackup, or just forking rsync.
Just to be clear, I'm not anti vibe coding. My employer semi-forces me to use LLMs, and they're okay at writing boring non-novel glue code, which is most of my job these days. I just don't use them in my free time.
Full ack on everything you said. Regarding backups, rsync isn't even a very suitable solution for those because it won't help you restore a file whose contents got corrupted. Something like restic is way better as it handles such cases by also keeping old versions of files around. It actually tracks deletions so that you also know which files are not relevant anymore.
It's like the "in theory the theory is more important, ..." joke, I believe. I've got experience with app security, I can choose some exploits, I can catch things in review. But I'm nowhere near as good as the current top LLMs running a "find more pathological cases" Ralph loop. I've found issues in my code, in the libraries used, in the alternative implementations, etc. that way in popular software without even trying hard. The human's available time and determination are nowhere close.
I think security can only be ensured by humans writing code themselves
Sometimes we think of security as "Secure Coding when handling untrusted input", but that's only a fraction of what goes into ensuring security.
Across orgs there are whole swathes of security software written to prevent, detect, and respond to issues. Across each of those fronts, there are always gaps, and more work to be done. An org might rather accept an improvement to its security posture, knowing it might not be perfect, than not have that improvement at all. This is part of the tradeoff an LLM offers. Where that tradeoff falls will be context dependent, but it will rarely be "all code must be hand written".
This applies to servers like rsync too, whose maintainers, as the author suggests, might want to do significant refactoring to make them more robust/resilient. If you can LLM refactor towards privsep with a smaller TCB, you might be willing to accept some bugs that fall outside the TCB.
I don't have context on rsync, but I trust the author is making the best decisions for the project and its users, given the limited resources these projects typically have.
I find that wording odd - I would have thought security is helped by guarantees and mathematical proofs and such - enforced by automation as your code evolves. This gives an upper bound on the kind of problems you have.
The lower bound is the bugs and vulnerabilities you can find, or someone else can find, or anything else (eg an LLM) can find.
A human reviewing the C they wrote (eg rsync) is not what I’d call a good position in this space (and I mean no offence to Andrew here).
The most hilarious thing is we already know from experience that unit and integration testing and human review is not sufficient for strong security posture in C, but people somehow think things will be different if the token slot machine is involved :)
I'm not aware of any proof system that doesn't use base facts written/hardcoded by humans. Also what you're trying to prove such as "non-admin can't access resource" is also fed by humans. Proof systems don't use pseudo random number generation as model temperature thus giving random outputs, they're fully reproducible.
Comparing LLMs to proof assistants and dependent typing systems is, IMHO, like comparing movie making to mechanical engineering papers because they both use digital cameras.
I've used PRNGs to assist in software quality assurance for many years, well before LLMs. Not mathematical proof, sure, but high-confidence evidence. The idea that you can't assure systems using randomness is quite strange to me.
I certainly wouldn't trust human review alone (nor would I trust a PRNG alone; they are both load-bearing for high-assurance software).
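To make the point about randomness concrete, here is a minimal sketch of seeded randomized testing, assuming a toy run-length codec as the code under test. `rle_encode`, `rle_decode` and `check_round_trip` are made up for illustration and have nothing to do with rsync. The PRNG explores inputs nobody would write by hand, and because the seed is fixed, any failure can be replayed exactly.

```python
import random


def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Toy run-length encoder: a list of (byte value, run length) pairs."""
    out: list[tuple[int, int]] = []
    for b in data:
        if out and out[-1][0] == b:
            out[-1] = (b, out[-1][1] + 1)
        else:
            out.append((b, 1))
    return out


def rle_decode(pairs: list[tuple[int, int]]) -> bytes:
    """Inverse of rle_encode."""
    return b"".join(bytes([b]) * n for b, n in pairs)


def check_round_trip(seed: int, cases: int = 1000) -> int:
    """Check decode(encode(x)) == x on pseudo-random inputs.

    The seed makes the run reproducible: a failing case can be replayed
    by calling this function again with the same seed.
    """
    rng = random.Random(seed)
    for _ in range(cases):
        length = rng.randrange(0, 64)
        # A tiny alphabet makes long runs (the interesting case) likely.
        data = bytes(rng.choice(b"ab\x00") for _ in range(length))
        assert rle_decode(rle_encode(data)) == data, (seed, data)
    return cases


check_round_trip(seed=1234)
```

This is evidence rather than proof: it shows the property holds on the sampled inputs, which is why it sits alongside review instead of replacing it.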
I was talking about soundness vs completeness in creating correct code (proofs and testing).
Using Rust for checking certain facts helps from one direction (proofs). Increasing your space of tests by getting an LLM to identify a problem helps from the other direction (examples and counter examples).
You can add a regression test and patch the bug by hand at the point you become aware of a problem. To be clear, I am referring to the current bout of LLM-found pre-existing vulnerabilities, not the current bout of LLM-created new vulnerabilities :)
If he wants to use LLMs, I don't approve but that's his project, that's his prerogative. If I'm not happy about it, I should be the one moving my backups to restic or borgbackup, or just forking rsync.
restic and borg codebases both also incorporate LLM code.
(but I'd strongly recommend borg over rsync for backups anyways. I use rsync for transferring large amounts of files across devices and keeping e.g. webservers synced, but it is comparatively ill suited for backups in 2026.)
rclone is also an option if you don't mind that when it detects a file change it resends the whole file instead of rsync's clever diffing and data minimization algorithm. What rclone has over rsync, however, is parallelism that makes much more effective use of available network bandwidth.
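For anyone wondering what the "clever diffing" rests on: a rolling weak checksum lets the sender slide a window over a file one byte at a time without re-summing the whole block. The sketch below is a simplified version in the style of the weak checksum described in the rsync technical report. It is an assumption-laden illustration, not rsync's actual code, which also pairs this with a strong hash and differs in details. `weak_checksum` and `roll` are names chosen here.

```python
M = 1 << 16  # both halves of the checksum are kept modulo 2^16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute (a, b) from scratch for one block.

    a is the plain byte sum; b weights each byte by its distance from
    the end of the block, so it is sensitive to byte order.
    """
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple[int, int]:
    """Slide the window one byte to the right in O(1).

    out_byte leaves on the left, in_byte enters on the right.
    """
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


data = b"the quick brown fox jumps over the lazy dog"
n = 8  # illustrative block size
a, b = weak_checksum(data[:n])
for k in range(len(data) - n):
    a, b = roll(a, b, data[k], data[k + n], n)
    # The rolled value matches a full recomputation at every offset.
    assert (a, b) == weak_checksum(data[k + 1 : k + 1 + n])
```

Being able to test every byte offset cheaply is what makes it practical to find blocks that moved within a file, whereas resending the whole file skips that work entirely.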
I am sympathetic to a retired maintainer who'd rather be sailing, but I don't think that context fundamentally changes anything.
Tridgell does not owe us any work, he is free to retire and to sail. If he desires to do that, I wish him the best. I have sympathy for the fact that he feels some responsibility, but also (if I read between the lines correctly) finds it to be a bit of a burden.
But rsync is core software, and I think that anyone who tries to maintain it needs to do so with a certain standard of quality[0]. If the maintainer does work that isn't up to standard, that is a mistake. They do not deserve harassment, any more than you or I deserve harassment for our mistakes (at least, I certainly make mistakes, I don't know about you). Saying something is a mistake is not saying the person who made it is bad, or we feel no sympathy towards them.
So the question is whether the work that the AI coding tools did was up to standard. What standard? Roughly speaking, the standard of "better that this work was done with the quality that it has than that it was not done." If you improve the software, keep doing it. If your work makes it worse, then stop. I don't claim this is a clever definition, but it is the right one. Remember, we're not entitled to ask Tridgell to do extra work, but we have room to say "this is making your users worse off" if it's true.
To be honest, I do not have a fully formed opinion on how to judge this work. There were regressions, Tridgell describes them as being in edge cases, but without more context, I don't know how to judge the impact of regression on those edge cases vs. fixing potential security issues.
[0] Someone who doesn't think is typing "WITHOUT ANY WARRANTY" as we speak, but that clause is irrelevant here. It is a disclaimer of legal responsibility, not a disclaimer of any sense of pride in craftsmanship or any non-legal requirement to do one's best. My comments here are also supplied "WITHOUT ANY WARRANTY" and yet you will rightfully criticize them if I make a mistake.
But rsync is core software, and I think that anyone who tries to maintain it needs to do so with a certain standard of quality[0]. If the maintainer does work that isn't up to standard, that is a mistake.
No. That is not how open source works and it’s not how it’s supposed to work. The author did not make it “core software”. If you use it or anyone else uses it, you are responsible. If the software does not behave like you expect then you can fork it or replace it. What you cannot do is force that person to dance to your tune.
When Linux distros package your work, you have two responses.
You do not owe people work, but you do owe them clarity about what you owe them.
When Linux distributions package up open source software they assign a distribution side maintainer for that package. They might stop distributing that package if it develops in a direction that does not work for the distribution but they are not offloading all their duties to the thing they package.
I had a lot of my open source software packaged up by distributions over the years and I can tell you that they had a ton of patches on top of it to make it work for them.
Very different behavior to what people are showing in the GitHub issue tracker (or in the comments here) wrt rsync.
Which distributions deny projects that don't clarify this?
Even if we applied this logic, distributions would have to drop a lot more projects before they got to rsync.
Context for those who missed the original coverage of the regression: https://github.com/RsyncProject/rsync/issues/929
Context is valuable but given the GitHub issue is still not locked, perhaps it's preferable to not link directly to it? The issue report is a screenshot of a mastodon post and then around 300 (and counting) bickering comments.
Don’t explain bro. Just keep doing your thing. Haters gonna hate. They started hating when people stopped writing assembly and aint never gonna stop.
This is transparently false, as many people are anti-LLM not in spite of but because of their interest in improving programming languages and environments.