Did Claude Increase Bugs in rsync?
87 points by wonk
Hooray! This analysis was exactly what I wished for, and more:
All metrics, methodology, and data sources were exclusively chosen by me, in consultation with my wife, who has a Master's Degree in Statistics from Penn State University.
Props for involving an actual statistician, and for putting together a very approachable write-up!
The analysis uses a single metric: bugs per 10 commits (bugs/10c)
Missed opportunity to use the SI prefix and name it decibugs per commit.
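For anyone who wants to see the unit joke worked out: a minimal sketch, assuming the metric is simply 10 × bugs ÷ commits for a release. The function names and the counts in the example are made up for illustration and are not taken from the article's data.

```python
def bugs_per_10_commits(bug_count: int, commit_count: int) -> float:
    """Bugs per 10 commits (bugs/10c) for a single release."""
    if commit_count <= 0:
        raise ValueError("commit_count must be positive")
    return 10 * bug_count / commit_count


def decibugs_per_commit(bug_count: int, commit_count: int) -> float:
    """The same quantity expressed in decibugs per commit (1 bug = 10 decibugs)."""
    if commit_count <= 0:
        raise ValueError("commit_count must be positive")
    bugs_per_commit = bug_count / commit_count
    return bugs_per_commit * 10  # convert bugs to decibugs


# Hypothetical release: 2 bugs reported against 8 commits.
example_bugs, example_commits = 2, 8
example_rate = bugs_per_10_commits(example_bugs, example_commits)
example_deci = decibugs_per_commit(example_bugs, example_commits)
```

Both functions return the same number for the same inputs, so the rename would change only the label, not the values.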
This analysis was exactly what I wished for… Props for involving an actual statistician, and for putting together a very approachable write-up!
Agreed! The article isn’t my own, but I appreciated that somebody went past the hype and criticism and provided some data on the impact on code quality.
My take is that people are entitled to their opinion, one way or another, on whether they want to continue to use a FOSS project that is going to be vibe coded moving forward. However, the community outrage that resulted from the maintainer’s pivot to vibe coding tools is quite something to behold. The empirical data in the post, for me at least, better contextualizes the impact of the maintainer’s practice shift.
Trust will be maintained or further eroded as a result of the adoption of the coding practice by the maintainer. Only time will tell.
What's really jarring is the sheer entitlement people have. You're using a tool that somebody poured their time into without any compensation. They're doing this out of sheer good will, and to harass people like that is disgusting beyond belief. Anybody who's maintained an open source project knows how much work that is. People are free to fork and build their own if they don't like the choice of tools the maintainer settled on. So far, the achievement of the one fork that exists was to reintroduce several security issues and replace the readme.
Literally the second paragraph of every source file in the project:
"This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;"
How anyone can complain is beyond me.
I would be curious to know how many of those offended because of this pivot to vibe coding tools actually contribute/have contributed to rsync in any meaningful way (effort or money).
Anecdotally, I remember wanting to submit a few bucks to rsync but being unable to find a way to donate to rsync specifically, rather than to Samba as a whole (which I don't really care about).
Also, just as I don't have to be a coauthor of a paper to be upset about that paper being plagiarized, I don't really see why you'd have to be a contributor to a project to be upset about its usage of plagiarism machines?
Also, just as I don't have to be a coauthor of a paper to be upset about that paper being plagiarized(*)
(*): for your personal definition of plagiarism, which is far from universal
No, you don't need a reason (any reason) to be upset. But others also don't need to agree with an "upset" with no universally accepted reason behind it.
Since you mostly do not have the means or leverage to remove the source of your upset, it merely hurts your well-being in the long run.
Personally, I just know to avoid projects that use LLMs not for any substantive reason, but just because it’s very off-putting to me, in the same way that someone who says "kek" or "fren" would signal to me not to interact with them further at all, even if for no tangible reason.
I feel like a lot of reasoning currently explaining dislike of LLM usage is backwards - yes there are current ethical, quality, etc concerns, but "anti-AI people" including myself wouldn’t suddenly be fine with it all, if those were resolved.
So I avoid any project with "AGENTS.md", Claude co-authored commits, etc., not for any concrete reason, just because I find it yucky and tasteless. I don’t care if it’s bug-free or not. I feel like maybe some other people also feel the same way.
So I avoid any project with "AGENTS.md",
You probably want to reconsider that. There seem to be several projects out there adding AGENTS.md files to precisely tell AI to go away.
I think the bad vibes you bring up have more to do with the people using LLMs rather than any inherent quality of the technology. The presence of AGENTS.md is a good predictor of poor quality right now, but I don't think it's fair to discredit the technology on that basis.
There's an almost impossibly long laundry list of things that need to happen before LLMs become a normal, acceptable technology to me (I wrote a little about this recently) but if they do get addressed, I think I'd be kind of fine with the technology. I'd maybe look at a repo with an AGENTS.md similarly to a child riding a bicycle with training wheels, and while that may dissuade me from using that software, I'd probably think it was fine.
The success of open source projects is driven so much by perception that people spend money buying GitHub stars. Unfortunately this particular perception issue has broken containment and has turned into a talking point. No amount of data can change that. From now on, “rsync’s maintainer used an LLM and now it’s broken” will be busted out by AI skeptics alongside talking points like “data centers waste 500k gallons of clean water a day” and “the METR study showed LLMs make you less productive.”
Note: not saying whether I’m an AI skeptic myself. Just commenting on how debates around this topic tend to go in the average case.
How are those "talking points" rather than just.. fact?
Here's an updated METR study released Feb 2026.
For the subset of the original developers who participated in the later study, we now estimate a speedup of -18% with a confidence interval between -38% and +9%. Among newly-recruited developers the estimated speedup is -4%, with a confidence interval between -15% and +9%.
Even if you only looked at the early-2025 study, the conclusions are not as strong as what I've seen usually claimed in online debates. I encourage taking a look at Table 2 in the 2025 study.
Given both the importance of understanding AI capabilities/risks, and the diversity of perspectives on these topics, we feel it’s important to forestall potential misunderstandings or over-generalizations of our results. We list claims that we do not provide evidence for in Table 2.
As for the water usage issue, here's a blog post that provides context. This is what originally took me out of the realm of talking points and made me look into the data center issue further. (Sorry for the Substack link.) Caveat, to be fair: This post is from last year, and may contain outdated information. But it still points out the scale at which these talking points spread.
I'd like to reiterate that I'm not advocating for a blanket approval of the pace at which this technology is being pushed on the general public. Consumer-facing chatbots are, on the whole, harmful to society. Data centers' acoustic signature is harmful to those who live near them.
I'd still really like a more nuanced discussion in online spaces.
Unfortunately this particular perception issue has broken containment and has turned into a talking point. No amount of data can change that.
I don’t know if the article’s author is intending to persuade anybody with their data. I view the article as contextualizing the spicy debate around rsync’s adoption of the tooling with some data.
With that said, I think you’re correct in saying other non-tangibles are wholly ignored in the article, but my hunch is that this was intentional on their part, as there is already enough noise from evangelists and skeptics alike.
Here's my favorite part, though. Digging into the data, one of the first things that jumped out at me with blinding clarity was that the worst release, by far, in rsync history was entirely prior to the introduction of Claude:
39.39 bugs per 10 commits
And yet nobody noticed. There was no AI to blame so there was no GitHub issue with 300 comments, no death threats, no threats to fork or move to openrsync. A maintainer shipped a broken release and fixed it, just like normal. The only thing that made v3.4.3 special was the availability of an enemy everyone had already decided to hate.
Incredibly important (and imo expected) takeaway here. If the processes you have in place between you and your users are not ensuring the correctness of your software (through tests, QA, etc.), then you will ship bugs, regardless of the LLMs involved. LLMs can both hurt and help in this process.
If the processes you have in place between you and your users are not ensuring the correctness of your software (through tests, QA, etc.), then you will ship bugs, regardless of the LLMs involved. LLMs can both hurt and help in this process.
I have a few issues with rsync's future. The biggest issue I have with the direction of rsync is that it was basically a done project for years, and now, with AI, he ripped out the testing code, replaced it with a Python test suite, and didn't leave the old suite running to check correctness for any considerable amount of time. That's irresponsible in my view... especially as rsync's main purpose is to move valuable data around and the integrity of that data is sacrosanct.
Agreed. I think cURL’s recent post shows the opposite end of the spectrum in that regard: strong software engineering practices, which had already been in place for years, have lowered the overall value of bug finding with similar AI tools.
As is typical for anti-AI users, it eventually escalated to fantasies of violence.
Could we not engage in this kind of rhetoric, please? In addition to asserting a norm from a subset of people who OP disagrees with, it also pisses off the audience that wouldn't already agree with this on principle, and need to review this the most.
Putting that aside: I, for one, do not care if it is more or less buggy than previous revisions. I care that it is developed in a way that is inconsistent with how I believe software should be developed. I don't expect to convince anyone that that position is reasonable if they don't already have a basic appreciation that there are problems other than efficacy here. Good news is that I don't have to use these versions of rsync if I don't want to, and I will choose alternatives that fork off before LLMs began to be used.
Yeah, this article was so full of anger that I didn't make it very far without giving up. Would have been better if they'd been (or tried to be?) impartial. Also didn't help that they repeated the long-debunked meme that the first bug report was the one the eternal-september kids piled on, and not the actual bug report.
After posting this on Hacker News and receiving almost no substantive input, discussion, or response on the actual content of the article, I decided to rewrite all of the prose in my own voice.
I prefer this article now, to be honest.
I'm aware that this metric does not control for commit complexity, security intensity, or bug severity. It does not distinguish between a one-line typo fix and a CVE patch. It is a blunt instrument.
Unfortunately, as one of the people in the "LLMs are bad" camp, not being able to control for that is kind of missing the accusation. The accusation levelled by myself and other people is that AI leads to people shitting out commits that are larger, that cannot be readily understood, and that increase complexity*. You can find LLM-proponents speaking along these lines too, and then they shift the goalposts away from the multi-decade industry-proven practice of "reading PRs" to "We need the LLM to be able to test everything", as if that removes the issue of code complexity being technical debt.
In this specific instance, the bug severity is very, very high because it broke people's workflows. rsync is very widely used software for backups, and people found that those backup scripts broke because of the change. That is why the attention was able to be garnered in the first place. To my knowledge rsync has been largely stable before this time, and was considered "battle-tested" enough to trust with backup operations — the idea that it would break backups on a patch update was simply not within the realm of possibility**.
You can argue that it's a fluke that the LLM produced software with bugs, or that the maintainer needs to change their LLM workflow and improve the testing coverage (something that the maintainer has said themselves), but first and foremost a lot of the vitriol is about this tool breaking that trust.
* — Indeed, there's a new set of LLM programmers purporting that they "never read the code" simply because it takes a lot of time to read through and it's more complex to dig through than a normal programmer's code. This is because there is no single mental model that an LLM tool can deliver to you, which is what you're doing when you're reading code (learning someone else's mental model, that is).
** — Yes, yes, the previous release also broke some minor behaviour too, but not to a level that cost people time and money. Losing permissions on a backup is relatively easy to fix.
Unrelated to the above, man you really need to check your site for accessibility. I have pretty good vision and am in my late 20s, but the light grey text on that cream/yellow background is absolute murder to read. Good lord.
Of course this release broke some people's workflows. You cannot change anything without breaking people's workflows. The maintainer said it was needed to fix security issues. That typically involves being more strict with your inputs and outputs. That almost always breaks something. I have no reason to think the dev was lying; he was very open about what he is doing and why.
People grab software that comes without any warranty and that they have no contractual relation with and dump it on their machines without testing it thoroughly in their environment and then complain that something broke and they had to spend time and money to fix it? If these complainers had even bothered to take a minute to skim the release notes, or recent commits, they would have known that this release requires careful testing as it finally has some real changes in it.
Why would a developer care? Why would a retired developer care? That dev has easily spent a thousand times more time and money than those complainers already. I for one am very grateful that he took up the keyboard and decided to fix the security problems in the software he wrote decades ago, problems that surface now en masse as AI tools got good enough to find some of them. I doubt he would have bothered without some AI help and I prefer a codebase with sloppy patches that needlessly raise the overall complexity a bit with each change over a dead codebase.
I’m confused by the bit you quoted, because it seems like the metric the post used is the count of the bugs weighted by severity per 10 commits. Is the author disagreeing with themself? Am I misreading?
In this specific instance, the bug severity is very, very high because it broke people's workflows. rsync is very widely used software for backups, and people found that those backup scripts broke because of the change.
IMO it's a great opportunity for these "people" to educate themselves about open source software and the GPL license. What does it mean, what guarantees it offers, etc.
I don't think people found out themselves. Guessing that 90% or more of rsync users run older versions which didn't have that bug. I'm one of them :-)
$ uname -a
Darwin riemann.local 25.3.0 Darwin Kernel Version 25.3.0: Wed Jan 28 20:53:31 PST 2026; root:xnu-12377.91.3~2/RELEASE_ARM64_T8103 arm64
$ port info rsync
rsync @3.4.1 (net)
[...]
That is why the attention was able to be garnered in the first place.
The attention? Well, it doesn't take Steven Pinker to understand that a large part of the community is in a tailspin right now. LLMs are better at programming than humans. Not an easy thing to accept. Those who based their identity and self-worth on their programming ability and/or their profession are facing two crises: uncertainty about their future livelihood / market value, and an identity crisis on top.
Fear, uncertainty and doubt are very hard things to manage, and LLM companies are doing everything in their power to amplify the effect, to boost their stock value. I believe after October the market will adjust abruptly and possibly the amplification mechanism will subside.
The small (very small) percentage of programmers worldwide who see code as an art form will probably use LLMs for training and improving their craft.
LLMs are better at programming than humans.
I think you are mistaking velocity for quality. Is a factory better at sculpting than humans because it can produce more at a faster rate? LLMs need a large amount of scaffolding to be effective and that makes it seem like they're "better", but it is purely velocity. Even Anthropic says that its code output is at best on par with humans today, despite noting that it ships 8 times the code.
Speed is not always a good thing.
this article quotes a lot of comments mentioning regressions, but the analysis itself doesn't measure regressions, only bug reports. it associates bugs with the release that the bug was reported for, rather than the one it was introduced in, and the severity of a release is measured by how many commits it has, even though there are other clear factors (like the duration of the release, adoption of the release by distros, etc.). how does this make sense?
I'm also wondering whether there's a time component here. Older releases have simply had more time for people to discover and report bugs...
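One way to reduce that time bias, as a rough sketch: count only bugs reported within a fixed window after each release, so older releases do not get credit for extra years of exposure. The function name, the 30-day default, and the dates below are hypothetical and not taken from the article.

```python
from datetime import date, timedelta


def bugs_within_window(release_date: date,
                       report_dates: list[date],
                       window_days: int = 30) -> int:
    """Count bug reports filed in [release_date, release_date + window_days)."""
    cutoff = release_date + timedelta(days=window_days)
    return sum(1 for reported in report_dates if release_date <= reported < cutoff)


# Hypothetical release with three reports, one of them filed much later.
example_release = date(2020, 1, 1)
example_reports = [date(2020, 1, 5), date(2020, 1, 20), date(2021, 6, 1)]
example_count = bugs_within_window(example_release, example_reports)
```

With the made-up dates above, only the first two reports fall inside the window. This equalizes exposure time only; it does nothing about one release getting far more attention than another.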
Andrew Tridgell has written a response, describing what's actually going on with rsync:
As this flood started to get more intense I realised I needed to raise the defences on rsync a lot — we needed much more thorough test suites, code coverage analysis, CI testing on a lot more platforms, deliberate and thorough scanning for possible security issues (so I find at least some of them before other people!) and the addition of a whole lot of defence-in-depth hardening techniques. This is all a huge amount of work. I’m retired (though my wife may dispute that!) and I’d rather be out sailing than working on rsync security issues, so I have reached for several AI tools to help with what needs to be done. I have absolutely no regrets about doing that, although from the storm of anti-AI rage it’s clear that many people think I should be hung up by my toe nails and flogged for even considering doing this.
One of the big performance regressions seems to have been swapping malloc for calloc, to avoid leaving stale pointers in reallocated memory (a reasonable enough security practice). Unfortunately, even though this part of the change was manual, doing this caused the RSS to become much larger for large, fresh allocations on certain operating systems (if I understand correctly).
rsync is under incredibly heavy churn thanks to a flood of LLM-discovered security issues (which are rampant in approximately all old C software). And Tridge is fixing this for free, like most open source authors, while at least one user publicly fantasizes about strangling him in his GitHub issues. (That illustration, in particular, is quite horrifying. And not just for artistic reasons.)
I wish the guy would just go sailing, enjoy life, and leave the community to sort this out on their own. Surely there's an enthusiast, or a large company with some reputation to improve, that can pick up the slack.
Surely there's an enthusiast, or a large company with some reputation to improve, that can pick up the slack.
I just want to double check. Sarcasm, right?
The reality of projects like rsync is:
So realistically, C/C++ have never been fit for purpose, we can't afford to rewrite all these projects in Go/Rust/Java, many critical projects are maintained by 0 to 0.5 unpaid volunteers, and nobody is going to start paying these people now. And no army of capable people is going to step up to fix hundreds of subtle CVEs in ancient C code. In high profile cases like rsync, you might get a bunch of volunteers. But in practice many of those volunteers would need more supervision than Claude. (And Claude needs alarming amounts of supervision!) Most programmers are bad at C, and only a tiny handful of people on the planet actually write C securely in practice.
And now on top of all of the other so-called "joys" of being an open source maintainer, we can add users fantasizing about murdering volunteer maintainers. To be fair, some of the users were always pretty nasty and unhinged, especially on Windows, so this isn't new.
But my tolerance for entitled, angry users who never so much as donated a dime (or even a well-documented bug report with a thank you note) is pretty low, and always has been. The sign says:
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
Sometimes people show up making lots of angry demands. And you need to tap the sign.
The thread did not stop at words. As is typical for anti-AI users, it eventually escalated to fantasies of violence.
To the author:
First: fantasies are words. You're actually claiming that it did stop at words, or at least not claiming that there was any non-verbal escalation.
Second: If you're going to make a claim like this, you should ask your resident statistician how you might back it up. Showing that a couple people posted something like this doesn't meaningfully support your assertion that it's "typical". And it runs counter to my anecdotal observation (which I've also made no attempt to support with statistics), which is that "anti-AI" users mostly just feel sad, not violent, about LLM usage being inserted in places where it's unhelpful.
From time to time, I see a post where somebody, in a very wordy and detailed way, counters some subsection of anti-LLM people, the subsection usually having a very emotional/social response to LLMs. Those types of posts feel very disingenuous to me, for reasons I can't clearly explain; I'm not smart like that, but they feel like that because they are punching down.
Because it's all very detailed, it's hard to argue with it from an emotional standpoint, and it seems to end up as: "See, LLMs aren't the problem, use them right and they'll be a force multiplier, anti-AI people don't know what they are talking about, they are just scared of being left behind". I also don't want to diminish the work of the rsync maintainer(s) with arguments, so how would I ever make a convincing argument?
Like, the statistics here could be an interesting thing for OSS maintenance, but its conclusion ends up being weirdly one-sided, and I end up with a weird feeling that GitHub's form of OSS isn't what I'd like to contribute to.
(I do still think that the pile-on in the rsync repo on a maintainer isn't a good thing at all)
(I do still think that the pile-on in the rsync repo on a maintainer isn't a good thing at all)
I strongly agree. There was no need to pile on tridge. Nor on any maintainer who's reaching for these tools because they feel it's the best way to satisfy the demands being placed on them. I totally understand being disappointed you're now depending on these tools that you consider harmful, without having chosen to do so, and expressing that disappointment does not seem like a problem to me. But the pile-on was awful and unnecessary.
I’m not seeing how “punching down” could apply when it’s unclear who is “above” whom. What status hierarchy do you imagine?
The way I think about it is that it’s very common in a polarized discussion for people to engage with the more extreme people on the other side rather than more reasonable people. It’s not a strawman because those people exist and are seeking attention, but there are always going to be people like that, unfortunately.
Pro-LLM people have the weight of a billion dollar industry behind them, marketing, lobbyism and everything. That’s punching down.
And they are being (in general) smug about it, which can be extra infuriating (statistically inferred from a population with n=me).
What status hierarchy do you imagine?
At least in my vicinity, everybody already uses LLMs in some form, so it feels to me like a vast majority vs a vocal minority. Being smug about how typical anti-AI users behave thus sometimes doesn't feel like the full picture. I would not call it a status hierarchy tho, so I guess punching down isn't accurate here.
I do agree with your second paragraph, it'll always happen and it should be called out, but to me it feels weird that such a vocal minority is getting so much scrutiny vs (for example) the processes that enable LLMs to become so forced into everyday things (usually through the behaviour of companies).
The post got revised since I last read it, and the language that was polarizing got removed, so I don't think this is a decent example anymore. The post mentions that this is mostly about the outrage that has already happened, but I feel like it's used as an example of how dumb anti-AI rhetoric can be as a way to dismiss or evade valid criticism of LLMs.
Thank you for nudging the discourse in a more positive direction!
I think it's correct to call out public fantasies of violence as not okay—like, that isn't something we should aspire to as a civilization. But the author calling it "typical" irks me because it's a generalization.
As far as anecdotal observations go, I think this applies. I enjoy seeing folks make specific, measurable claims partly because I love numbers, but also because it encourages online discussion to be just a little bit more like the ideal world in that last panel.
The "typical" part really got under my skin, especially with no counterweight to suggest that it was actually common.
To the author's credit, they recently revised that section! It now reads:
The thread did not stop at words. It eventually escalated to, at one point, visual depictions of fantasies of violence…
…and removes the link to an unrelated news story, in which a different mob was calling for someone else's head. (It also fixes the smaller issue you brought up, which is that violent words are still technically words.)
After posting this on Hacker News and recieving almost no substantive input, discussion, or response on the actual content of the article, I decided to rewrite all of the prose in my own voice. If anyone complains about my verbosity or sentence structure — as they usually do, which is the reason I originally let the AI write the prose, among other reasons obsoleted by templating — they can go fuck themselves.
The confident use of emdashes here. I love this guy.
(I have no substantive response myself because I think he did good work)
Thank you for the analysis! I haven't read through completely, though I'm not sure I agree with the methodology. I'd be interested in some bugs per diff (same scale you have already, but multiply each commit by how many lines of core (non-test, non-docs, ...) code), as well as some analysis of how long it takes to reach a certain bug count after a release.
I think it's just not possible to come up with something super convincing though, because the release got significantly more attention than others, meaning bugs are more likely to be reported, and so not even an "is this typical for [however many] weeks after a release?" comparison is useful...
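The "bugs per diff" idea suggested above could be sketched roughly like this, assuming each commit is available as a list of (path, lines changed) pairs. The path filters, function names, and sample data are all hypothetical, and what counts as "core" would need to be decided per repository.

```python
# Hypothetical filters for what is NOT core code; adjust per repository.
NON_CORE_PREFIXES = ("tests/", "docs/")
NON_CORE_SUFFIXES = (".md",)

# One commit = list of (file path, lines changed in that file).
Commit = list[tuple[str, int]]


def is_core(path: str) -> bool:
    """True if the path is neither test code nor documentation."""
    return not (path.startswith(NON_CORE_PREFIXES) or path.endswith(NON_CORE_SUFFIXES))


def core_lines_changed(commits: list[Commit]) -> int:
    """Total lines changed in core files across all commits of a release."""
    return sum(lines for commit in commits for path, lines in commit if is_core(path))


def bugs_per_1000_core_lines(bug_count: int, commits: list[Commit]) -> float:
    """Bugs per 1000 changed core lines, weighting commits by their size."""
    total = core_lines_changed(commits)
    if total == 0:
        raise ValueError("release changed no core lines")
    return 1000 * bug_count / total


# Made-up release: two commits, 200 core lines changed, 3 bugs reported.
example_commits = [
    [("main.c", 120), ("tests/basic.test", 300)],
    [("README.md", 10), ("io.c", 80)],
]
example_rate = bugs_per_1000_core_lines(3, example_commits)
```

This weights a release by how much core code it touched rather than by how many commits it contains, so a handful of very large commits would no longer look like a small release. It still inherits the reporting bias described above.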
I was already considering switching to OpenRSYNC before this controversy. Since Slackware-current now ships OpenRSYNC, the decision became easy for me. I've been using OpenRSYNC exclusively and haven't had any issues with my typical backup and synchronization tasks.
Is rsync installed on your box? Currently, openrsync execs "rsync" by default for the other end of the connection, so you might unknowingly still be using upstream rsync! You can check this with -vv.
Funnily enough, if you don't have upstream rsync installed, openrsync just fails when trying to copy files locally. Even on OpenBSD! You can work around this with --rsync-path=openrsync.