AI enthusiasts are in a race against time, AI skeptics are in a race against entropy
17 points by eatonphil
One tip shared in this article is: "Don’t deny anyone’s lived experience". Yet, does it talk about the externalized cost of "AI"? Nope.
A lot of us consider this "tech" wrong not because of its nondeterministic, often low-quality output. We don't consider it bad because it can't do stuff. We are against it because we bear the costs of their relentless scraping. That's something every single one of the proponents and centrists conveniently forgets.
I'll happily revisit my stance as soon as the tens of millions of requests a day against my sites are gone. Until then, I'll consider "AI" and LLMs a plague.
Exactly. The problem isn't the output of the machine, it's the machine itself, and all that goes into it (physically and information-wise).
My engineers are SO resistant to the idea of shipping code without reading it.
I'll be honest here — as someone building foundational systems, no matter how good the validation/observability tooling is, I will likely never be comfortable putting code next to my name that I haven't not just read but thoroughly understood in detail. To the extent I use LLMs, it is in service of that. I often do things with them like building prototypes for alternatives to the direction I want to take (sometimes I change my mind, sometimes not), or make them a cross-examiner of my own understanding so I've sufficiently steelmanned arguments for alternatives. (I've noticed myself getting quite a bit better at that kind of steelmanning as I've used LLMs more as a cross-examiner -- practice helps!)
I think it's possible other people and teams have different constraints. Maybe speed matters more to them than it does to me. I'm just describing where my feelings lie.
The dirty truth is that even before AI, most code was bad. It mostly worked. It was ugly but as long as it gets the job done or keeps the lights on, no one cares.
I don't expect AI to change this.
The dumpster fires will just get bigger.
Two things come to mind:
So all in all, this is still to me a downside only tech. No place in my loops for it. Come back when you have a backed by evidence positive story.
What about the externalities? Even a resounding success story without caveats wouldn't address those, IMO.
To consider the externalities, it would need to work first.
The material science game loop applies. First find something that works. Then find a way to produce it without cadmium.
It's hard to really engage with this post with all the weird slop images (the first and the last one are two versions of the same image, for example)
This post fails to address any of the points raised by skeptics with a humorous message: let's all join hands and embrace LLMs.
The author discusses two contrasts - AI boosters presenting their wins in conferences vs skeptics who clean up their mess while grumbling in private. This alone should raise eyebrows. One side is busy loudly promoting their "wins" while the other has no such platform to present their learnings. I would be very interested in attending any such conference where people can safely discuss the mistakes made at the altar of vibe coding.
Their post is specifically meant for teams with already good engineering culture. Given that the long term effects of LLMs on people's skills and reasoning are unknown and short-term studies are already raising some red flags, would any high performing team really want to mess up their dynamics with AI agents?
One side is busy loudly promoting their "wins" while the other has no such platform to present their learnings
We're on such a platform. If you write about bad experiences with vibecoding and post it to lobsters, you will not get universal agreement, but you will have a receptive audience.
Lobsters is an invite-only link aggregation site, and while "LLM generated submissions should be disallowed" is the all-time most upvoted submission, the number of upvotes on that submission currently stands at +499.
Anthropic, OpenAI, and SpaceX are collectively worth trillions of dollars (possibly, we'll see how the IPOs go), and even the smaller AI companies routinely raise millions. Powerful and wealthy people have their fingers in the success of these companies, and the advertising budget they have is astonishing.
Are there anti-AI billboards north of SFO on the 101? Were there anti-AI commercials during the super bowl?
But sure, a veritable handful of people can grind their axe on a niche tech bulletin board, so it basically equals out, I guess.
This is a pretty forceful attack on something I didn't say. I didn't say the two sides were equal in power, or in morality, or anything else.
I just responded to the parent's comment that there is no platform for skeptics to share ideas. And I correctly noted that we're on one.
How convenient that all of the evidence in favor of vibecoding is hearsay about proprietary systems.
Agentic coding feels similar to alternative medicine to me. Lots of enthusiastic testimonies, excellent marketing, but when one wants to measure its benefits in a controlled and reproducible way... But certainly that will change very soon, because the last generation of models something something.
This comment is why it's unfortunate that we have "vibecoding" as a catchall for any use of LLMs, because it means I don't know what claim you're making, and the different claims have different evidence for or against them.
If you mean "all uses of LLMs as part of software development" we can point to the stream of accurate vulnerability reports against Curl and Firefox and many other projects that have been patched. I think we have solid proof that agents are making one positive contribution to software quality (though it does not follow that their overall effect is positive).
If you mean any use of LLM produced code, there are many open source projects that have AI developed features. Every day, it's more. I do not personally have an example that I would cite as evidence, but other people do. For instance, here's a claim that Ghostty is a solid example of high quality Zig software: https://lobste.rs/c/kvngy9.
If you mean examples of people merging AI generated code with no human review and getting good results (the strictest meaning of vibecoding), I'm also skeptical this works well. That said, I don't think the examples are all proprietary. The bun rewrite fits this model (I repeat--I am skeptical this model works well). There are surely other projects that are open and follow this model, and which we can judge.
"we can point to the stream of accurate vulnerability reports against Curl and Firefox and many other projects "
But we must also point at the torrent of inaccurate vulnerability reports against every project that forced developers to put up fences, and the torrent of random code contributions (by people who literally did not know what the code did) which overwhelmed various projects.
I was responding to the top-level article, which doesn't bring any evidence to the public so that we can examine it. I do appreciate the dearth of evidence in your position too, but it's not connected to the article's claims about vibecoding for businesses.
all of the evidence
It's one thing to be against something, another to just keep repeating how something doesn't work while there's an endless stream of open projects you can look at yourself. At this point your position is demonstrably untrue.
Just in case, there you go, a series of open reimplementations of Apple's proprietary tools that a few projects rely on now, almost completely vibed: https://github.com/viraptor/actool/ https://codeberg.org/viraptor/re-derq https://codeberg.org/viraptor/re-intentbuilderc https://codeberg.org/viraptor/re-appintentsmetadataprocessor https://codeberg.org/viraptor/re-xcstringstool
You aren't mentioned in the article either.
Moreover, I'm not quite convinced by your statement. You say that you vibecoded all five repositories, but only two have prominent vibecoding artifacts. You also say that a few projects rely on them, but I can only find them in nixpkgs as packaged ports along with your plans for using them later once they mature. Also, your actool doesn't have a proper license file and it likely cannot be copyrighted by you since you didn't write it; neither you nor Claude have the authority to release it as MIT. Finally, do you really think your prompt is the future of software engineering, or is it a combination of useful notes and manager-brained attempts to command the bot?
The linked CLAUDE.md is itself clearly LLM generated. So the original seed of human thought here is somewhere else.
This strikes me as the heart of the argument, and entirely accurate:
The enthusiasts are not wrong. We are starting to see real, non-imaginary, discontinuous leaps in capabilities from teams that lean in hard to working with AI. And this does not feel like a normal technology cycle where you can wait for the dust to settle; teams that sit this out while competitors are hustling could be out of business before the dust settles. That’s a real, existential threat.
The skeptics are also not wrong. When you ship code faster than engineers can read it, in domains where nobody has full context, you are making withdrawals from a trust account that took years to build. Reliability degrades, institutional knowledge evaporates. You end up with systems nobody understands, products burbling into incoherence, and on-call rotations that grind people up and spit them out. That is ALSO a real existential threat.
is there any evidence for the first claim? is there any concrete reason to believe that, at an industrial level, firms that are taking these tools up wholesale are smoking their competitors who fail to do so?
I'd like to pose the same question for:
In all of these cases there has never (to my knowledge) been a 100% indisputable case-study, because every organization is different and measuring developer productivity / output quality / etc is still effectively an unsolved problem, after 50+ years of trying to figure it out.
... but that aside, given that coding agents only got really good in November 2025 and most firms didn't clock onto that until at least February (individual developers were figuring it out in December and January) there's not enough time for anyone to comprehensively smoke their competition yet.
UPDATE: Came back to say I owe you a better answer than that. My best answer is that those of us who are living in these tools on a daily basis can see how much leverage they give us.
I used to be able to get maybe one significant feature achieved in a good day of work. Armed with Claude Code or Codex I'm now getting the same or higher quality of development work done 2-6 times in a day.
I'm also building projects and improvements and tools that I would never have built in the past because they weren't worth the estimated time investment.
This is a two-edged sword: it's even easier to get distracted by new ideas now, so it requires significant discipline to stay focused on the most important thing. That's one of many places where non-technical skills and collaboration with others remain critically important.
Like Charity discussed in her piece, there's a missing shared reality here. The question of whether this is actually a useful productivity boost or not is really hard to engage with when you've seen it with your own eyes on a daily basis for months (or in my case years).
thank you, Simon, i appreciate your willingness to continue to engage in good faith on what is undeniably a contentious topic.
i am very open to the idea that some people will work better on some problems in some organizations with these tools, perhaps like agile et al.
i continue to feel that there is not a reasonable precedent or motivation for the chorus, mostly of upper and mid level managers (at least based on how i personally see this play out offline), that insists that these tools have fundamentally changed the nature of our work, or that organizations must (or even should) move NOW to seize this opportunity lest they be consigned to obsoletion.
i don’t think i see you generally making that level of claim, but i felt it echoed in the first paragraph you quoted.
I am negative on most of these, or do not think there is evidence either way. As in, they do not affect the outcome!
We have pretty good negative evidence about LLMs.
So... I think your argument here has no fangs
My best answer is that those of us who are living in these tools on a daily basis can see how much leverage they give us.
Are you concerned at all about the apparent bias that causes developers to report that they're more productive, even when they're less productive? e.g. found in the METR study. Yes, those were 2025 era models - but regardless of how good they were, experienced developers, working in their own space, thought they had helped them, but they objectively worked slower.
For all the claims that the tools got "really good" at a specific point and that users can "see" how good they are, how do you know you're not falling victim to this effect?
Side note: how do you square
coding agents only got really good in November 2025
with
when you've seen it with your own eye on a daily basis for months (or in my case years)
? What exactly have you seen for years if coding agents weren't good yet?
No, I'm not concerned about the METR study. I've read it and taken that bias into account with my own self-evaluation.
Coding agents got good enough to be reliable daily drivers in November. The category really only came into being in February 2025 with the release of Claude Code, and it wasn't until November that they got reliable enough to be mostly left to their own devices rather than watching them like a hawk.
... but I've been experimenting with LLMs for coding a whole lot longer than that. I started writing about AI-assisted programming back in 2022 and I've been figuring out patterns to get value out of them consistently since then.
The coding agents thing was the point where they stopped being an occasional booster and started being more of a clear productivity multiplier.
Really my question comes down to how you know, given evidence that your own self evaluation might not be correct. Being aware of the effect and having experimented for a long time doesn't really answer that, and you're then repeating that LLMs are good and have gotten better, which again I have no real way to evaluate other than my own experience.
It seems to me that to land on a shared reality we need to find objective and evidence based ways to talk about the effects of LLMs. Here's a great list of common fallacies in this space, and developers saying they feel more productive is one of them.
I'll need a lot more than that one METR study to counter-act my own years of lived experience.
Let's try for some numbers. I ran this prompt in Codex in my dev/ folder, where I keep all of my GitHub projects checked out:
Write a script here in Python that uses subprocess and pathlib to:
- Find all folders and nested sub-folders with a .git folder in them (so all checked out git repos)
- Filter for just the ones that have a git remote on GitHub which is either simonw/ or datasette/
- Extract all commits from those - just the first line and the commit date - and write those to a SQLite database
The database table needs to track GitHub repo identifier (e.g. simonw/datasette) and commit hash and commit date and first line - dedupe on commit hash (which will solve for duplicate checkouts)
Only commits to main for the moment.
Here's the 200 line Python script it wrote. I ran it, and the output was:
Done: 624 repos seen, 509 matching GitHub repos, 481 with main, 46353 commits seen, 30843 inserted into /Users/simon/Dropbox/dev/github_commits.db
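For readers who want to try something similar, the steps in that prompt can be sketched roughly as follows. This is a minimal illustration, not the script Codex produced: it assumes the GitHub remote is named origin, and the function names (github_repo_id, find_git_repos, commits_on_main, insert_commits, scan) and the table layout are all hypothetical.

```python
import re
import sqlite3
import subprocess
from pathlib import Path

# Owner prefixes taken from the prompt; adjust for your own account.
OWNERS = ("simonw", "datasette")
GITHUB_RE = re.compile(r"github\.com[:/]([^/\s]+)/([^/\s]+?)(?:\.git)?/?$")


def github_repo_id(remote_url, owners=OWNERS):
    """Return 'owner/name' for a GitHub remote owned by one of `owners`, else None."""
    match = GITHUB_RE.search(remote_url.strip())
    if not match or match.group(1) not in owners:
        return None
    return f"{match.group(1)}/{match.group(2)}"


def find_git_repos(root):
    """Yield every directory under `root` that contains a .git folder."""
    for git_dir in Path(root).rglob(".git"):
        if git_dir.is_dir():
            yield git_dir.parent


def git(repo, *args):
    """Run a git command inside `repo`; return stdout, or None on failure."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args], capture_output=True, text=True
    )
    return result.stdout if result.returncode == 0 else None


def commits_on_main(repo):
    """Yield (hash, iso_date, first_line) for each commit reachable from main."""
    out = git(repo, "log", "main", "--format=%H%x09%cI%x09%s", "--")
    for line in (out or "").splitlines():
        commit_hash, date, subject = line.split("\t", 2)
        yield commit_hash, date, subject


def init_db(conn):
    """Create the commits table; the primary key on the hash gives the dedupe."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS commits (
               commit_hash TEXT PRIMARY KEY,
               repo TEXT, commit_date TEXT, first_line TEXT)"""
    )


def insert_commits(conn, repo_id, commits):
    """Insert commits, skipping hashes already stored; return the rows added."""
    def count():
        return conn.execute("SELECT COUNT(*) FROM commits").fetchone()[0]

    before = count()
    conn.executemany(
        "INSERT OR IGNORE INTO commits VALUES (?, ?, ?, ?)",
        ((h, repo_id, d, s) for h, d, s in commits),
    )
    conn.commit()
    return count() - before


def scan(root, db_path):
    """Walk `root`, keep matching GitHub repos, and store their main commits."""
    conn = sqlite3.connect(db_path)
    init_db(conn)
    inserted = 0
    for repo in find_git_repos(root):
        # Assumption: the GitHub remote is called "origin".
        repo_id = github_repo_id(git(repo, "remote", "get-url", "origin") or "")
        if repo_id:
            inserted += insert_commits(conn, repo_id, commits_on_main(repo))
    return inserted
```

Calling scan("dev", "github_commits.db") would walk the folder and fill the database. Making the commit hash the primary key and using INSERT OR IGNORE is one simple way to get the "dedupe on commit hash" behaviour the prompt asks for.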
Then I fired up the agent I've been working on for the past few months against the database:
uvx --with datasette-agent datasette -s plugins.datasette-llm.default_model gpt-5.5 github_commits.db --root
And asked that:
Show me a table of commits per month
It wrote:
SELECT strftime('%Y-%m', commit_date) AS month, COUNT(*) AS commits
FROM commits
GROUP BY month
ORDER BY month
Here's the result exported into Google Sheets to chart. This year I'm at around 600-900 commits per month. In previous years it was significantly lower. (CORRECTION: see next comment, I hadn't filtered out commits by other people to my repos so the number is actually more like 300-500 per year.)
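To make the commits-per-month query concrete, here is a small self-contained sketch that runs the same SQL against an in-memory SQLite database. The three sample rows and the commits_per_month helper are invented for illustration and are not taken from the real database.

```python
import sqlite3

# The same query as in the comment above.
MONTHLY_SQL = """
SELECT strftime('%Y-%m', commit_date) AS month, COUNT(*) AS commits
FROM commits
GROUP BY month
ORDER BY month
"""


def commits_per_month(conn):
    """Return a list of (month, commit_count) tuples, oldest month first."""
    return conn.execute(MONTHLY_SQL).fetchall()


# Made-up sample data: two commits in January, one in February.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commits (commit_hash TEXT PRIMARY KEY, commit_date TEXT)")
conn.executemany(
    "INSERT INTO commits VALUES (?, ?)",
    [
        ("a1", "2025-01-03T10:00:00+00:00"),
        ("a2", "2025-01-20T12:30:00+00:00"),
        ("b1", "2025-02-01T09:15:00+00:00"),
    ],
)

for month, commits in commits_per_month(conn):
    print(month, commits)
```

Because strftime('%Y-%m', ...) reduces each ISO timestamp to a year-month string, grouping on it yields one row per calendar month.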
Greg Wilson's article you linked actually makes the same point I made in my earlier comment:
Note: this post is about how people are assessing AI, not at LLM-assisted coding itself; with a little rewording, these criticisms could be applied to a lot of the claims that have been made about agile development, test-driven development, and other practices.
And to counter the section in Greg's post about counting commits:
In 2023, McKinsey proposed measuring individual developer productivity using counts of commits, pull requests, code reviews, and similar activities [McKinsey2023]. Goodhart’s Law states that when a measure becomes a target, it ceases to be a good measure [Goodhart1984]. When developers know their commit count is tracked, they make more, smaller commits; when ticket counts are tracked, tickets get split. The numbers improve while the underlying work does not [Beck2023]. Activity is not output; output is not value.
If commit counts are a target, they'll get gamed. I haven't thought about my commit count at all until just now, when I figured it might be a useful metric to look at to see if it could illustrate changes in my rate of code produced over time.
modify that script so it also includes the number of lines added and number of lines removed along with each commit record
Then I realized I might have other people's commits in there too:
add the author email address as well
Here's the updated script. Datasette Agent wrote me a new query:
SELECT
strftime('%Y-%m', commit_date) AS month,
COUNT(*) AS commits,
SUM(lines_added) AS total_added,
SUM(lines_removed) AS total_removed,
SUM(lines_added) - SUM(lines_removed) AS net_lines,
SUM(lines_added) + SUM(lines_removed) AS total_churn
FROM commits
WHERE author_email = 'my-email@gmail.com'
GROUP BY month
ORDER BY month;
This gave me one month which was a colossal outlier on the net_lines measure - I told Datasette Agent "figure out what happened in that peak month" and it tracked down this enormous commit: https://github.com/simonw/tools/commit/32e45c390f087a0fac28ba51b8b020c653624b4c - so I filtered that out and ran the report again, giving these results.
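The lines_added and lines_removed columns have to come from somewhere, and the updated script is not reproduced in this thread. One plausible source is git's --numstat output, which prints one tab-separated "added, removed, path" line per changed file. The sketch below shows how such output could be summed; parse_numstat is a hypothetical helper, not part of the actual script.

```python
def parse_numstat(text):
    """Sum added and removed line counts from `git --numstat` style output.

    Each relevant line looks like "<added>\t<removed>\t<path>". Binary files
    are reported as "-\t-\t<path>" and are skipped here.
    """
    added = removed = 0
    for line in text.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # blank lines or commit headers
        a, r, _path = parts
        if a.isdigit() and r.isdigit():
            added += int(a)
            removed += int(r)
    return added, removed


# Sample output, as might be produced by: git show --numstat --format= <hash>
sample = "10\t2\tREADME.md\n-\t-\tlogo.png\n3\t0\tsrc/app.py\n"
print(parse_numstat(sample))
```

Counting this way treats a vendored library or a generated file like any other change, which is why a single huge commit can dominate a month and needs to be filtered out by hand.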
... and at this point I'm going to stop.
Hah, trying to derive code productivity from Git repos really is fraught with gotchas. I just spotted that one month I apparently deleted way more code than I created... but it was because this commit here removed a large vendored library from one of my projects.
That's where the point about agile, microservices, etc comes in - there may simply be no objective way to measure this, even though it'd be ideal to have one.
It has always been a little unfortunate that there is so little actual good rigorous research on this stuff. Roughly every couple months I do a review of the literature on productivity impacts from coding agents and only just now are we getting data from 2025 on the subject and it's heavily mixed.
It doesn't help that the problem is enormously complex. The impact a coding agent has on an individual working solo on a project they own is dramatically different than the impact on an individual working on a team of people. The impact of an agent on greenfield projects is dramatically different than the impact on brownfield projects. Measuring anything useful about the quality of the software over time requires months or years of continuous development. We don't really have good objective measures of quality that are amenable to smaller studies with a control.
I use the tools and sometimes I myself can't tell if the impression of speed is purely because the gains are all frontloaded at the start and then I lose them all again at the end. I have felt more velocity with a much less clear gain in actual overall delivery.
I think the argument is not that firms who adopted AI "ARE smoking their competitors", but that they "WILL smoke their competitors".
Even if current LLM benefits are modest, if we assume 1) the models/harnesses will continue to improve, and 2) it takes time to change your processes/culture/etc to receive the benefits, then you have a situation where a firm who prepares now will outperform those that didn't in the future.
It's like any other bet on future tech. Are they a buggy carriage manufacturer adapting to build for cars, or are they a sucker going all-in on Betamax/Laserdisc/etc?
At this point, I don't think AI itself (not the effects of social hype or VC money, only use of the actual tech) will be particularly revolutionary for great benefit or harm. Like the article says, it seems only to accelerate what was already present in the first place. Good engineering still requires good engineering and good checks, no matter where you start from. Regardless of where the tech goes, I don't think that will ever change, and teams/companies who are good at that will stick around because they'll be needed.