Don't fall into the anti-AI hype
131 points by mitsuhiko
Yes, maybe you think that you worked so hard to learn coding, and now machines are doing it for you. But what was the fire inside you, when you coded till night to see your project working? It was building. And now you can build more and better, if you find your way to use AI effectively. The fun is still there, untouched.
I don't think this engaged with one part of the discourse which to me is the most unsettling one? LLMs and the companies producing them are completely devoid of any morals and responsibility.
Arch Linux has been actively DDoSed by companies feeding their LLM model training.
Wikipedia has been DDoSed by companies feeding their LLM model training.
The internet is being DDoSed by companies feeding their LLM model training.
Uncritically supporting, using and promoting tools from AI companies is detrimental to the open and free internet and to the FOSS projects you (generic you) claim to support.
We can't fundamentally block these companies: they are hiding behind the entire subnets of Azure/GCP/AWS/AlibabaCloud, and blocking those would be detrimental to the users we have on these platforms. We have instead had to spend hundreds of volunteer hours fending off companies to keep things like the Archwiki working for our users.
That's the part that's closest to you (independent web ops), but it goes far beyond just that. The "AI" companies are DDoSing reality itself.
They have massive demand for new electricity, land, water and hardware to expand datacenters more massively and suddenly than ever before, DDoSing all these supplies. Their products make it easy to flood what used to be "the information superhighway" with slop, so their customers DDoS everyone's attention. Also bosses get to "automate away" any jobs where the person's output can be acceptably replaced by slop. These companies are the most loyal and fervent sponsors of the new wave of global fascism, with literal front seats at the Trump administration in the US. They are very happy about having their tools used for mass surveillance in service of state terrorism (ICE) and war crimes. That's the DDoS against everyone's human rights and against life itself.
Being a customer to these companies is directly funding all that, and in return you get… a chatbot that spits out plausible responses. Which, sure, can be useful —particularly in situations where you would demand average plausible work and nothing more— but that also comes with costs of its own. You turn creative work into uncreative checking, risking the atrophy of your creative skills if you do it too much. And you grow a dependency on these corporations. You give them control. Lots of control. Control over what the output can and can't be. Control over whether you can get the work done at all (unless you download one of their small local models they throw out specifically to try to argue against this, but those are always behind the models they keep to themselves). Accepting LLMs into your workflow means installing fascism as a dependency of yours.
And you grow a dependency on these corporations. You give them control. Lots of control. Control over what the output can and can't be. Control over whether you can get the work done at all (unless you download one of their small local models they throw out specifically to try to argue against this, but those are always behind the models they keep to themselves). Accepting LLMs into your workflow means installing fascism as a dependency of yours.
This is really well said. It sounds alarmist when framed so nakedly like this, but I think the reasoning holds, and if I examine my own unwillingness to use this tech in its current shape and form, it's ultimately this. It's certainly not out of respect for "copyright infringement", lol. In my own words, it's giving in to centralization of power.
Is there any possible development here that might convince you otherwise?
How about an LLM trained by a coalition of universities? Or a government-funded dedicated non-profit? Or 100,000 nerds networking their GPUs together, SETI@home style?
Right now we have half a dozen Chinese AI labs competing to release absurdly effective models mostly under genuine open source licenses, but I have trust issues there myself because the training data remains closed and I've tried asking those models for their opinions on Taiwan.
I have no qualms about using LLMs locally. I was thinking about playing around with one of those soon. I'm mainly talking about making API requests to Google, Anthropic, or "Open"AI.
I doubt we are at the point yet where the downloadable models have pervasive astroturfing, political messaging, and general power grabs embedded into them, but I do think that is imminent. Well, maybe it has already begun, as you noted above.
LLM trained by coalition of universities sounds great. Government funded dedicated non-profit sounds great. 100,000 nerds networking their GPUs together also sounds great.
Also acceptable for producing trustworthy models: a lot more competition. Presumably there would be some way to use these models to detect the unwanted influences in each other.
One of the reasons I'm not yet too worried about malevolent intent baked into models is exactly that: there's really healthy competition right now.
I was much more worried about this at the tail end of 2023 when OpenAI were the only lab with a GPT-4 class model and had been since March of that year. It felt back then that maybe Sam Altman would get to control the personality and opinions of the LLMs used by everyone on the planet.
Then Claude happened, and Gemini, and the open models like Llama and Mistral, and then the Hugging Face explosion where thousands of fine-tuned variants of those open models showed up, and then the dramatic rise of open weight Chinese LLMs over the course of 2025.
(And Grok, but I don't trust that one's owner in the slightest.)
My current hope is that the competition helps keep people in check when it comes to deliberately trying to subvert the human population through models. Aside from the attempts to make Grok "less woke" and some geo-political stuff from the Chinese models I've not seen evidence of skullduggery yet.
One nice thing that I like about playing with the local ones - there's no possibility of rug pulling. If I happen to like a model, I can keep using it exactly as-is, for however long I choose to hold onto my copy of the weights and the software that runs it. I find that a nice peace-of-mind defense against the astroturfing, political messaging, and power grabs that you mentioned.
(On the other hand, now we're seeing the squeeze on the local hardware capable of running them...)
There are so many issues with LLMs that it would take books to explicate all of the problems. People have written such books; I have at least one on my bookshelf. We're never going to be able to hash them all out in a Lobsters thread. If I wrote such a book, it would look like Greta Thunberg's "The Climate Book", with a few pages for each of the zillion issues. No one fix will make LLMs ethical; they are fractally evil; they look increasingly worse the more closely I look at them.
But for me the biggest problem with LLMs is their size. As long as they are "Large", their need to regularly attempt to ingest all of humanity's information will never cease, people will never be like "we ingested the Internet in 2025, we can stop now". Their size dictates the power draw of creating them and running them, and if they remain as ubiquitous as AI proponents want to make them, they can only increase the energy use, pollution and global heating of the technology industry, at a time when reducing the emissions from every industry (including tech) is crucial to our survival. The climate crisis is not the only crisis we face, but it's the one we must address on an increasingly tight timeline. Anyone promoting LLMs is engaging in climate denial.
(Back in the days of the "inevitable" Metaverse, I feared the tech that would balloon the internet's emissions was Virtual Reality. But we've been consistently increasing the emissions of the internet for some time: https://solar.lowtechmagazine.com/2015/10/why-we-need-a-speed-limit-for-the-internet/ )
I hear AI boosters regularly argue that LLMs could theoretically be made more energy efficient, and therefore less polluting, so we shouldn't worry about the climate impact. (Nevermind that in practice, we can see that LLMs are increasing emissions, as all of the major tech companies have abandoned their climate goals because of LLMs. Google, Microsoft, Amazon.) Sadly, most people are not familiar with the Jevons Paradox, which states that making technology more efficient makes it cheaper to use, so people use it more, canceling out any resource savings from efficiency. More efficient technology cannot reduce our emissions, unless it is paired with a program of using technology less, i.e. degrowth. We must have both more efficient tech and not increase our use of it, to avoid the Jevons Paradox. I'm not sure that's possible under modern capitalism.
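(A toy illustration of the Jevons dynamic, with numbers invented purely for the arithmetic: a 2x efficiency gain paired with a 3x increase in use still means more total energy.)

# Made-up numbers: efficiency doubles, but cheaper queries triple usage.
energy_per_query_before, queries_before = 1.0, 100
energy_per_query_after, queries_after = 0.5, 300

total_before = energy_per_query_before * queries_before   # 100.0
total_after = energy_per_query_after * queries_after      # 150.0
print(total_before, total_after)  # the efficiency gain is swamped by growth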
So what development would it take for me to consider that LLMs may not be inherently fascist, given fascism's connection to climate denial? An emergency crash program to greatly reduce the energy use and emissions of LLMs, alongside a program to greatly reduce their use. I think the only approach that could work is eliminating LLMs in favor of Small Language Models. Large is too big.
do you also add a disclaimer to your posts about western llms regarding their stance on israel, or are you fine with propaganda when it's your guys doing it?
unless you download one of their small local models they throw out specifically to try to argue against this, but those are always behind the models they keep to themselves
It looks like the most capable downloadable models are from labs that do not always have a stronger closed model. Of course, running locally the most capable models one can download is not often feasible.
(I do use some local models up to 30B but avoid non-local models; not even no-charge ones, I don't want to help them reduce how much they lie about monthly users)
You give them control. Lots of control. Control over what the output can and can't be. Control over whether you can get the work done at all
And if you are mentioning authoritarianism anyway, another relevant thing is that all your data is subpoenable without any notification to you, and directly sellable (in the pre-meatgrinder form, not just in the sense of further training), with any enforcement against the illegality of such a sale being unlikely.
Of course, many points that you make are just as true and relevant about GMail/GCP/Google search, or MS 365/Azure/Bing, or AWS, or at least sometimes Cloudflare.
I’m curious to understand how the Arch wiki and Wikipedia are getting “DDoSed” by AI. This claim keeps popping up, always heavily dramatized but without any numbers to back it up. I’m assuming that the LLM is just… browsing those websites? And downloading them? And it’s doing so at most once every long while to train its models? Where’s the DDoS? How is this different from a bunch of people keeping an offline copy of Wikipedia for later perusal? There’s even a tool on GitHub that lets you easily do this, and many people use it.
always heavily dramatized but without any numbers to back it up.
There are plenty of numbers going around, and there are collaborations between projects to help share knowledge. If this wasn't a problem, why is Anubis popular and solving a real problem?
I’m assuming that the LLM is just… browsing those websites? And downloading them? And it’s doing so at most once every long while to train its models? Where’s the DDoS? How is this different from a bunch of people keeping an offline copy of Wikipedia for later perusal? There’s even a tool on GitHub that lets you easily do this, and many people use it.
If it was that simple it would be easy to fend off. But no, these companies are way more vile than just hitting a website every once in a while.
What is happening is that companies like OpenAI are spinning up hundreds of machines to train their LLM models. Storing petabytes of websites is expensive, and as these companies are funded by the cloud platforms, bandwidth is effectively free for them, so instead they have each of these hundreds of machines crawl the open web and ignore robots.txt in the process. We see hundreds of machines doing hundreds of requests per second from just one of the cloud subnets. Multiply this for each AS and you have a large-scale DDoS we can't do anything about.
We can't block them. We can't add robots.txt rules for our resource-intensive URIs because they do not read robots.txt.
Our GitLab has been hit by these subnets scraping our copy of the Linux kernel upstream, effectively preventing us from packaging.
I'll gladly ask around for our internal dashboard screenshots from when this started becoming a problem after summer, but there is enough information out there that dismissing it as "over dramatized" is just being ignorant at this point.
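For anyone who wants to poke at their own logs, a quick sketch along these lines (assuming nginx's default combined log format, with the client IP as the first field) is enough to make the noisy subnets stand out:

# Count requests per /24 in an nginx access log; a crawler fleet shows up
# as a handful of cloud subnets dwarfing everything else.
import ipaddress
import sys
from collections import Counter

counts = Counter()
with open(sys.argv[1]) as log:
    for line in log:
        ip = line.split(" ", 1)[0]
        try:
            subnet = ipaddress.ip_network(f"{ip}/24", strict=False)
        except ValueError:
            continue  # skip malformed lines
        counts[str(subnet)] += 1

# Print the ten busiest subnets.
for subnet, n in counts.most_common(10):
    print(f"{n:>8}  {subnet}")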
Aaah, so they don’t bother caching it because it’s expensive, and since there’s no caching they can’t pool access to it from their presumably hundreds of training bots, so that leads to a DDoS. Got it!
Where can I find these statistics? Maybe I am indeed ignorant!
Also, sure, maybe the lack of caching saves money, but isn't there an obvious performance hit? I'd want my training to be fast, and hitting some NVMe cache sounds more appealing than pulling the same things down over fiber all the time…
Where can I find these statistics? Maybe I am indeed ignorant!
I'll ask around for screenshots I can share from Arch. But I'm sure you can find other projects posting stats on mastodon and other relevant media from half a year ago.
Also, sure, maybe the lack of caching saves money, but isn't there an obvious performance hit? I'd want my training to be fast, and hitting some NVMe cache sounds more appealing than pulling the same things down over fiber all the time…
I have no clue about the cost/benefit analysis they are doing for this. For all I know it's 100 different companies with one machine each. Or 100 different model training projects from OpenAI having 1 machine each. There is not enough information on our end to completely discern why there are so many different subnets crawling us at once.
Aaah, so they don’t bother caching it because it’s expensive
Caching is cheap and everybody does it. I doubt that a lot of scraping is happening for training; it's more likely that this comes from people's agentic automations? But I would love to know.
There's certainly a proliferation of appallingly badly behaved crawlers right now. I'm thinking about removing some of the default HTML-visible pagination features from Datasette because increasingly it looks like any instance with 10,000+ rows is getting crawled to DEATH right now.
Most of these crawlers don't include an identifiable user-agent, which makes me suspect that they're not from the major name-brand AI labs (which tend to have documentation as to the crawler agents they are using). I'd like to see credible theories as to what these crawlers are doing - there are lots of vague "it's obviously AI training!" theories, but I want to know WHO is doing this and WHY.
Caching is cheap and everybody does it. I doubt that a lot of scraping is happening for training; it's more likely that this comes from people's agentic automations? But I would love to know.
Requests from OpenAI and/or agentic automations and/or AI products identify themselves through the UA. The LLM scrapers we are fending off do not.
I have my own agentic automations that run on my machine and they definitely don't identify themselves. They are just browsing the web with whatever they were built with, from Chromium to HTTP libraries in Python, etc.
That is bad.
The OpenAI web UI does use their own UA, and at work (national public broadcaster) we do see UA identification from AI companies and their products. But I have not looked into how the agentic stuff (I have no interest in this) accesses webpages.
But I believe this is from your residential IP so not super relevant to the LLM scraper issue we are seeing?
I’m not sure what you are seeing but my bet is most scraping people with active websites see would be from agents acting on command of humans and not AI companies for training.
I’m coming from residential and cloud IPs depending on what I’m doing and I think most people are.
I'm confident it's not regular people commanding agents because agents generally won't crawl EVERY page on a site... and then do it again a few minutes later.
I'm not suggesting this is all in response to an end user having one query, but things that were built with agents to the service of something a human wanted and not foundation AI labs. I don't have a ton of traffic logs to look at at the moment but the non foundation lab crawling traffic is multiples of foundation labs. Particularly when you create a honeypot, provision a SSL certificate, within seconds you have crawlers coming your way and some of them go hard. Their purpose whatever it is, is not AI training.
After I saw how quickly they come from transparency logs, I created a dev server with an obvious sql injection on the login and sure enough, within a day I got mail by someone about it. That was not a human, that was automated security research by agent.
I’m not sure what you are seeing but my bet is most scraping people with active websites see would be from agents acting on command of humans and not AI companies for training.
This can't be true.
Yeah, I agree in that I don’t immediately see how caching is more expensive and less performant than hammering foreign servers, and would also appreciate more insight on this.
The OpenAI crawler is documented here, with the User-Agents for different purposes, IP ranges, and robots.txt behavior: https://platform.openai.com/docs/bots. And for Anthropic: https://support.claude.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler
Are you saying OpenAI, specifically, is ignoring robots.txt, or using it as a stand-in for the whole industry?
That said, it would not surprise me if there’s a bunch of startups out there who wrote badly behaved crawlers, but IMO would be clearer to treat that as the problem.
I have all 3 of OpenAI's user agent strings in my robots.txt with a blanket Disallow: /.
Yesterday OAI-SearchBot requested robots.txt 1600 times and other URLs 4 times. GPTBot didn't request robots.txt but requested 32k URLs. ChatGPT-Bot also didn't request robots.txt but requested 9 URLs (but they may have been user-initiated rather than crawled.)
(Over all AI scrapers, they had 643k hits out of 663k on my site yesterday. Yes, they are ignoring robots.txt!)
Whoa!
Did you confirm that the IP address came from OpenAI's published ranges? You can find those at:
If you had 643k hits from those IP ranges despite blocking them in robots.txt that's a genuine scandal and deserves to be amplified.
Minor clarification: GPTBot was only 31k hits - the 643k hits were from all AI bot user-agents combined.
And yes, all the GPTBot ones are from their IP ranges.
zfgrep GPTBot /var/log/nginx/rjp.is.access.log.1.gz | cut -d' ' -f1 | sort | uniq -c
3453 74.7.227.163
168 74.7.227.63
4726 74.7.241.19
9156 74.7.241.27
953 74.7.241.7
4115 74.7.241.9
8381 74.7.243.203
139 74.7.243.226
23 74.7.243.239
Despite this being in robots.txt:
User-agent: GPTBot
Disallow: /
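(If anyone wants to repeat the range check themselves, here's a minimal sketch using Python's ipaddress module; the network below is only a placeholder, substitute the CIDR blocks OpenAI actually publishes.)

# Check whether logged client IPs fall inside published crawler ranges.
# PUBLISHED_RANGES is a placeholder; paste in the real published CIDRs.
import ipaddress

PUBLISHED_RANGES = [ipaddress.ip_network("203.0.113.0/24")]  # placeholder only

def in_published_range(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in PUBLISHED_RANGES)

for ip in ["74.7.227.163", "74.7.241.19"]:
    print(ip, in_published_range(ip))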
That's disgraceful, especially since GPTBot is the training crawler - so they can't even claim it's acting because a user directed it to during an interactive chat session (which still shouldn't override robots.txt IMO).
I noticed you have a bunch of subdomains, is this log file for all of those too?
https://social.rjp.is/robots.txt is missing the / from the Disallow: line, could that explain this?
According to the truly terrible spec at https://www.robotstxt.org/robotstxt.html omitting the / means "allow all robots" - I asked Claude and it didn't know that so it's very likely to be inconsistently implemented all over the place.
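For what it's worth, Python's standard-library parser does implement that rule; a quick sketch to check both forms:

# Empty Disallow vs "Disallow: /" as seen by Python's stdlib robots.txt parser.
from urllib import robotparser

def allowed(robots_txt: str, agent: str, url: str) -> bool:
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)

print(allowed("User-agent: GPTBot\nDisallow: /", "GPTBot", "https://example.com/x"))  # False: blanket block
print(allowed("User-agent: GPTBot\nDisallow:", "GPTBot", "https://example.com/x"))    # True: empty value allows all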
I noticed you have a bunch of subdomains, is this log file for all of those too?
That log file covers rjp.is, mangane.rjp.is, mta-sts.rjp.is and padlocks.rjp.is but the three subdomains had no hits yesterday.
missing the / from the Disallow: line, could that explain this?
Different log file for that site and the AI bots don't seem to even hit it anyway.
In summary: the AI bots don't seem to respect robots.txt
Are you saying OpenAI, specifically, is ignoring robots.txt, or using it as a stand-in for the whole industry?
Both. I have no faith in any AI companies operating in good faith.
I self-host two podcasts and some Chinese AI gobblers found my feed, eating through 4 years of hosting budget in a week by redownloading the entire catalogue over and over again.
You can read about the troubles this very site is having with scrapers:
This is not the only instance of a site/"app" with bad behaviour. See the searx exception in the lobsters source code.
There's also the Rack::Attack configuration, and the Caddyfile includes which cover a lot of extra behaviors we've seen.
I was on a SXSW panel with Wikipedia and Creative Commons.
There is absolutely an increased load, and that results in infrastructure costs, more pages (on-call alerts), and production incidents. The problem is scale and care. Wikipedia has data dumps and lots of ways to consume it in bulk that are reasonably respectful, but between people training models that scrape without respecting robots.txt, models that query data at scale, and users who ask for some information so that the model writes a bash script to curl hundreds of sites to find it... the end result is more load from more places. And the load is coming through the human-facing channels while plenty of non-human channels are being ignored or intentionally subverted.
The WMF has posted an objective related to mitigating the increase of infrastructure use due to AI-related scrapers: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2025-2026/Product_%26_Technology_OKRs#:~:text=Responsible%20Use%20of%20Infrastructure
It dates back from November, so not the most up to date, but not completely obsolete either.
Our content feeds into search engines, social media platforms, ecommerce, and ever since the rise of AI, is used to train large machine learning models. Consumers source data by scraping pages, using the APIs, and downloading content – commonly without attribution. In the world of unauthenticated traffic we can’t reliably differentiate one user from another, which greatly limits our ability to enable and enforce responsible use of our infrastructure
My panel was last March which is ages ago in AI time.
Here's a thing I wrote about the panel that I think is still relevant:
Anna from Creative Commons laid out a great framework that I feel summarizes the things I care about:
Here's what that means:
Many conversations around AI focus on one of these, but all three are essential. I have ideas for moving the needle on some of them, but we need more engagement from the industry. Zero audience members from AI companies came to our talk and asked, "I agree there are problems; how can we solve them together?" It's not entirely up to us (you, me, the panel) to solve this. We must have engagement from the entities driving this tech. Let's ask them: "How are you thinking about reciprocity, attribution, and equity between your company and the open ecosystem?"
https://shithub.us is serving about 200 QPS, and topping a terabyte of monthly web traffic. Do you think there's enough organic interest in Plan 9 code to justify that?
I don't know about Wikipedia, but I was helping a bunch of my customers withstand the barrage of (probably) LLM-related DDoSes. You'll have them hitting heavy endpoints (e.g. search) over and over, a botnet of residential IP addresses from all over the world with randomized User-Agents. They'd often do this seemingly without any logic to it too, bombarding the services long after it's all been Anubis'd.
Very different from someone mirroring a website for personal use, and clearly with malicious intent.
We don't really have numbers, but with the CHICKEN Scheme project (certainly a niche programming language if anything), we have had this problem as well, whereas it was never a problem in the past. See this post from our tireless server admin Mario on what he had to do to keep things manageable. You can clearly see the crawlers are hitting endpoints that are normally completely unused - any sane web crawler would not re-request the same old revisions in a git/svn repo over and over and over. It also includes some links to other posts from more famous people and projects.
FWIW, I play DCSS online (a roguelike of the nethack variety) and servers tend to keep morgues and logs available, running on a shoestring budget. In the last year many became overloaded by AI crawlers.
LLMs and the companies producing them are completely devoid of any morals and responsibility.
Like most other companies all the time since forever? Where's the news? Seriously, DDoSing a couple of websites is minuscule compared to industries that killed and/or are killing people in the millions.
Yes, lots (most?) of economic activity has bad externalities, and publicly traded companies especially are absolutely a/im-moral. Economic activity often has tremendous benefits, but also often terrible costs, and oftentimes these are not equally distributed. But this requires political solutions, not rejecting technology as a whole.
Like most other companies all the time since forever? Where's the news? Seriously, DDoSing a couple of websites is minuscule compared to industries that killed and/or are killing people in the millions.
Is this an argument against caring about this problem, or are you trying to say I'm caring about the wrong thing?
But this requires political solutions, not rejecting technology as a whole.
How is rejecting the technology not a political solution?
I don't think this engaged with one part of the discourse which to me is the most unsettling one? LLMs and the companies producing them are completely devoid of any morals and responsibility.
Companies, yes, but LLMs are just data files that are bundles of knowledge. Data can't have morals and responsibility.
In a certain way, if this all remains as successful as it has been, then it would freeze computing forever, rather than changing it. That's because all the training and the generation have been on the languages and computing systems that are popular in the 2020s. And if that is what is used to generate code that then becomes the next set of training data, I think you know what happens.
I find it quite odd to think that the super advanced AI systems of the future are churning out new ways to interface with teletype systems and Unix, indefinitely.
This is a general concern I have with the LLM/generative part of things: it's a freezing of the status quo, and in its most successful form that freeze lasts forever.
I agree. And while it's bad in tech I think it's catastrophic in ethics and other societal topics. Take what was okay 80 years ago, 150, 300. How things that are now considered cruel, torture, beyond comprehension used to be good or at least just.
Something I like to do with chatbots since way before LLMs is getting into philosophical and ethical questions. With LLMs, when they don't outright reject a thought experiment (even the classics, like ones around extreme interpretations of utilitarianism), they are all extremely conservative (in the non-political sense of the word). Not sure if that's intentional, or a side effect of training, but a common theme is that they are essentially anti-thought-provoking on such topics. It feels like if times were different (luckily they aren't) they'd probably consider ideas like abolishing capital punishment, slavery, or torture, or introducing the right to a fair trial, or legalizing homosexuality, to be extreme views.
It's scary.
AlphaGo was only trained against its own output, yet developed novel-to-humanity strategies and became superhuman.
How does your belief explain that this happened rather than its strategy becoming moribund?
AlphaGo was designed to learn how to win the game by playing against itself. LLMs are designed to produce the most likely sequence of tokens. They don’t have a learning loop
They don’t have a learning loop
That's not really true as of the start of 2025.
The big advances in LLMs in 2025 came from "Reinforcement Learning from Verifiable Rewards" - also known as "reasoning". Andrej Karpathy has a good explanation of that here: https://karpathy.bearblog.dev/year-in-review-2025/
The reason they're so much better at code today than they were ~12 months ago is that the labs found ways to run Reinforcement Learning loops where they generate vast amounts of code, that code is then verified (compiled, tests run etc) and the agents get rewarded for good code.
And on the individual machine level, I've been finding TDD loops are astonishingly effective with the latest generation of models. You can consider that a learning loop - I even sometimes have them write notes about what they learned at the end - although aside from those notes the slate gets cleared every time you start a new session.
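If it helps make the idea concrete, here's a toy sketch of the verifiable-rewards loop, nothing like the labs' actual pipelines: generate_candidates is a made-up stand-in for sampling code from a model, and the reward is simply whether a test passes.

# Toy "verifiable rewards": score candidates by running a real test,
# not by how plausible the text looks.
import subprocess
import sys
import tempfile

TEST = "from solution import add\nassert add(2, 3) == 5\n"

def generate_candidates():
    # Stand-in for sampling from a model.
    return [
        "def add(a, b):\n    return a - b\n",  # wrong
        "def add(a, b):\n    return a + b\n",  # right
    ]

def reward(candidate: str) -> int:
    with tempfile.TemporaryDirectory() as d:
        with open(f"{d}/solution.py", "w") as f:
            f.write(candidate)
        with open(f"{d}/test_solution.py", "w") as f:
            f.write(TEST)
        result = subprocess.run([sys.executable, "test_solution.py"],
                                cwd=d, capture_output=True)
        return 1 if result.returncode == 0 else 0  # verified, not vibes

for i, candidate in enumerate(generate_candidates()):
    print(f"candidate {i}: reward {reward(candidate)}")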
Surely this can't be done for natural language though, can it? The rewards there aren't separately verifiable.
Right - I think that helps explain why computer programmers are generally getting much more impressive results from this stuff than other fields of knowledge work.
It seems to work for math too, which is amusing given "LLMs can't even multiply!" was a truism for quite a long time.
The open question is if someone can figure out an RL loop for law or medicine or other fields. The labs are certainly trying! Check out this Anthropic job ad: https://www.anthropic.com/careers/jobs/4924308008
It seems to work for math too, which is amusing given "LLMs can't even multiply!" was a truism for quite a long time.
This is a pretty dramatic mischaracterisation of what’s happening in maths. LLMs are not “doing” maths, we had a bunch of breakthroughs in new non-LLM (but often ML) tools. Where the LLMs come in is writing scripts to orchestrate these tools, which much more resembles programming.
The primary utility of LLMs in this space is helping mathematicians unfamiliar with programming environments and workflows navigate these tools
The gold medal IMO performances this year were produced by LLMs that were not driving tools.
From Google’s docs:
Gemini 3 Pro is a sparse mixture-of-experts (MoE) (…) transformer-based model (…) with native multimodal support for text, vision, and audio inputs. Sparse MoE models activate a subset of model parameters per input token by learning to dynamically route tokens to a subset of parameters (experts); this allows them to decouple total model capacity from computation and serving cost per token. Developments to the model architecture contribute to the significantly improved performance from previous model families.
I read that as: “they baked a bunch of non-language ML models, in with an LLM that acts as an interface to them”. I would contend most of the interesting progress from the previous year is in the non-LLM parts, but this is obviously in the small print because the purpose of this project was to hype the capabilities of LLMs specifically
No, that's not what Mixture-of-Experts means (common misconception though).
MoE is an LLM architecture used by most of the recent models which means that even if you have 200bn parameters in the total model each round of inference only uses a subset - e.g. 20bn - of those. Previous model architectures would evaluate every token against all 200bn (a lot of matrix arithmetic) - evaluating against 20bn lets you run inference a lot faster.
Each "expert" is just a big opaque blob of numbers - there isn't one expert that knows math and another that knows biology, for example.
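A toy sketch of the routing idea, with tiny made-up dimensions that have nothing to do with Gemini's actual architecture:

# Top-k mixture-of-experts routing: each token only touches k of the
# n experts, so most of the parameters sit idle for any given token.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, k = 16, 8, 2

experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def moe_layer(token):
    scores = token @ router                 # one routing score per expert
    top = np.argsort(scores)[-k:]           # indices of the k best experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                # softmax over just the chosen k
    return sum(w * (token @ experts[i]) for w, i in zip(weights, top))

out = moe_layer(rng.standard_normal(d_model))
print(out.shape)  # same shape out, but only k/n of the expert weights used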
The model they used in IMO was just a model, not a model + a whole bunch of extra custom models and tools.
The contention is not that it’s not an ML model, it’s that it’s not just a language model.
AlphaGo is also an opaque blob of numbers, but it’s ridiculous to go off that to say an LLM can beat the world champion at Go
From lower down in the same document:
Gemini 3 Pro is trained using reinforcement learning techniques that can leverage multi-step reasoning, problem-solving and theorem-proving data
If you don’t see how much more this resembles tool use and AlphaGo than an LLM “doing maths” it’s hard to take your advice as anything other than boosterism
I categorize that as a "reasoning LLM". The architecture described there for Gemini 3 Pro is the same as that used by GPT-5 and Claude 4+ and many of the open weights models I run on my own laptop these days - they're all big binary blobs full of matrices once you get to inference time.
Here's a Qwen "thinking" model that's less than 2.5GB and runs happily on a Mac https://huggingface.co/mlx-community/Qwen3-4B-Thinking-2507-4bit/tree/main - sometimes you might catch it reasoning through a problem in Chinese!
I understand that it’s marketed that way, but I think you’re missing my point.
The reason this model can “do maths” is that it (or a subset of it) was trained specifically for that purpose in a way similar to AlphaGo, rather than an emergent effect of absorbing a large training dataset, which is how it’s presented.
The reason Google presents it that way, is that they would like me to think “hey, if it’s smarter than me at this specific task, it must be smarter than me at everything” which is definitely not true, but would be true if it was an emergent property.
This is important to realise because it informs your mental model of what these tools can do and how you should use them. I personally think the language part is an illusion, and the most useful part is the more specific tool. There’s a reason people didn’t freak out about AlphaFold the same way they’re doing about LLMs, despite it being (in my opinion) a more interesting result.
Are these actually novel problems, though? Would a good mathematician have trouble solving them given a list of other problems these were inspired by?
I am no "AI-sceptic", I just think that reasoning is still not their strong suit. To date, probably this is my favorite LLM description though: https://news.ycombinator.com/item?id=46561537
The IMO is a high school math competition - albeit a prestigious international one - so yes, good mathematicians should be able to solve them. The problems are only novel in that they were created specifically for the competition, which means there shouldn't be exact duplicates in previous training data.
It was still a notable result because it undermined the "LLMs can't do math" AND the "LLMs can only solve problems already in their training data" memes.
so yes, good mathematicians should be able to solve them.
The only thing I'll add here as someone with a math undergrad degree, and a fair bit of competitive math experience (but a fair bit below the IMO level), is that competition math often has quirks, tricks or ideas that wouldn't be interesting or relevant to a "good mathematician." I would say something like IMO:Math::Advent of Code:Programming is a pretty reasonable comparison. In fact I would say my competitive math experience is more relevant to AOC problems than my programming experience.
I am not qualified to characterize or opine on this, but here is some discussion on LLMs’ participation with specialists doing math, specifically Erdos problem #728 — perhaps relevant to the discussion or food for thought: https://mathstodon.xyz/@tao/115855840223258103.
I largely agree with Tao’s perspective here, but note he does not call the models LLMs.
The thing that’s important to realise is that these results did not come from language inference (which is what Google and OpenAI would like you to think). Without that you will be greatly misled in extrapolating their capabilities.
Yeah in that case it was way more than just LLMs being used, there was Lean and Aristotle and AlphaProof in the mix too.
That's different from the DeepMind and OpenAI results in the 2025 IMO.
As for self-play analogies in LLMs, there are some articles on arXiv trying to do this (and given that it is an unthrottleable way to get training data, some companies probably do this too nowadays): inventing fake programming contest problems, complete with examples and tests, then trying to solve them and checking whether the tests pass.
Indeed, as the root comment says, this is not oriented towards inventing approaches, and new approaches are hard to explain to current LLMs without training data (and self-play relies on some data being there: in Go, throwing moves mostly at random at a legality filter can still be a bit more or a bit less random, while in programming there is often a cut-off for making sense at all).
This assumes that everyone uses AI for everything, though, and especially innovation. I don’t know if that’s the case; it’s clear even today that AI is more suited for tasks that have already been done a ton of times before by other people, and for which it hence has a ton of training data.
I don't think it would need to be used for everything to slow down progress. E.g., take programming languages. We already see how LLMs work best with currently over-hyped programming languages such as Python. We also already see that LLMs struggle with newer languages, sometimes failing to even stick to the language.
Now, even if developers only use LLMs for some parts, this will counteract the adoption of new languages, which is already hard to achieve when there are still fewer libraries and so on.
It might not bring adoption to zero, but it certainly hinders it. The same would go for, e.g., new operating systems, and for many things outside of tech.
If LLMs are something you use to use other tools (languages, or through MCPs, etc.), then it becomes harder to establish newer tools. Especially when writing is also done with the help of LLMs, leading to newer tools being less likely to get integrated into models.
I'm fairly certain we'll see stagnation due to this. This happened in other areas that were less "set in stone". Think about market dominance of Microsoft products (IE, etc.) slowing innovation until it was overcome.
Now this (hopefully) isn't absolute but it certainly is another hurdle.
For what it's worth, I've found LLMs to work much better with Julia than Python (I use both at work.) There's a lot less Julia code out there, but the proportion of good code is probably higher.
I feel that this was true for the mass of users even before LLMs. The majority of developers I was hiring were quite hostile to the idea of using non-standard tools. And people who care about the craft used whatever was fun, and they (we) still do.
I use quite a lot of agentic coding, but I still write other code manually because that’s good for my brain and brings me joy. And I understand that manual coding is considered to be a privilege by a lot of people, just as using anything except React is/was (I don’t agree; I think it’s cultural).
I find myself largely agreeing with this piece, though as someone on the left, it's a confusing time to navigate. What strikes me most is the point about individual action versus political solutions. The downsides of AI—job displacement, wealth concentration, power asymmetries—are deeply political problems, and I'm not sure personal boycotts or refusals can meaningfully address them. Too many people have already tasted the benefits, and alternatives abound; the river has been crossed.
What we're left with, I think, is using these tools while pushing for political means to mitigate the harms. Labor protections, redistributive policies, antitrust enforcement—these feel like the actual battleground, not whether I personally use Claude or Copilot. I don't say this with any certainty, just as someone trying to figure out what a coherent position even looks like right now.
The downsides of AI—job displacement, wealth concentration, power asymmetries—are deeply political problems, and I'm not sure personal boycotts or refusals can meaningfully address them.
They are political problems, but they're not downsides of AI. They are the point of AI. It is no more a tool for you to use than automated cotton looms were a tool for weavers. I think they also generally had the right idea of what to do about it.
[AI] is no more a tool for you to use than automated cotton looms were a tool for weavers. I think they also generally had the right idea of what to do about it.
Weaving used to devour huge amounts of labour that was not available for other uses. In the long run, mankind is better off with fewer weavers and more software developers, artists, car mechanics and so forth. In the short run, yeah, the transition was painful.
The same is true of the transition from farms. It used to be that very nearly everyone was engaged in near-subsistence farming; now almost no-one is. Technological advances freed up labour to do more productive things, and we are all (now) better off for it. The costs of that transition absolutely shouldn’t be disregarded, but the benefits outweigh them.
That is certainly the classical view among economists. Note, though, that I'm talking about the manner in which change is forced through with massive capital spend, and not arguing against progress in general. When automated cotton looms were introduced they really only lowered the skill level; making things better (for non-owners of cotton looms) was pretty much an unexpected side effect.
I am not convinced this is a general rule, and I don't think it would hold in a post-industrial society anyway. We have seen plenty of examples of deskilling in the service sector making everything worse across the board, but persisting anyway.
It is no more a tool for you to use than automated cotton looms were a tool for weavers.
Can you expand on why you think that?
One of the things I've enjoyed about the last three years of AI is how it appears that the benefits to individuals are easier to unlock than the benefits to companies.
There still aren't actually that many success stories of companies that used LLMs to automate large volumes of work... but there are countless stories of individual employees using their own ChatGPT subscription to either openly or secretly help themselves in their roles.
On a small scale I've already seen it happen, with reasonably wealthy friends and relatives using AI instead of hiring artists or writers. The examples I've seen personally haven't been that offensive, but they do illustrate the character of the “democratisation” that is taking place. To wit, instead of spending a long time learning a skill or paying someone who has, you can just make a machine copy someone who has, which is much easier.
I don't think that's great in itself—how is any new creativity going to enter the world?—but it's tangential to the point here. The real issue is the lowering of the skill ceiling and the societal consequences thereof.
The world demands a certain number of people that can, let's say, write a CRUD app. The mechanisms that control this demand are weird and break sometimes, but they do exist and they tend to evolve a (loose) balance where workers have enough power to resist being exploited too badly, even if they are too many for all of them to be rock stars.
If you want to shit on your workforce from a great height, what you really need is to lower the skill ceiling overnight. The vastly expanded pool of qualified workers will be forced to compete with each other for the now very scarce work, which will mean accepting your terms. Other than business owners very few people stand to gain anything at all from this: the jobs will get worse, but they needn't change hands, the threat of replacement is enough.
In a world where anyone can do anything, what you know and how much work you put in mean nothing; your only power comes from how much stuff you own. Or, like, how many AI agent subscriptions you can afford.
You could call this “benefits to individuals” but that'd be missing the forest for the trees.
That's a well constructed argument. I think there's a good chance you're right about this, and coding agents reduce the leverage held by mid-tier programming talent in a way that harms their careers.
I'm hoping (and trying to help encourage) a different outcome. I think these tools have the potential to meaningfully increase the quality, quantity and impact of the work done by all levels of programmer. I would like this to result in a Jevons paradox style dramatic increase in demand for our skills, because if code can be produced in less time a whole bunch of companies that previously would never have commissioned software are now in the market for developer talent.
I have reviewed tens or hundreds of thousands of lines of AI generated code at this point. It is mostly bad or mediocre. Sometimes bad code is better than no code in the short term, but rarely in the long term. I am often asked to approve such patches as-is, because it is too bad or mediocre to be worth the effort of improving, and also because it is AI generated, so nobody wants to spend the effort to improve it. When users experience the downsides of bad or mediocre code, but have no agency to select others because they are not the buyers of the code, the cost of adopting bad code is externalized. It's not a coincidence that the Jevons paradox originated in the context of fossil fuels.
I quite enjoy the Jevons paradox link to fossil fuels too - Bryan Cantrill made that explicit in the podcast episode we recorded last week.
Hm, maybe I'll give it a listen. Oxide seems like an interesting company, I just wish we as a society could decide to do more and do better with the hardware we've got, rather than focusing on churning out new generation after new generation of hardware to run unoptimized code.
I'm hoping (and trying to help encourage) a different outcome.
This isn't what I've seen from your posts so far, where you have consistently ignored bad outlooks on AI technology and consistently pivoted to what you think might be the "brighter side" — you can't get to the one by ignoring the other.
I would like this to result in a Jevons paradox style dramatic increase in demand for our skills, because if code can be produced in less time a whole bunch of companies that previously would never have commissioned software are now in the market for developer talent.
Contractors are absolutely skilled at doing this already, the problem was always the price of the contractor, not the amount of time that they take. Can you think of examples of "a whole bunch of companies" where "product will take 1 - 4 months to deploy" is a bottleneck, versus "a whole bunch of companies that cannot afford to hire a contractor to properly build and test it reliably"? The mythical "Mom and Pop store" (which in the 90s would have been able to use Personal Database Apps, but capitalism destroyed them), which might be one of the classes of cases you are thinking of, can only afford to hire a contractor for their in-house store software if the price ceiling is lowered, which backs up edk-'s argument here. And then the question remains: what do they do about long term maintenance?
In the UK, we have a problem where government software contractors are only hired for short periods, which means the information on a given council's website is only reliable for the first 6 months, after that point nobody bothers to update the content because "we have a website now", and after 3 - 7 years they have to hire another contractor, who insists on rebuilding from scratch, because the old solution is so out of date and shoddily written that it's easier to rip the important info out and start from scratch.
That's how things worked, for decades, before AI, think about how AI might make that even worse, as code becomes even more unmaintainable, the price of a programmer falls to the ground, and anyone with an LLM can call themselves a programmer. Now think about said systems handling National Insurance Numbers, patient data, etc. and it's an absolute horror story.
Just look at what mismanaged or incorrectly written company software can do: in the UK, badly written accounting software caused thirteen suicides, because bugs in the program were thought to be human error, and people lost their reputations, their jobs, and their livelihoods. This is what happens when software for more than "toy" applications breaks down. Can we be sure that the next LLM-written accounting software will not cause this?
https://en.wikipedia.org/wiki/British_Post_Office_scandal
https://www.theguardian.com/uk-news/2025/jul/08/post-office-scandal-inquiry-horizon-it-scandal
This isn't what I've seen from your posts so far, where you have consistently ignored bad outlooks on AI technology
252 posts tagged ai-ethics. I really don't think it's fair to say that I've ignored the bad outlooks!
I do however like to push the bright side because I think that a lot of software engineers are terrified of the impact that AI will have on their lives and careers, and I genuinely do believe that those fears may turn out to be unfounded... especially if I can help them learn how to use this stuff in productive and effective ways.
Contractors are absolutely skilled at doing this already, the problem was always the price of the contractor, not the amount of time that they take.
I just can't agree with that. I've seen how long it takes to write software! It's horrifyingly expensive - try getting a single page on a WordPress blog rebuilt for less than several hundred dollars. Now multiply that by every other kind of software project, most of which are far more involved than a single page update.
The reason software projects are expensive is the time it takes to build them. I've spent my entire career trying to find ways to reduce that time and hence that cost - that's what my open source projects are for. I see generative AI as an extremely convincing lever for this.
Your UK council website example is the exact kind of thing I am hoping LLMs can help solve! It shouldn't require a team of developers on a contract to update a webpage. That needs to be possible for in-house teams, armed with prompt-driven systems that will help them achieve their goals without turning to experts for low-risk tasks.
Do you really see no possible future in which tools built on top of LLMs help improve the council website situation?
252 posts tagged ai-ethics. I really don't think it's fair to say that I've ignored the bad outlooks!
I do however like to push the bright side because I think that a lot of software engineers are terrified of the impact that AI will have on their lives and careers, and I genuinely do believe that those fears may turn out to be unfounded... especially if I can help them learn how to use this stuff in productive and effective ways.
But when it comes down to, say, the impact on the arts, or the impact on the use of copyrighted content, you have fallen silent. I noticed you fell silent in the debate about how AI code was trained and licensed when it came to the case of "license washing", which seems hugely important to ethical use. I am not particularly happy that my own code was scraped off Github, drained of the license, and then used to train LLMs, personally. And those datasets were collated before anyone had a reasonable chance of reacting to them. See the rather quaintly named "Software Heritage", which has been talked about in the past on fedi and on lobste.rs for their explicit scraping of Github repositories to (they hide this quite well) create datasets for LLMs.
It might be true that there isn't enough training data if companies started getting picky about licenses — sure, but to that I would respond the same as if a company cannot afford to function without zero-hour contracts: if it cannot be done ethically, it shouldn't be done.
I just can't agree with that. I've seen how long it takes to write software! It's horrifyingly expensive - try getting a single page on a WordPress blog rebuilt for less than several hundred dollars. Now multiply that by every other kind of software project, most of which are far more involved than a single page update.
Right, but you're ignoring the baseline needs of the human here.
A human needs to eat, a human needs shelter. In this capitalist hell we need to use money to have access to food, shelter, water, clothing, etc. So the ultimate end-game here is taking a single contracting role where someone delivers a product, and, you say:
reduce that time and hence that cost
So, as a baseline, you're admitting here that you want to reduce the cost of this work. This means the human now needs to take on many more jobs to be able to pay rent. Said jobs are also going to be flooded with people using LLMs, and this will make it harder for businesses to distinguish people who are skilled, and people who don't give a shit and want an easy paycheck — and no matter what happens, if the domain changes, if there are problems — they are going to expect even less time to overcome those to deliver a solid, well-tested product.
An easy counter here is that common knowledge dictates you're not paying for the time that someone takes to complete the job, you're paying a high price for the experience of the person. A plumber can walk in, spend 20 minutes doing something, but they will still charge their flat fee because you got their experience of fixing that in 20 minutes. That's great, but now contractors might not have any knowledge at all, and there's no way to screen contractors that know their shit from contractors who woke up 2 days ago and went "I want to deliver software", and have no understanding of the extraneous effects of the systems they build, the long-term maintainability, or anything else that an expert might know about. Software contractors will soon get as bad a name as most building contractors, for no real benefit, because you cannot trust that a given contractor has any experience whatsoever.
Users are also going to be at the whim of whoever was hired and what sort of job that they did, which has always been the case, but now everyone is being sold the idea that they can just pick up a computer and type human language text and expect a reasonable output. Sure, they won't understand the domain, they won't understand what the problems are, but they can trust LLMs for that, right?!
I don't see how this makes things easier for the human to eat, pay rent, or stay clothed — they are expected to do more work in less time, they're expecting to be paid less because they're taking less time, so they have to find more work. I don't see how this makes it easier for businesses to navigate the market of contractors — now any contractor could be lying. It seems to make things worse for both sides for no real gain. The end state of all of this is that everyone is worse off.
Businesses are going to slip up in terms of who they hire, users are going to suffer. This is going to be absolute hell in the case of anything remotely important. We've gone from "Sony stored passwords and bank details in plain text" and "crunched programmers did such a bad job that people killed themselves and it went to court" to, whatever the hell all of this is.
Your UK council website example is the exact kind of thing I am hoping LLMs can help solve! It shouldn't require a team of developers on a contract to update a webpage. That needs to be possible for in-house teams, armed with prompt-driven systems that will help them achieve their goals without turning to experts for low-risk tasks.
Do you really see no possible future in which tools built on top of LLMs help improve the council website situation?
No, because to think this is a misunderstanding of the situation. The information is critical to the community — these websites are how people who are homeless navigate and access support systems (I have been homeless and had to use websites for this purpose, they uniformly suck), this is how people with illnesses are able to get support for those illnesses, how people in legal trouble are able to get aid.
So the best case here is that a contractor rolls up and delivers the website in an hour. The council pays them less, so they are less able to eat, drink, pay rent, and clothe themselves — but they are still able to. The council is happy with this website, everyone else is happy. In 2 years (this is how slow shit is in bureaucracies, I'm not sure if you've done any non-profit work but "website" is the lowest of the low on the list of shit they have to do), they have to get around to hiring another person. Nobody updates the website regularly, because the staff member who had that role got shuffled, and nobody else has time in their schedule to do it. The outcome is that after 2+ years, the information is out of date. This is essentially the same situation, but with shorter time-frames. A council isn't going to hire someone to repeatedly come back to maintain the website no matter how low-cost that is, because they have a thousand other items fighting for the budget.
Now let's talk about a worst case scenario. A contractor rolls up, and delivers the website in an hour. Contracting work has been impossible to get lately because the market is flooded, and this deal doesn't pay enough for them to do anything, so they are late for their second or third job (like everyone else in the job market outside of software right now, they have to take on >1 job to stay fed, clothed, and housed). The council worker assigned to hire them was tired and clicked on the first person they saw, because the website isn't a critical issue and they have to do other things, so this person is new to the work. During the making of the website, they were given a list of contact numbers from excel to put on the website; they are new at prompting, so the LLM hallucinated some details of that. Now multiple people in the local area became homeless because they weren't able to find the right number for the support service that promises to help them with housing. Several other numbers are broken too, but those don't matter too much. Nobody updates the website regularly, because the staff member who had that role got shuffled, and nobody else has time in their schedule to do it. This is a much worse situation! There are more things that can go wrong, and again — consider this in the context of, say, patient data, legal paperwork, banking information. All of these fun things are getting touched by LLMs too, either in the form of systems, or in the form of people asking LLMs for advice where they should be asking experts — this has literally got people killed.
An even worse scenario? Literally just the Post Office scandal all over again. Except, instead of human hands making bad work under crunch, there are fewer hands (after all, you don't need to hire more programmers when one will do now), and they are under a worse crunch than before — they are expected to deliver more in less time, and they have higher stress than before because they cannot keep themselves clothed or housed due to lack of work.
So... no. I really can't see a future here where software contracting does not end up as a bin fire. Either things are marginally better, or things are much, much worse, and the much-worse outcome is horrible enough, and likely enough, that it is already conceptually dangerous.
But when it comes down to say, the impact on the arts, or the impact on the use of copyrighted content, you have fallen silent.
That's because I rarely have anything interesting and new to say about those topics. When I do I'll write about it.
I also don't talk about generative image stuff very often because it feels different to me from LLMs: I can think of way more positive applications for LLMs than I can for generative images and video, so I don't invest my time in deeply researching those or helping people learn to use them better.
I noticed you fell silent in the debate about how AI code was trained and licensed when it came to the case of "license washing", which seems hugely important to ethical use.
I haven't blogged about it yet but I've been having a bunch of online conversations recently about license washing. I don't think many people have caught on to how easy it is to take a piece of e.g. GPL software, have a model reverse engineer that into a spec, then have another model turn that spec into code.
It's the LLM equivalent of the thing in the 80s where Compaq reverse engineered the IBM BIOS by having one team reverse engineer it and write a spec and another team "clean room" implement that spec, which did indeed get around copyright for them.
This is a good example of a small ethical dilemma: if I write about this, more people will know how to do it. I'm not sure warning people that it's possible will have any positive effect in helping them take steps to protect their work. Should I then sit on that information?
Nobody updates the website regularly, because the staff member who had that role got shuffled, and nobody else has time in their schedule to do it. The outcome is that after 2+ years, the information is out of date.
It's pretty clear we won't reach consensus on this one, but this is a great example of the kind of problem I would like LLMs to solve. "Nobody else has time in their schedule" is exactly why I'm so passionate about helping more people learn to use the technology that I've found to greatly increase the amount I can get done with the limited time I have on my own schedule!
I also think we should do everything we can to help people understand how to use these tools responsibly, which includes making sure they understand the risk of hallucinations when it comes to things like phone numbers - and how to mitigate those risks.
"Nobody else has time in their schedule" is exactly why I'm so passionate about helping more people learn to use the technology that I've found to greatly increase the amount I can get done with the limited time I have on my own schedule!
I fear you're still not understanding, and I have to ask — where have you worked? Have you worked "normal jobs" in the modern day, or anything in gamedev where eternal crunch might be a thing?
I don't say this to be rude, but just because that can explain the delta of experience. As IIRC others upthread have mentioned — when you have free time, that doesn't mean less work; it means the boss gives you even more work. Software programming up until recently (again, outside of game development) has been very privileged in terms of the amount of "slack time", or whatever you want to call it, that workers are allowed. I have friends who have worked at places where their toilet breaks are timed, and the time spent browsing the internet or looking at their phone is timed. However many years ago, it was reported that Amazon drivers have to piss into bottles to make their routes, and people broadly just moved on.
The job market today, and the way that employers feel comfortable treating their employees, is different from ten years ago; it is much worse. While we can say this is an externality and file it under "well, we just need to push for political reform" (as if it is ever that easy — the 40-hour week was earned by the blood and sweat of workers eating water-thin soups for months on end, and being hanged or deported for protesting), it's a very salient point as to how this software gets applied right now, in the present day. We cannot depend on political reforms to remove problems with the technology we are using, especially when the state of politics looks as close to fascism as it does now.
It's pretty clear we won't reach consensus on this one
I also think this is the case. You seem to be putting your excitement about LLMs at the forefront, and while you agreed with the posts about LLMs and other forms of generative AI being derivative of fascism, I get the sense that you've handwaved it mentally by pushing the political aspects into externalities.
This is real, this is happening right now, and people are already being affected by it.
Other than business owners very few people stand to gain anything at all from this
Everyone is a business owner: each of us is the general manager of Myself, Sole Proprietor. We all hire and fire others to do work for us every day, when we participate in the economy.
Thus we all stand to gain from increases in productivity. For example, the folk you mentioned who can use AI instead of hiring artists or writers: now more people can beautify their homes and lives than previously. That’s a good thing!
I love to knit. It’s an enjoyable activity. The invention of knitting machines definitely destroyed almost the entirety of the knitting industry, but the result is more and cheaper knitwear for people — and less drudgery for industrial knitters. And I can still knit by hand for myself and those I know, for my own joy.
In a world where anyone can do anything, what you know and how much work you put in mean nothing; your only power comes from how much stuff you own.
I don’t believe that follows. In a world where anyone can do anything, you still need the talent to actually do the right things.
I assume they’re referring to the devaluing of labour, making developers’ employment more precarious thanks to the threat of replacement with fewer “AI”-leveraging, lower-skilled workers.
I presume the 'locally optimal' left-coded position is to use the additional productivity you get from the usage of AI to spend less time working and more time in political organisation and logistics, which is where most grassroots organisations flounder and perish.
https://danluu.com/productivity-velocity/ might be of interest.
Building mass support is more about meeting people where they're at. Imo the biggest struggle right now for grassroots orgs is their reluctance to divorce themselves from (neo)liberalism; people are understandably dissatisfied with the center left, and more of it doesn't help.
AI is really tangential to the issue of political and economic power. Like any tool, it can be used for many different purposes. The question is who holds it and how their power is justified.
Having more time to organize would be lovely, but being more productive doesn't actually cut my hours 🙃
I think the key is, just like with inevitable things like drug use, the cat is out of the bag here. People are going to use AI no matter what now, so it’s up to us to form the ethics and regulations around that.
Simply saying “just say no” just won’t work.
People are going to plagiarize the works of others no matter what now, so it’s up to us to form the ethics and regulations around that.
“Simply saying ‘just say no’ just won’t work”, except that is exactly what the established ethics and regulations around plagiarism say.
Yep exactly.
The question about lying about the provenance of a text is not whether people lying about it sometimes get into the government (they do), it is whether a government minister can be forced to resign over it (in civilised places, yes).
Whatever you believe about what the Right Thing should be, you can't control it by refusing what is happening right now.
I disagree with this. I think we can influence the future by refusing to use or promote AI tools. If people don't use these tools, chances are higher that the bubble bursts and OpenAI and Anthropic and its ilk go bankrupt. Conversely, if we experiment with it and use AI (or we promote it and thereby cause other people to use AI), we keep funneling money into these terrible businesses and help them stay afloat. Right now, talking is a very important way to influence the future and to actually impact companies that need to go bankrupt.
And yes, once the bubble has burst (and we don't have to deal with AI being pushed into every product, or with religious zealots praying for an AGI god that will "fix the world"), then we can have a realistic look at where AI is useful and where it isn't. I think right now any evaluation is tainted a) by the hype people who e.g. have a personal stake in AI companies, and b) by the near-unlimited AI funding that distorts any cost-benefit analysis of LLMs. I wonder which AI-based tools will be useful once AI is not "cool" any more (or is even shunned because the AI bubble bursting broke some pension funds) and once the unlimited AI funding has stopped and you have to fully pay for the computing time for training and inference. Might make for some eye-opening re-evaluations. The sooner we get there, the better; and no longer promoting AI is one small contribution to getting there.
If people don't use these tools, chances are higher that the bubble bursts and OpenAI and Anthropic and its ilk go bankrupt.
I would wager that the number of people willing to oppose AI to the extent of boycotting is very low. I feel like I have some idea of the problems and risks posed and I proceed to use it despite that -- despite the risks to our ecology, economy, world order. I feel like it's similar to my decision to still consume beef and drive an ICE car. It's too indirect of an impact for me to focus on.
Because of this thought process (that I believe is widespread, not limited to a few like me), I think boycotts will not be effective. And therefore I agree that "you can't control it by refusing what is happening right now."
I feel like I'm taking crazy pills at the moment. I like and use AI, but I use it to augment me, and I still try to understand everything it produces before I integrate it. I still see a lot of "hmm, not good enough". Even with opus 4.5 in cursor. How can it be that so many technical and curious people are just accepting everything it produces without even opening an IDE?
How can it be that so many technical and curious people are just accepting everything it produces without even opening an IDE?
I don't know how much this is happening. Everyone I know who is using AI and who I would consider to match the description of "technical and curious people" is reviewing the output of any AI coding tool. One confusing factor can be the technical and curious people who are selling something (whether it be selling actual products or selling their social presence). Those people are especially public and have motives other than code quality.
In just the last four weeks I've seen several people I trust say that they're relaxing their "review every line it writes" policies thanks to the quality of results they get from the latest models (mostly Claude Code and Opus 4.5).
I've been experimenting with that approach a little myself in some low-risk side projects. It feels SO WRONG committing code I haven't fully reviewed, but so far it's holding up way better than I thought it would.
I'm really struggling to get my head around this. I guess it works for low-risk side projects, but I've spent so much of my life discussing code and approaches, arguing (sometimes quite passionately!) in PR comments, that I just don't really understand where this all goes. It's very hollowing to think that so much effort put into a craft no longer matters.
I've seen these arguments before, back in the late 80s/early 90s, when high level languages were taking over from assembly for major applications. The pro-assembly side was aghast at the horrible code generation and bloat compilers were generating, while the pro-compilation side was arguing about development ease. It's easy now to see how that ended, and for me personally, it's not a good ending ...
That's probably one of the best comparisons. Terrifying to think that Python/C#/Java, never mind C, could soon be seen as a dark art the same way many view assembly today!
Let's talk again when you've LGTM'd and YOLO-shipped a change with a serious security issue and put the company, its employees or customers at risk.
It is not like we are catching all of those now with humans writing and reviewing the code.
I still don't buy that philosophy. The code I care about isn't the prompt, it's the code that the prompt generated which has then been confirmed to work.
I want to keep the prompt around for reference, but I don't think the idea of regenerating the code every time the requirements change by tweaking the prompt is a good one. What works is generating code + tests, then later running new prompts which modify those tests and then the code to implement the new change.
You may not buy it, but it's happening anyway.
Even that fascinating proof-of-concept project mostly encourages people to generate the code once, test that it works and then reuse that code in the future.
I think it's a result both of models getting better and of people getting better at extracting results from models. Ever since I moved to using an AI Factory, even not-so-great models like Big Pickle can produce reasonable outputs.
I suspect you're right that they have other motives, but it is increasingly hard to have a good mental shield against it!
One of the frustrating things about being a developer who finds AI tools useful and dedicates serious effort into helping explain how to use them to other people is the widespread assumption of "other motives".
This rarely if ever happened when I wrote about other technical topics.
You are a bit exceptional. Most people who bother to make comments (as opposed to lurking) have skin in the game. Explicit disclosure is rare and people often will construct the most plausible pro- or anti- argument they can, independent of their true motivations for participating in the discussion.
Many of us have learned to identify this behavior. You can train the skill by playing the party game "Secret Hitler". Even with a well-trained eye, though, there are false positives, and you in particular happen to have traits that trigger them.
I would never place you in that camp, I'm mainly referring to a lot of the discourse on the site formerly known as Twitter.
Thanks, but believe me a lot of people put me in the same bucket as the LinkedIn/Twitter "influencer" crowd!
If it helps to balance that out, your articles have been shared within my workplace regularly with comments such as 'the ever excellent Simon W…'. We're a mid-sized UK dev consultancy, so a technical audience!
Even with opus 4.5 in cursor.
Switch to Claude Code. It's the best available harness right now, and the quality of the harness has a very strong effect on the quality of the results.
I'm accepting things it produces without opening an IDE mainly because I force the agent to do red/green TDD, keep an eye on what it's doing and look over the code in the GitHub PR interface before I land it.
I’ll give that a shot, I’m very wary of the idea of not actively seeing the code though. Maybe I’ll ‘get it’ but I find it very alien!
A trend I've observed in the past few months that's equal parts interesting and upsetting is teams that have a policy of "do NOT look at the code". Instead of reviewing LLM generated code the focus is on proving it works through other means - having the LLMs write automated tests, integration tests, testing plans and setting up staging environments to actively exercise the new code.
I'm not there yet, but the people I've seen do this are credible experienced engineers - and the productivity unlock you get from not reviewing LLM code is enormous. It's a bit like running a large scale development organization where individual teams deliver functionality that they claim work and you use large-scale QA processes to confirm that, not individual code reviews.
Yes, this is how human engineering and product managers manage teams of human developers. One of the hard things about transitioning from developer to manager is that you can no longer see all the code as it goes in; you have to learn to manage a process that gives you confidence the code is good.
The disturbing thing is that plenty of developers don’t want to become managers, but it looks like that is becoming an increasingly nonviable option. On the other hand, eliminating the human aspect makes the team a lot easier to manage!
I know I went into programming because I'm introverted by nature, and working with people tends to drain me. It wouldn't surprise me at all to find the majority of programmers are introverted (to some extent) and managers are extroverted (to some extent).
Well, under our current understanding, if you are a bad manager of a bunch of LLMs, you still don't have any reports who can suffer from it, which is a bit of a relief for someone unqualified as a manager who doesn't want others to suffer for it…
Hopefully managing LLMs will stay closer to being the technical lead of a team without strong personalities, which is easier than being fully a manager with responsibility for motivation and some HR-adjacent stuff…
Exactly — as a (former) manager, I notice I have to suppress the urge to reassure my LLM when I correct it, or reject its code, or just don't take its advice. It even makes me uncomfortable for a millisecond when I have what I want and just close the window instead of saying "nice work". It's definitely how some engineers I've worked with wish things worked with humans.
I have been in situations where I kind of wished things were done closer to this with me! Especially in complicated situations where my attempts at managing the perception of a problem's scale or type sometimes triggered attempts to manage my assumed mental breakdown. Although, since unlike an LLM I do have responsibility for some things, a brief «OK, thanks» at the end does carry information: it tells me I can unload the question from my list.
"do NOT look at the code"
I'm old enough to think this sounds like history repeating itself. For many big technology shifts/ideas, you hear people say we can finally stop looking at the code (Hypercard, Java/VM, Rational Rose, etc).
In practice, the people I rate the most as developers with consistent extremely productive output have one thing in common. They know the entire stack from the ground up. But who knows, maybe this time it's different :)
...please tell me they relax that policy for code that's anywhere near security critical? Tests can't prove correctness of code. If you can't read the code, good luck even exercising all the edge cases in tests.
Is the argument that code review done by LLMs is as good as code review done by a human expert?
This question already applies to managers of human teams. Are you doing all the security reviews personally? Is a security review done by you as good as one done by an expert? Are you bringing in a security expert when you feel the need? Do you trust your team to bring in the security expert themselves? Does your compliance team tell you when to bring in the security expert? This happens differently in every team and company.
Perhaps the question is whether an LLM can ever qualify as a security expert, but I think the eventual expertise of the LLM in all areas is the inherent assumption behind this whole line of prediction. If an LLM can be a database expert, or a compiler expert, I don’t see why it can’t be a security expert.
That's the thing - an LLM can't be a database expert or a compiler expert. It can't be an expert of any kind. It doesn't simulate all the necessary mental processes. We haven't figured out how to express them as loss functions. An LLM can output statistically probable strings in a database-using or compiler-invoking context, which are likely to allow a database novice or compiler novice to accomplish tasks that would traditionally have required expertise; it can do the same in a security context, e.g. by helping a security novice figure out the correct arguments to make a security tool do something necessary. And it can allow an expert to accomplish their own tasks faster.
But it is important that we not confuse the artifact for the process that produced the artifact. The LLM still is not an expert. As a manager, you learn to put trust in your reports - trust that they have understood the security implications of the code they are committing, for instance. It can definitely be difficult to learn that, especially for people in an industry so famously oriented toward "Fine, I'll do it myself"ism.
But an LLM cannot take responsibility for things. Nor can it understand the concept of security posture in a way that relates to a code commit. You should not trust it to do things it was never designed to do and cannot reliably do.
all I got from this is that antirez thinks that the end justifies the means, which is a pretty shitty position to take
There were a lot of points made in the article and zero of them were that the ends justify the means
The article never brings up why people are anti-AI and concludes with "you like building things and with AI you can build things more efficiently". I think it's pretty fair to say the author's stance is that AI is justified because it can build things "more efficiently" regardless of any type of moral or legal concerns.
I used to try to include a nod to all of the (valid) arguments about the negative impact of AI in everything I wrote about it.
That eventually got quite tiresome - I was mostly saying the same things over and over again - and it did nothing to discourage anti-AI people from calling me a shill who didn't care about the negative impacts anyway.
I mean, regardless of whether you actually include the arguments or not, the conclusion is still "the ends justify the means", isn't it?
if everything is "the ends justify the means", then nothing is "the ends justify the means"
I didn't say that everything is "the ends justify the means". But merely mentioning opposing arguments doesn't actually make a difference to whether or not you're doing "the ends justify the means".
I think one common position, and one that I admit I'm still drawn to even though I've rejected it, is that once you know about the negative impacts, the only valid way to care about them is to refrain from using genAI entirely and to publicly condemn it in no uncertain terms -- that anything short of that is cowardice and/or selfishness.
I call that the vegan position, and I respect it. If people look at the overall picture and decide not to engage on ethical grounds that's intellectually credible to me.
I have relatives who won't fly because of the environmental impact, which is sad for me because I live on the other side of the Atlantic from them and I'd love to have them visit!
I call that the vegan position
Please don’t lump anti-AI on ethical grounds positions into the term “vegan.”
We can come up with a new term that doesn't further dilute the meaning of veganism.
Perhaps the “ungen” movement: against the advancement of generative AI on ethical grounds. I don't know if someone else has already coined a better term.
The implication of this comment is that you included these "nods" not because you were interested in addressing those arguments, but because you wanted people to stop asking you about it.
I'm a longtime reader of yours, and I personally would be interested in your answers to some of the questions you raised in, for example, "I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in 4.5 hours", excerpted here:
- Does this library represent a legal violation of copyright of either the Rust library or the Python one?
- Even if this is legal, is it ethical to build a library in this way?
- Does this format of development hurt the open source ecosystem?
- Can I even assert copyright over this, given how much of the work was produced by the LLM?
- Is it responsible to publish software libraries built in this way?
- How much better would this library be if an expert team hand crafted it over the course of several months?
Far be it from me to tell anyone that they're obligated to answer certain questions in their blogs. I end most of my blogs with open questions that I think the post raises. But it's hard to read your blogs and not draw the conclusion that your feeling on the subject is "whatever the answers to these questions are, they're not going to dissuade me from working this way," which is an attitude that a lot of people are uncomfortable with.
If that's not how you feel, even writing once about how you approach these issues would, I think, be very valuable. It may not dispel the criticism—there are certainly people out there who feel that no usage of LLMs is acceptable—but I think it would address the frustration you feel repeating yourself on this subject in comment threads. At the very least you can just link to it every time—"in the future I'll just link to this when people ask me about it" has inspired most of my writing ;)
I have a blog entry permanently in my drafts that's essentially my answer to "do you think this is all worth it?", it's hard to knock that one into a publishable state because it's pretty high stakes!
I can take a quick stab at those questions I raised though:
Does this library represent a legal violation of copyright of either the Rust library or the Python one?
I decided that the right thing to do here was to keep the open source license and copyright statement from the Python library author and treat what I had built as a derivative work, which is the entire point of open source. I reused their determination that their work was no longer derivative of the Rust, but I have not spent the requisite hours of investigation and soul-searching to decide if I agree with that judgement so that's still an open question. Update: Looks like they decided to credit the Rust library after all, so I've mirrored that determination too.
Even if this is legal, is it ethical to build a library in this way?
After sitting on this for a while I've come down on yes, provided the license is carefully considered. The whole point of open source is to allow further derivative works! I never got upset at some university student forking one of my projects on GitHub and hacking in a new feature that they used, I don't think this is materially different.
Does this format of development hurt the open source ecosystem?
Now this one is complicated!
It definitely hurts some projects because there are open source maintainers out there who say things like "I'm not going to release any open source code any more because I don't want it used for training" - I expect some of those would be equally angered by LLM-driven derived works as well.
I don't know how serious this problem is - I've seen angry comments from anonymous usernames, but do they represent genuine open source contributions or are they just angry anonymous usernames?
If we assume this is real, does the loss of those individuals get balanced out by the increase in individuals who CAN contribute to open source because they can now get work done in a few hours that might previously have taken them a few days that they didn't have to spare?
I'll be brutally honest about that question: I think that if "they might train on my code / build a derived version with an LLM" is enough to drive you away from open source, your open source values are distinct enough from mine that I'm not ready to personally invest significantly in keeping you. I'll put that effort into welcoming the newcomers instead!
The much bigger concern for me is the impact of generative AI on demand for open source. The Tailwind story is a recent visible example of this - while Tailwind blamed LLMs for reduced traffic to their documentation resulting in fewer conversions to their paid component library, I'm suspicious that the reduced demand there is because LLMs make building good-enough versions of those components for free easy enough that people do that instead.
I've found myself affected by this for open source dependencies too. The other day I wanted to parse a cron expression in some Go code. Usually I'd go looking for an existing library for cron expression parsing - but this time I hardly thought about that for a second before prompting one (complete with extensive tests) into existence instead.
I expect that this is going to quite radically impact the shape of the open source library world over the next few years. Is that "harmful to open source"? It may well be.
Can I even assert copyright over this, given how much of the work was produced by the LLM?
I'm not a lawyer so I don't feel credible to comment on this one. My loose hunch is that I'm still putting enough creative control in through the way I direct the models for that to count as enough human intervention, at least under US law, but I have no idea.
Is it responsible to publish software libraries built in this way?
I've come down on "yes" here, again because I never thought it was irresponsible for some random university student to slap an Apache license on some bad code they just coughed up on GitHub.
What's important here is making it very clear to potential users what they should expect from that software. I've started publishing my AI-generated and not 100% reviewed libraries as alphas, which I'm tentatively thinking of as "alpha slop". I'll take the alpha label off once I've used them in production to the point that I'm willing to stake my reputation on them being decent implementations, and I'll slap a 1.0 version number when I'm confident that they are a solid bet for other people to depend on. I think that's the responsible way to handle this.
How much better would this library be if an expert team hand crafted it over the course of several months?
That one was a deliberately provocative question, because for a new HTML5 parsing library that passes 9,200 tests you would need a very good reason to hire an expert team for two months (at a cost of hundreds of thousands of dollars) to write such a thing. And honestly, thanks to the existing conformance suites this kind of library is simple enough that you may find their results weren't notably better than the one written by the coding agent.
Update: turned this into a blog post.
I appreciate not just the long reply, but your being willing to put your name and blog behind it as well. You are welcome to replace my handle with my name ("Alexander Petros") in that post, if you want—it's certainly fair.
I have a blog entry permanently in my drafts that's essentially my answer to "do you think this is all worth it?", it's hard to knock that one into a publishable state because it's pretty high stakes!
Ain't that the truth!
The other day I wanted to parse a cron expression in some Go code. Usually I'd go looking for an existing library for cron expression parsing - but this time I hardly thought about that for a second before prompting one (complete with extensive tests) into existence instead.
I find it weird to treat one as a replacement for the other. Anything remotely complicated is going to have tricky corners that a solid library will cover but quickly generated code might not. The size of the test suite means nothing if it doesn't test the right things, which would seem hard to judge if you're not familiar with the domain.
I think the HTML5 example is much better because there’s an existing test suite to compare against.
That's the thing though: a couple of months ago I would see parsing cron expressions as something which I would of course use a library for even though I could have an LLM write me a version, given enough guidance. I'd be arguing for skipping leftpad instead.
Opus 4.5 is good enough that I don't feel that need any more. Parsing cron expressions has crossed the line for me to "prompt, glance at the tests, move on" territory.
Of course that's based on a vast amount of accumulated experience of both cron expressions, writing custom parsers and the capabilities of frontier LLMs. I expect many programmers would not make the same decision that I did, but that's why it's an interesting example here.
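To make the scope of that decision concrete: the core of a cron-expression parser is expanding each of the five fields into the set of values it allows. Below is a minimal, illustrative Go sketch of that one step, not the generated library discussed above; the function name and error messages are made up for illustration.

    package cron

    import (
    	"fmt"
    	"strconv"
    	"strings"
    )

    // parseField expands a single cron field (e.g. "*", "5", "1-5", "*/15",
    // "1,15", "10-30/5") into the set of allowed values between min and max.
    func parseField(field string, min, max int) (map[int]bool, error) {
    	allowed := make(map[int]bool)
    	for _, part := range strings.Split(field, ",") {
    		rangePart, step := part, 1
    		// Optional "/n" step suffix.
    		if i := strings.Index(part, "/"); i >= 0 {
    			s, err := strconv.Atoi(part[i+1:])
    			if err != nil || s <= 0 {
    				return nil, fmt.Errorf("bad step in %q", part)
    			}
    			rangePart, step = part[:i], s
    		}
    		lo, hi := min, max // "*" covers the whole range
    		if rangePart != "*" {
    			bounds := strings.SplitN(rangePart, "-", 2)
    			l, err := strconv.Atoi(bounds[0])
    			if err != nil {
    				return nil, fmt.Errorf("bad value in %q", part)
    			}
    			lo, hi = l, l
    			if len(bounds) == 2 {
    				h, err := strconv.Atoi(bounds[1])
    				if err != nil {
    					return nil, fmt.Errorf("bad range in %q", part)
    				}
    				hi = h
    			}
    		}
    		if lo < min || hi > max || lo > hi {
    			return nil, fmt.Errorf("%q outside %d-%d", part, min, max)
    		}
    		for v := lo; v <= hi; v += step {
    			allowed[v] = true
    		}
    	}
    	return allowed, nil
    }

A full parser then layers on things like month and weekday names, the day-of-month versus day-of-week matching rule, and computing the next run time, which is where the tricky corners mentioned upthread tend to live.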
I think for something well-specified like an HTML parser the work doesn't fall under copyright at all, and that's thanks to the least understood prong of copyright law, convergence. It's why math equations like e = mc² can't be copyrighted: the universe determined what they say. If the outcome is not the product of creativity then it can't be a protected creative work. I think having to pass 9200 predetermined tests combined with the constraints fixed by the predetermined implementation language means there isn't really much if any copyrightable creativity there at all, even from the perspective of the courts.
BUT. You contradict yourself. On one hand you say OSS will benefit from the influx of new coders, and on the other hand it will suffer because those coders (all of them, but especially the new ones, I'd guess) will keep their heads down and not care to engage with the messiness of open human collaboration when they could just keep paying the industry's rent-seekers to help them launder copyrighted code from each other.
That's not meant to be me contradicting myself so much as me expressing that there are two valid arguments here and I don't know which one will prove correct over time.
I think for something well-specified like an HTML parser the work doesn't fall under copyright at all, and that's thanks to the least understood prong of copyright law, convergence.
The model presumably has a lot of HTML parsers in its training set, and generative AI is known to output (more or less modified) training data.
Why wouldn't it be a derived work of these parsers?
It's not about whether copyright was claimed over the work, it's about how much of the work was creative enough to be subject to the rules of copyright. If 90% of the code is just written the way it had to be to parse HTML in Python, the remaining 10% might be fair use...
Basically my understanding is that if we might both write (more or less) exactly the same code because we started from the same assumptions and worked towards the same goal, then the material cannot possibly be the subject of a valid copyright claim.
I would not apply the interpretation to docs, which are a far more open-ended creative endeavor. Also if the specific compliance tests were written by hand there's more creativity in those too that might be protected. But the concept of a compliant Python HTML parser is one which, I would think, converges.
If 90% of the code is just written the way it had to be to parse HTML in Python, the remaining 10% might be fair use...
If 90% of Java code is boilerplate, does that mean using code from projects written in Java is fair use?
I could maybe understand the argument that copying a single function could fall under fair use, but that's ridiculous.
The argument about valid copyright claims also feels... weird. Let's say I took an existing HTML parser, copied its parser arguing that there's pretty much no creativity in converting a formal grammar to recursive descent, and adapted it to the API I want it to have, without crediting the original authors. To me that still seems obviously wrong, and, while IANAL, I'd be surprised to see it fly in court?
So few people know what the law says that it's genuinely astonishing. Convergence is not at all the same thing as fair use. Fair use says what you can do with art that is copyrighted. Convergence determines what is art and thus what can be the subject of copyright.
Again this is why math equations don't have a legal owner like books do, only a discoverer.
I think that just "including a nod" to the negative aspects of AI and forgetting about them in the rest of the text is not really considering them. It indeed shows that you might have them in mind but that they don't matter in your calculation of whether you want to do this thing or not.
For me, an author who was really honest about their consideration of the negative aspects would explain their internal calculation of why they decided that it is still worth it. They would spell out why fixing this bug / writing this throwaway project / having fun was worth releasing X kg of CO2 into the air, contributing to precarious jobs in low-income countries, and justifying the massive copyright theft.
If there is no ratio of usefulness/harm below which you would say "no, in this instance the harm outweighs the usefulness", then you are indeed not considering the harm, whether you mention it or not.
Should authors of cooking blogs include a note about how they justify not being a vegan in every recipe they publish that includes meat?
If that's not the same as writing about using generative AI for code without justifying the value produced against the negative effects of the AI industry at large, what's the difference?
In most human cultures, meat consumption has been normalized for millennia. In these contexts, veganism is a choice, and not an easy one, but perhaps a choice more people should consider thoughtfully.
Imagine, if you will, a world where everyone was vegan until a few years ago, when a handful of the richest people on earth invented the consumption of animal products, and began using all their wealth and influence to convince the populace to eat meat. Grocery store bread- and even oreos- began to incorporate heavily subsidized bonemeal and beef tallow. Investors scrambled to set up battery-farms worldwide where billions of chickens could lead exceedingly short and miserable lives converting grain into breastmeat. Climatologists fret over methane emissions from livestock, but what are a few more degrees of global warming compared to the delicious taste of bacon? You get the idea.
Within this context, you excitedly run a cooking blog where every recipe is made from meat- riced beef, bacon croissants, an exciting new flour substitute derived from dog tendons- often featuring recipes handed to you personally by the meat vendors to highlight! How delightful to be so recognized!
It is never a question, upon your blog, of whether to make the next recipe from meat, but only how. At times you consider including a nod to the pained whimpers of the animals you slaughter on camera, or the looming question of what might happen when the meat industry entirely chokes out human-edible vegetable production and begins to charge the public the true price of their products, but these dark thoughts are quickly silenced by how delicious your meat tastes, and with time you think of them less and less often.
A meat propagandist, you? Perish the thought! It's not like you directly profit from the popularization of meat. You're just interested in making sure everyone uses meat products effectively. Anyway, here's a delicious new way to serve live-harvested kitten liver...
I think it is actually a great idea! Remind people that the meat in this recipe comes from a living animal, give an estimate of how much CO2/water was produced/used to raise it and the quantity of meat needed for the recipe, provide alternatives with cheaper cuts or different animals, and explain what to adapt or replace to make the recipe vegetarian or vegan. All this helps people make their own choices and compromises while making (or not making) the recipe. I would love such a resource.
I do reflect on whether the benefits outweigh the harms every time I buy and eat meat. This helped me drastically cut my meat consumption, and also be more mindful about where and in what conditions the animals were raised.
Hm. I don't think that analogy works. Vegans think that exploiting animals is inherently bad, no matter what, while non-vegans, well, don't. You said you used to include what you consider to be valid arguments in your articles, but I don't think you'd consider "the use of any LLM is inherently immoral" to be a valid argument? It's not even really an argument.
@Armavica's version is much better. Even as a non-vegan, I can be persuaded to eat less meat because, for example, it is bad for the environment. I don't think meat eating is inherently immoral, but I do care about the environment! And so, I am in fact trying to reduce my meat intake.
I don't think you need to include a big justification in every single blog post, but also, an actual weighing of costs to society vs benefits to you would be interesting to see, because I've never actually seen one. It's always either "it's bad for society, therefore I won't use it" (frankly, I'm here) or "it's good for me, therefore I will use it". Often, both of these camps will shout out what they consider to be valid arguments from the other side, but I've never actually seen a real weighing of the arguments.
I would be interested in a companion piece from the author about the ethics and morality aspect.
The author does posit that people are anti-AI because they don’t believe the quality of output is good. And he’s not wrong, you can find this thinking all over the place. But he absolutely skirts a big component of concern over AI, which is the ethics and morality of it.
But he absolutely skirts a big component of concern over AI, which is the ethics and morality of it.
I don't think he does. From the article:
Moreover, I don't want AI to economically succeed, I don't care if the current economic system is subverted (I could be very happy, honestly, if it goes in the direction of a massive redistribution of wealth).
[...]
But I'm worried for the folks that will get fired. It is not clear what the dynamic at play will be: will companies try to have more people, and to build more? Or will they try to cut salary costs, having fewer programmers that are better at prompting? And, there are other sectors where humans will become completely replaceable, I fear.
[...]
There is a sufficient democratization of AI, so far, even if imperfect. But: it is absolutely not obvious that it will be like that forever. I'm scared about the centralization.
No, that's mostly economic problems. Notably, they're not inherent to AI as a technology: "a massive redistribution of wealth" would solve some of them, and some of them aren't even an issue yet - "There is a sufficient democratization of AI, so far, even if imperfect".
I think this is removed from the ethical issues people have with AI - e.g. plagiarism - which you're not going to solve with a different economic system or whatnot.
How reuse and adaptation of creative work is treated is absolutely an economic question to a very large degree. And economic struggle definitely incentivises hiding the traces…
May I kindly, sincerely and in good faith ask that you - we - all of us - not reduce thoughtful, long-form posts like this to a single (in this case, capitalization- and punctuation-free) sentence. It's an asymmetric amplification of a position you may agree with, but Salvatore or someone else puts careful effort into a post, and responding like this doesn't advance any line of reasoning or change anyone's mind.
My comment was the takeaway I got from the post. The actual "Anti-AI Hype" or any complaints that people have were never brought up, and things like a possible stock market crash are brushed aside as if they were irrelevant.
The last sentence sums up the feeling I've gotten from the entire post: for people who are more interested in releasing a product than in writing software, AI is an incredibly helpful tool that can fill any gaps a regular programmer may have.
Salvatore writes "As a programmer, I want to write more open source than ever, now" but this just sounds like wanting to "ship" a project to democratize code without actually caring about the open source movement or its complex position within capitalism while utilizing the help of AI that steals from those projects that do care about the movement.
@antirez: If you're active here I'd be really interested to learn more about your move from the workflow you described about 6 months ago to the Claude Code focused approach described in this post, where you let the agentic loop do its work and review after the fact. Is this because of the models improving? Tools like Claude Code improving? You becoming more comfortable with being less in the loop? All of the above?
Quoting the old advice on flow for others following along:
Always be part of the loop by moving code by hand from your terminal to the LLM web interface: this guarantees that you follow every process. You are still the coder, but augmented.
Hi! While for the serious work I used the workflow of being in the middle between the LLM and the editing / programming process, at the same time I constantly used Claude Code to generate throwaway projects, where what ended up in the code really didn't matter much: I was just using such programs a few times and throwing them away. Suddenly, after Opus 4.5, it jumped forward so strongly that I no longer needed to do any "low level" steering most of the time, but only to guide it with the higher-level ideas/features. When this happened, I had to test it with system software, getting results that no longer justified the process I was using. Now writing a good specification document, and reviewing just the subtasks as each is completed, while following a bit of what the agent is doing in the middle, produces great results. If for some task I don't like what it is doing, I just write a new document, or show it my thoughts, and it continues, and so forth. Correcting the trajectory at every step is no longer needed, and this fact completely changes the dynamics. Now the act of "writing" the code itself has become, for most tasks, automatic.
Booo. As one of the holdouts, it's just filled with the same hand-wringing and amoral arguments like "I'm getting out of the way of the AI hype train, and so should you".
I am not even convinced that the author believes that massive AI adoption would relieve the anguish of the human condition. No argument is presented for that, and it's kind of an absurd statement given what we've seen of how LLMs shape a society: cementing injustice and amplifying the concentration of wealth while turning the sharpest minds to mush, not because they aren't still attached to smart people but because those people have had their philosophy and system of values shattered, so that they now see no point in anything other than going with the flow.
The best I can say, as one who has given up this fight against AI and decided that I will use it to some degree, is that we can't all fight every battle. But that doesn't mean that I, at least, don't care about any moral or social justice issues. I, for one, choose to focus on accessibility, and I figure that if using LLMs helps me do that work better, then that's more of a net good than it would be for me to individually fight this tide.
Yeah, I totally respect that. I've chosen this as my hill to die on, well, because I can, and because I care deeply about it. I obviously am not going to stop the adoption of AI, but I hope I can be an example to people who want to know that AI isn't the only way -- that the human capacity for excellence dwarfs AI capacity for excellence because, well, we choose hills to die on. We struggle, we hurt. We believe!
I agree that the strand of thinking that's very prevalent on say Mastodon that AI is entirely useless and damaging is both 1. wishful thinking and 2. technically illiterate.
A friend of mine read "The AI Con" and describes exactly the same kind of thinking there: "But you can't have your cake and eat it too: You can't say that it is shitty technology that does not work, and at the same time write about how disruptive it will be for employment. I can't take a book seriously that at no point acknowledges the power of the technology in at least some domains. Either you haven't used it or you are purposefully misrepresenting the case."
"But you can't have your cake and eat it too: You can't say that it is shitty technology that does not work, and at the same time write about how disruptive it will be for employment."
I haven't read The AI Con, but my first thought is "the [job] market can remain irrational longer than you [the worker] can remain solvent".
That's true. The AI does not need to be good enough to do your job, it only needs to be good enough to convince your boss that it can. And convincing people of stuff (correctly or wrongly) is something we know beyond any shadow of a doubt LLMs are good at.
Exactly. AI's primary achievement is not technical, but political.
For example, it is not technically able to process healthcare claims with increased accuracy so much as it's able to convince people that it can process healthcare claims accurately. That's why there is so much investment in introducing AI to heavily regulated sectors— approving healthcare claims, approving loans or making up credit scores, military targeting systems, etc.
It doesn't even need to convince people; it is enough if they can pretend to be convinced that the AI's work is good enough for an externally imposed requirement to have the processing done.
it is not technically able to process healthcare claims with increased accuracy so much as it's able to convince people that it can process healthcare claims accurately
Are you talking about LLMs specifically, or deep learning in general? For the part I know a little about, medical imaging processing (yay! my time to shine on lobste.rs), at the very least for some tasks like segmentation, "AI" is both faster and more accurate than the average radiologist, by about any metric you can come up with. We're not talking LLMs here though, but boring convolutional "U" nets.
Sort of? But if your job was essential then the company will have problems pretty fast without you, and either your job comes back or a competitor that hires people shows up.
If the company can convince the purchase decision makers (managers a few steps above the actual work) that the product works, while it doesn't, the feedback loop takes at best years and years, and at worst forever because there is a convenient downturn to explain away the customers' eventual failures.
market
It's a similar fallacy on the left with markets. They'd like "markets" not to be there, but disliking something is not automatically a coherent criticism or a theory of change.
The job market may be irrational but both for moving with the market or against it, the only way is to engage with it as deeply as possible. It's a practice we used to call carpentry, something that you can do with any material, physical or otherwise.
I'll gladly admit that markets exist. I have yet to be told what a market is mathematically, though. I also happen to know that free markets are probably inefficient because likely P ≠ NP; a free market is one where prices can be announced globally. This isn't mere dislike; I have a serious skepticism about the foundations of neoliberal economics and any insistence that market dynamics supplant the inherent value of labor.
There is an implicit assumption here that businesses will not accept worse code and worse engineering practices in exchange for disrupting the power of labor and saving money on payroll. This is obviously false.
AI can be both bad at software engineering and good enough for management.
assumption here that businesses will not accept worse code and worse engineering practices
Oh this is for sure true but this has always been the case.
This directly contradicts your statement that
You can't say that it is shitty technology that does not work, and at the same time write about how disruptive it will be for employment.
It is both shitty, and a new tool for management to use to disrupt labor power.
Is WIPRO bankrupt yet? No? Well, then AI doesn't work.
This is my touchstone. The moment AI actually works for programming, the body shops will go under instantly.
I don't think that's how business works. Big companies don't handle rapid change. They probably have multi-year contracts in place with body shops already, and allocated budget. If those body shops start using LLMs to help with the work, many of their customers will be delighted that the money they are already spending appears to be working out, and they can even tick off the "we use AI!" box in their annual goals without doing any extra work.
I read this in the morning and haven't looked at the 51 new comments since then (of 180 total at time of writing), but I just wanted to say that I appreciated the discussion of this article here on Lobsters. I found it compelling and full of thoughtful points, despite the passion and gravity of the topic.
I find my browser extension really helpful with popular posts: https://github.com/timkuijsten/BoundedBikeshed
One thing I observe is that many of the big boosters with "name-brand," highly impactful projects (Simon, Armin, and antirez, with (at least!) Datasette, Flask, and Redis respectively) are doing a form of software development that's pretty rare, something like "open source tool, limited in feature scope, with the codebase primarily developed by one person. The product is run and managed by someone else." These are all amazing projects with fantastic engineering, and I respect them all as engineers, but IMO it's exactly the type of project where LLM coding agents help you most, whereas more common forms of engineering have properties that make it harder to realize those gains.
If your project handles customer data or has a live database, or interacts with dozens of SaaSes, or involves many teams coordinating (where people have vacations or parental leave or get fired), or the project has millions of lines of code, or... in most professional environments I've been a part of, the "wonders" of agentic coding are much harder to draw so much yield from because there's so much more at play than "code got written."
(there was a similar dynamic when geohot tried to be an intern at Elon's Twitter. IIRC he had some major achievements in jailbreaking devices, but he was an intern for 6ish weeks at Elon Twitter and delivered a whole lot of nothing. I suspect it was because developing features against Twitter's codebase is a very different beast than writing software to own an iPhone, which is something you don't do with million-line codebases and hundreds of other engineers or a live database. I remember in the Twitter Space where Elon got humiliated by a former engineer, they kicked that guy out and geohot took the mic again to say "the main problem with Twitter is that you can't develop it locally..."; suggesting that for his set of skills, he could only really "harvest" them for his very individual-focused software projects)
No shade to the boosters, I just see developers of this kind of project disproportionately deciding this is The Future Of Software, whereas most people in more typical corporate environments report smaller gains. It's like if your job was just to develop web app MVPs, rails generate would seem like a much bigger deal (it saves you 90% of typing and is so much faster!!!) than it is when you're adding features to a mature Rails app.
This is fair.
I'm not working on million lines of closed-source code for other people at the moment, but I have done so in the past. My intuition right now is that if you let me loose on some of those older projects with my new tools I would do wildly productive, useful and high quality (well architected, secure, well explained, etc) work with them.
But I can't know for sure because I'm not working on those kinds of projects right now.
Armin was working at Sentry on those kinds of projects when he first started exploring LLMs, but he's left to do his own thing.
Yeah, similarly I laugh at all the shills from Anthropic talking about how productive Claude Code is at writing Claude Code... It's a simple (but quite buggy) TUI app that makes some API calls with a light sprinkling of concurrency. And if they were really so productive, surely they'd rewrite it into a more appropriate language in an afternoon?
OpenAI did rewrite their Codex CLI tool in Rust a few months ago, presumably using codex to help.
Anthropic are unlikely to switch languages for Claude Code given they outright bought the JavaScript runtime (Bun) they were using.
I don't think anti-AI is a "hype", and the counter-arguments from the author do not resonate with me.
Yes, maybe you think that you worked so hard to learn coding, and now machines are doing it for you. But what was the fire inside you, when you coded till night to see your project working? It was building. And now you can build more and better, if you find your way to use AI effectively. The fun is still there, untouched.
The same applies to cars, or even planes. They changed transportation forever, that's a fact. And we couldn't go against innovation either at the time. But we can't deny their negative impact on the environment, or the number of deaths they cause (mostly cars, you get my point).
Does that make anti-car people fools?
Did people stop walking or cycling, because "the fun of being transported is still there"?
How would you define a "hype"?
A trend, stance or idea that makes you stand out or feel like you're "cool" in society. In this regard, the anti-AI hype the author mentions would be to stand against AI so as not to follow the mainstream idea that AI is the future.
You're missing power. There is an enormous, world-economy-jeopardizing amount of money behind GenAI, pushing to normalize its usage, and convincing the public it can solve any problem. There are large short-term financial and professional incentives in favor of accepting and reinforcing this enthusiasm for these commercial products. Hype is a prevailing wind purchased with power and incentive.
"Anti-AI Hype" is a contradiction in terms for anyone who understands how far from level the playing field is.
Yep, this is what I feel like the pro-AI crowd misses. There are hundreds of billions of dollars behind the pro-AI push. Is there any real funding behind the anti-AI or AI-skeptic position? Are there any "hey maybe let's pump the brakes on all this AI stuff" billboards north of SFO?
"hype" doesnt require "power", just social standing/position
e.g. indie games can be "hyped up" without being backed by relatively large resources
"hype" can be achieved through various means, power is just one of them
But what was the fire inside you, when you coded till night to see your project working? It was building.
Not exactly. For me, it was not the building but the solving. I wanted to see it work, which was the final ingredient to knowing the solution was right. Now when I see it work, that's not enough. I want to know it's right. (You would think I'd be into TDD…)
I think our motivations are personal; it's not universally just the resulting behavior that matters. But when the state of programming was more steady, we could all find our way to it. Now that it's changing to this degree, finding satisfaction again may require introspection about what satisfies each of us.
Yes, absolutely, and I think this divide is fundamental to a lot of the differing thoughts in the whole ai in programming debate. I also think the "builder" perspective is overly represented in the dialog, for various reasons, but not least of all because that approach tends to be more prevalent among people who found/own/run companies and projects and who have a bit of a soapbox as a result.
I'd been thinking something along these lines for a while, but hadn't come up with what felt like the right words for the X vs Y one-liner... but I think "builder vs solver" is the closest I've come, gonna steal that.
But the work of a developer was never “just writing code” anyway — beyond gathering requirements and sussing out what the goal really is, “writing code” is a matter of developing an ontology of the problem, forming abstractions, and then reifying them in code. I don’t use LLMs for ethical reasons, but it seems to me that by using these tools to churn out something that “works”, you’re missing out on the important part of the iterative process of creation, and would seem to suggest that “AI” coding is more-or-less an append-only process (e.g. Yegge’s “Gastown” thing).
Has anyone compared the LLM situation in the west to that in China? It would be interesting to read about the attitudes towards LLMs, adoption trends across industries, whether there is a pervading sense of anxiety around imminent job losses and so on.
I found this recent interview quite revealing:
https://m.youtube.com/watch?v=qDNFaAz3_Cw&pp=0gcJCR4Bo7VqN5tD
China experts Selina Xu and Matt Sheehan separate fact from fiction about China's AI development. They explore fundamental questions about how the Chinese government and public approach AI, the most persistent misconceptions in the West, and whether cooperation between rivals is actually possible.
What struck me was their observation that the west is focusing on (and worried about) AGI and very large systems (ie, frontier LLMs), with a quasi-religious viewpoint. Whereas China is diffusing AI into many more parts of everyday life, and intentionally avoiding the AI-or-not-AI dichotomy.
However one might support one approach or the other, I found it helpful to have an alternate frame to think about.
I've seen stuff like this: https://www.theguardian.com/technology/2025/jun/05/english-speaking-countries-more-nervous-about-rise-of-ai-polls-suggest
Reading this made me realise something: I would also like to know what Chinese people's thoughts and apprehensions about deepfakes and AI-powered social engineering are. In the Anglosphere there have been news reports about school-going children being victims of deepfakes. Even more recently there was the Grok undressing travesty. Now I am wondering if these problems exist in China too.
I single out China here because it is an extremely large country, the most advanced developed economy with a distinct political system, and also a hotbed of AI advancement even if it lags behind the American giants.
Given that they already have the infrastructure for suppressing suspicious drawings of Winnie the Pooh, they are probably better positioned to wipe the kinds of deepfakes people want gone from the most important public conversation venues?
Is there evidence of this Winnie the Pooh censorship? I've only seen it being mentioned second-hand on Reddit and such.
I don't know if it made it into any of the LLMs but it was a huge thing on WeChat etc. Wikipedia has a page about it: https://en.wikipedia.org/wiki/Censorship_of_Winnie-the-Pooh_in_China
Here's a story about DeepSeek's online chat thing filtering it: https://medium.com/humanai/deepseeks-winnie-the-pooh-problem-c017ae50ec2c - though I often find that the DeepSeek API and open-weight models are not subject to the same filters as the chat UI.
DeepSeek censorship is done at the chat frontend level. Messages are redacted only after an offending phrase is generated.
I am still unclear on the Winnie the Pooh censorship. I followed the first three citations there and they are not good-quality sources. There is only mention of Pooh being censored on social media. I don't disbelieve that it exists in some capacity, but establishing the extent is important. Apart from that, the second and third articles say that there isn't a wholesale ban on Winnie the Pooh.
I meant only episodes of erasure at the social-media level (which is why I mentioned wiping things out of the conversation venues, not blocking generation). I do think that if deepfakes are not spread much, they are perceived as less of a problem. I did not mean, and I don't think, that China consistently requires models to be safe at the weight level, to the degree of not generating questionable images.
Something a lot of people probably have trouble admitting to themselves is the existential question this poses to a career we have sunk a lot of time into (and let's be real, probably an unfortunate amount of our identities are wrapped up in it). And then there's how much this existential question is affecting their perception of the quality of AI code.
I don't have a problem at all admitting to myself the existential question this poses to my career. I worry a lot of the pro-AI side doesn't care about this problem at all.
Because they still think they'll be the users of AI, rather than the ones being replaced by it.
I'm especially shocked by how they lose track of the amount that they contribute to the process. They believe that AI is smart enough to magically perform skills that they've spent decades honing, but also that the AI isn't capable of self-prompting "now write the tests and make sure there's no code duplication".
I don't know if AI will end software development as a career, but I want to wake these people up to the fact that if it does, it will also certainly not need them to play the role of a hapless middle manager.
Much like the "change is happening anyway" line, I can picture a way to use coding agents that, as the tech improves, may handle what most people assume is the total scope of my job. (Of course our jobs involve so much more than others know.) But if that kind of transition is going to happen to my work, I would rather be the one to make it happen and in so doing keep as much control as I can.
I don't think there's an existential question. It's more of a nature of the job question. Will coding jobs involve a lot more AI assistance in the future? Maybe. But will they simply go away? Seems unlikely.
Writing code has always been the most persistently finicky part of software development. I completely trust an AI to decide on architecture, or to write specs; most architectural decisions don't matter as long as the implementation is good. (Non-formal) specifications have a large degree of slack to them.
The only thing about software engineering that can't be automated away by sufficiently advanced AI(*) is the soul-sucking meetings.