LLMs are a failure. A new AI winter is coming
39 points by jkaye
As Anil Dash correctly points out:
Hundreds of millions of users are choosing to go to these websites, of their own volition, and engage with these tools.
LLMs are a failure? This is a new definition of failure I'm unaware of.
Hundreds of millions of users are choosing to use these websites at their current rates, and it's uncertain if those rates will ever lead to these websites turning a profit, which is why even if these products are making money it's entirely possible that they fail. I personally don't think they're a failure, but I think there are perfectly legitimate reasons to doubt the long-term success of LLMs as they are today.
Given current growth rates, it would be "dumb" for Anthropic to focus on profitability instead of growth, but Anthropic absolutely has a path to be profitable, even at current growth rates. I'm sure they could focus on profitability now instead of investment and continue to print money.
I am not an Anthropic employee, fan, or anything. I'm using Anthropic as an example because OpenAI is weird structurally, Google has other business units it can hide things with, etc. Anthropic is nakedly all in on AI, and is doing well.
Anthropic absolutely has a path to be profitable
Do they? Ed Zitron has reported that OpenAI's inference costs are higher than their revenue, which suggests that even if API inference is profitable, the consumer service might not be. Most people seem to expect the good times of getting tons of inference on a $20/mo subscription are going to end sometime, are people willing to pay more? I don't think that's easily assumed.
I know Ed is very biased against AI, but I have no reason to doubt his numbers, given he seems to have gotten them from reliable sources.
I think OpenAI has its own set of problems. Anthropic Is on Track to Turn a Profit Much Faster Than OpenAI (archive.ph link)
Sure, I'm not saying OpenAI and Anthropic are the same, but I have no reason to believe Anthropic is somehow much more efficient than OpenAI, or that they're paying much less to run their inference. If their inference for the subscription model is fundamentally unsustainable, I don't think "hundreds of millions" of users will want to use them at Sonnet's honestly absurd API pricing. We use Sonnet via API for claude code at my workplace; it's just so much more expensive than the subscription.
FWIW your article seems mostly based on internal strategy documents and not on actual, concrete numbers from their spending, so I'm unsure if that contradicts Ed's reporting on both OpenAI's and Anthropic's unsustainable burn rates.
Anthropic is much more efficiently targeting profitable buyer personas than OpenAI is. As a business strategy, they have targeted business use cases much more effectively. OpenAI's user mix is weighted toward much less profitable users (subscription models). Anthropic's is weighted toward much more profitable users (pay by use). This is a deliberate business strategy that either company can choose to do! As you yourself point out - you are aware of an organization (your own!) that has decided, even given other alternatives, to pay Anthropic with their (more expensive) pricing model! I think you have all the evidence you need to change your mind here.
I don't think Ed is biased against AI. I think he's profoundly offended at the poor quality of journalism surrounding the big LLM companies, especially with respect to their finances. His writing has convinced me that there is considerable fraud going on with these companies and their accounting practices, and I think that is more likely to be their ultimate undoing than a technical failing or insufficient runway (although I do think insufficient runway would eventually sink them). OpenAI is more like Theranos than Pets.com or Flooz.com.
Do they? Ed Zitron has reported that OpenAI's inference costs are higher than their revenue, which suggests that even if API inference is profitable
What does this have to do with Anthropic's ability to make a profit? Do all LLM providers have the same balance sheets and expenses and business relationships and funding situations?
Do we assume if fly.io is struggling as a business, that it must mean render.com is also struggling, because they both build cloud platforms to host apps?
Anthropic is very clearly pursuing different customers and a different strategy than OpenAI. I admit that the 'time to break even' metric for "AI" is basically "who knows?! maybe never?"...but that is not new for technology companies.
Do we assume if fly.io is struggling as a business, that it must mean render.com is also struggling,
If fly.io is struggling as a business because their fundamental technology is too expensive to operate, and render is built on those same fundamental technologies, then I think it's very easy to make that conclusion, yes. Note that I very clearly said OpenAI's inference costs are too high, not that their business model is unsustainable. It's safe to assume that Anthropic doesn't have a fundamentally different model for their inference that is somehow magically cheaper than OpenAI's.
There have been many deep explorations of token costs for open-source LLMs, which are fairly close to the frontier if not there already. SemiAnalysis has excellent reporting here: https://inferencemax.semianalysis.com/
LLM inference is profitable at fairly low prices. At OpenAI and Anthropic's API pricing, they're wildly so. While it's possible that Claude and GPT-5 are for some reason much, much more expensive to run than DeepSeek, it seems pretty unlikely they're 10x more expensive — they're not that much better. On GB200s with multi-token prediction, DeepSeek R1 is break-even on 3-year rentals at $0.15/million tokens. That's somewhere between 10x and 100x cheaper than OpenAI's GPT-5.1 pricing. And although DeepSeek 3.2 just came out and so there's less data on it, it's currently speculated to be ~2x cheaper to run than R1, and benchmarks on par with GPT-5.1, Gemini 3 Pro, and Claude 4.5 Opus.
There's no technical reason for OpenAI or Anthropic to lose money on inference with their current pricing. If they're losing money — and TBH I don't believe Ed has any useful insider information, since he seems generally to ignore facts that contradict his desired position — it's not because there's some fundamental law forcing them to; it's just because they can raise a zillion dollars to cover whatever they need, and are focusing on growth+distribution instead of margins since they can provably fix the margins later.
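To put a rough number on that gap, here's a trivial sanity check of the markup argument. The $0.15/M break-even figure is the SemiAnalysis number cited above; the API prices in the dict are illustrative placeholders, not any vendor's actual rate card, so plug in whichever published prices you want to compare.

```python
# Rough markup check. Break-even figure: SemiAnalysis (DeepSeek R1 on GB200, 3-yr rental).
# The API prices below are illustrative placeholders, not any vendor's actual rate card.
BREAKEVEN_PER_M = 0.15  # $/million tokens

api_prices_per_m = {     # hypothetical list prices, $/million output tokens
    "frontier_model_A": 10.00,
    "frontier_model_B": 15.00,
}

for name, price in api_prices_per_m.items():
    print(f"{name}: {price / BREAKEVEN_PER_M:.0f}x the $0.15/M break-even cost")
```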
First things first, the numbers aren't an opinion. The FT, a serious institution, has stood by Ed's reporting on this matter, and by all accounts the numbers are based on real, internal sources. I see no reason to doubt his numbers regardless of his personal opinions, and you're going to have to do more than "I don't believe Ed" to dismiss them. Of course there's no publicly verifiable data - AI companies aren't very keen on letting everyone know how expensive their tech is.
LLM inference is profitable at fairly low prices. At OpenAI and Anthropic's API pricing, they're wildly so
I'm not denying that, and never did. What I am saying is that they're likely unprofitable at the subscription rates. This is simply proven by the fact that OpenAI is spending more on inference - not training - than they're making in revenue. Nobody wants to pay API pricing for these unless they're using them for business, which moves the market for them from "hundreds of millions of people" to "large businesses with specific needs".
If you consider the FT saying:
In short, though we’ve been unable to verify the data’s accuracy, we’ve been given no reason to doubt it substantially either. Make of that what you will.
To mean that the FT "stood by Ed's reporting" and that "by all accounts the numbers are based on real, internal sources," I think you and I have very different ideas of what that means. They're publicly not staking their reputation on Ed's claims, and are stating they can't verify them.
Regardless, subscription rates are also incredibly profitable at SemiAnalysis's publicly verifiable costs. Claude's $20/month plan gives you approx. 10-40 messages per 5-hour window (with that limit hitting faster the more context you use), and has weekly caps so you can have 8 sessions per week — aka, 1280 short messages per month if you're managing to maximally use your rate limits, or 320/month for long ones. If it costs $0.15/million tokens under the hood, $20/month for that is very unlikely to be at a loss. Even if you use the entire 256k context window (meaning you're at 10 messages/session), you can only burn ~$12 of tokens per month if it costs $0.15/million tokens under the hood.
And by the way — the limits are actually lower for Opus. This is assuming Sonnet usage, which DeepSeek easily surpasses. Sonnet could certainly be significantly cheaper.
And... most subscribers are pretty unlikely to be maxing out rate limits every single day, every single week. But even if they do, Anthropic is profitable on inference!
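Spelling out the arithmetic, using only the rate limits quoted in this thread and the $0.15/M figure (neither of which is an official Anthropic number):

```python
# Worst case for Anthropic: every message carries the full 256k-token context,
# which drops the limit to roughly 10 messages per 5-hour session.
COST_PER_MILLION_TOKENS = 0.15   # $/M tokens, the SemiAnalysis break-even figure cited above
SESSIONS_PER_WEEK = 8
WEEKS_PER_MONTH = 4
MESSAGES_PER_SESSION = 10
TOKENS_PER_MESSAGE = 256_000

monthly_tokens = SESSIONS_PER_WEEK * WEEKS_PER_MONTH * MESSAGES_PER_SESSION * TOKENS_PER_MESSAGE
monthly_cost = monthly_tokens / 1_000_000 * COST_PER_MILLION_TOKENS
print(f"{monthly_tokens / 1e6:.0f}M tokens/month, about ${monthly_cost:.2f} in compute vs a $20 subscription")
# -> 82M tokens/month, about $12.29 in compute vs a $20 subscription
```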
we’ve been given no reason to doubt it substantially either
I think you and I have very different ideas of what that means
Okay, what does that line mean to you?
you can only burn ~$12 of tokens per month if it costs $0.15/million tokens under the hood.
I don't see any reason to believe Sonnet actually costs 15c/m tokens to run. That would imply Anthropic is charging a 100x markup on the API price, which is laughable if they, as you say, are in the "growth phase" and not focused on profits. So which is it: is Anthropic charging extortionate markups on their API, or is their inference significantly more expensive? Given what we know of OpenAI's inference costs, I see no reason to believe you.
FWIW, the rough estimate I've seen in my own experience of using the Pro plan and from tons of people discussing this on reddit is that you can easily use $5-10 of API pricing per day.
In short, though we’ve been unable to verify the data’s accuracy, we’ve been given no reason to doubt it substantially either. Make of that what you will.
This means that the FT is not claiming Ed's numbers have been verified.
I don't see any reason to believe Sonnet actually costs 15c/m tokens to run.
SemiAnalysis has run verifiable tests on DeepSeek R1, which I've linked and you've ignored, which show it's break-even at $0.15/million tokens. DeepSeek 3.2 is even cheaper, and surpasses Sonnet in basically every benchmark. If Sonnet's architecture is worse and more expensive — well, DeepSeek is open-source, and Anthropic could simply re-train using DeepSeek's architecture + Claude's training data + RL environments, and serve at those rates (or lower ones, since they could actually train a smaller model since Sonnet is dumber than 3.2). This is exactly what Moonshot did with Kimi K2, and what Mistral just did with their release of Mistral Large today.
There's no technical reason for Anthropic to lose money on inference. If they're losing money, it's because they're focusing on growth+distribution rather than on margins, which makes sense for their current stage.
from tons of people discussing this on reddit is that you can easily use $5-10 of API pricing per day
In this very thread someone has told you they ran the analysis of how much they were getting on the $200/month plan, and it amounted to a few dollars per day. Regardless, the Claude rate limits are known publicly, and it's pretty simple math. 1280 short messages, or 320 long ones, per month, for Sonnet, and much less for Opus. Anthropic can easily be profitable on inference, and if it's not, it's simply because it's prioritizing other things and can solve the margins later.
Edit: BTW, this doesn't mean Anthropic can easily be profitable overall. It might not be profitable long-term, ever, because the inference profits may not cover the training costs. But that's a very different argument than the one Ed makes, where he apparently believes that it's impossible to even be profitable on inference. That's provably false, and either he doesn't know enough to know that, or he just ignores data that disagrees with him.
This means that the FT is not claiming Ed's numbers have been verified.
Sure, if you think the FT saying there's no reason to doubt it means you can doubt it, you can do what you wish.
verifiable tests on DeepSeek R1, which I've linked and you've ignored, which show it's break-even at $0.15/million tokens
Yes, that's a different model. Anthropic is not running that model.
Anthropic could simply re-train using DeepSeek's architecture
Sure, and I could simply wave a magic wand.
very thread someone has told you they ran the analysis of how much they were getting on the $200/month plan, and it amounted to a few dollars per day
And I've pointed out my own counterpoints that I've spent >$200 of API pricing a month for the same usage I get on the Pro plan, and reports from tons of other users on reddit back me up on this, so you have one user vs the majority opinion that claude code is simply too expensive on API pricing.
Anthropic can easily be profitable on inference, and if it's not, it's simply because it's prioritizing other things and can solve the margins later.
Of course, and that's why they're charging a 100x markup on their inference! That makes a ton of sense!
It's really important to understand that you should look at the inference costs for older models, and how they change over time. It seems that for models that match the state of the art a year or more ago, those costs have gone down a lot, as new ways to improve the efficiency of the process appear.
However, given the anticipated very high payoffs, all the major companies want to continue spending large amounts of money to get prestige for the best model and potential profits. I'm unsure how good of a strategy that is, because it seems like users are pretty happy to change models all the time.
But if that strategy fails, and investors get antsy, what you'd expect would be consolidation, a drastic slowdown in the release of new models, and continued use of existing models, with less frequent training of new ones, using more efficient versions of older techniques.
There is a world where the market collapses, OpenAI goes bankrupt, and the hype dies. But it's important to realize that is a world in which there still could be plenty of models, both corporate run and open, that remain usable for programming or other tasks.
It is true that at some point, parts of the programming knowledge would become stale, but that doesn't require multiple new models every year.
To put the programming genie back in the bottle, you would have to get everyone who believes LLMs are useful for programming to change their minds.
I believe LLMs are useful for programming, and I'm not changing my mind about that. I just doubt that the current subscription-based model will be successful, and I'm unsure if "hundreds of millions" of people will use LLMs regularly if they're API-priced.
Routing solves a lot of the expense issues. My friends in Europe tell me the "consumer" use of LLMs is a lot of "write me emails" or "suggest an activity or date" and maybe increasingly "search the web" as Yandex/Google/Bing continually deteriorate (LLM use has caught on a lot less around where I'm from because of linguistic barriers).
You don't need trillions of parameters to search the web or make an email formal. Super cheap models can do most of these tasks.
Arguably, we might be at the point where consumer hardware in the next couple years will run small agentic models that can serve most people's needs (Apple/Google on device AI, for example).
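A minimal sketch of what that routing could look like. The model names and the keyword heuristic are made up for illustration; a real router would use a trained classifier or the provider's own tiering, but the cost logic is the same: cheap model by default, big model only when needed.

```python
# Toy request router: formulaic tasks go to a small, cheap model; everything else
# goes to the expensive frontier model. Names and heuristic are illustrative only.
CHEAP_MODEL = "small-on-device-model"       # hypothetical
EXPENSIVE_MODEL = "large-frontier-model"    # hypothetical

SIMPLE_MARKERS = ("write me an email", "make this formal", "summarize", "suggest an activity")

def route(prompt: str) -> str:
    """Pick a model tier based on a crude task heuristic."""
    if any(marker in prompt.lower() for marker in SIMPLE_MARKERS):
        return CHEAP_MODEL
    return EXPENSIVE_MODEL

print(route("Make this formal: hey boss, running late today"))   # -> small-on-device-model
print(route("Refactor this 5k-line C++ module and add tests"))   # -> large-frontier-model
```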
I used to believe the 200 USD subscription was really low pricing for the amount of tokens you used. But I've been recently building a monitoring system that creates a cost breakdown of all my token usage with OpenAI Pro subscription, and most of the tokens are cached during a heavy codex session. I can get to around 2-3 dollars per day of tokens.
Anthropic is a tiny bit more expensive, but not much. And most of the Claude Code usage is with Haiku models, and it uses Sonnet or Opus sparingly.
$2-3 a day over 20 or so working days is already $40, which is twice what the average user is paying per month - $20. I have used codex personally with the $20 subscription and I'm sure I've used way more tokens than that money would've gotten me even with caching. At work, we use API-priced claude code and when I see the cost dashboard it's significantly more expensive - I'd say by a margin of 2x or so - than if we paid for a subscription.
E: I just double checked this and I'd say the actual number is closer to 4x.
Sure, but the $20 subscription will be rate limited very fast with my usage, tried that already. I use codex a lot, but it might be I pay a bit too much for the $200 subscription...
I'm not sure about codex's API pricing, but it's very easy to run up $200 of usage in a month with claude code's API pricing, and that's without stressing it significantly. There's a reason why they have limits on even the $200 plan.
Right, but the subscription plans are in fact profitable, due to the rate limits. @pimeys is pointing out that with the rate limits of the $200/month plan they're only using ~$40/month in tokens. That's profitable for Anthropic and directly contradicts your claim that they're unprofitable on subscriptions.
(Of course there's no reason for them to be unprofitable on subscriptions vs API usage. They control the subscription rate limits, and have visibility on actual subscription usage, so they can just tune the rate limits to make them equally profitable to — or in this case, even more profitable than — API token costs.)
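As a sketch of that tuning knob (all numbers here are illustrative assumptions, not Anthropic's actual figures): given a per-token cost and a target gross margin, the provider can solve directly for the message allowance.

```python
# Solve for the monthly message allowance that hits a chosen gross margin.
# Every number below is an assumption for illustration.
SUBSCRIPTION_PRICE = 20.0          # $/month
TARGET_MARGIN = 0.5                # keep 50% of subscription revenue
COST_PER_MILLION_TOKENS = 0.15     # $/M tokens, the break-even figure cited above
AVG_TOKENS_PER_MESSAGE = 50_000    # assumed average context per message

token_budget = SUBSCRIPTION_PRICE * (1 - TARGET_MARGIN) / COST_PER_MILLION_TOKENS * 1_000_000
max_messages = int(token_budget / AVG_TOKENS_PER_MESSAGE)
print(f"Allow ~{max_messages} messages/month to keep a {TARGET_MARGIN:.0%} gross margin")
# -> Allow ~1333 messages/month to keep a 50% gross margin
```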
Right, but the subscription plans are in fact profitable, due to the rate limits.
Okay, give me a source. I at least have a few: my own usage, discussions online about the API pricing vs subscriptions, and external reporting. You're just saying they are.
@pimeys has given you a source — their own usage — and I've given you sources on how much the underlying token costs are here ($0.15/million tokens for R1; DeepSeek 3.2, which was just released and is equivalent to Opus, is reportedly even cheaper): https://inferencemax.semianalysis.com/
The Claude rate limits are public: 10-40 Sonnet messages per session, with a max of 8 sessions per week; and less for Opus, which is the DeepSeek-3.2-equivalent model. You can do the math.
So is this a failure because of the misalignment between LLMs' real capabilities and their promise, or because of the price tag? Because AFAICT the 'LLMs-bad' crowd is usually in the former group, which just seems confusing given the overwhelming evidence against such a position.
I like LLMs and strongly believe they're useful, but I'm speaking to the GGP's point that hundreds of millions of people are using them. I don't think that necessarily proves that LLMs are a successful product technology, as their wide usage might be entirely due to the "promo" pricing.
To put it differently: I'm sure you'd love to have a personal assistant, and if I offered you one for $20/mo you'd jump at the opportunity. But it turns out your PA costs me $200/mo, and I'm just eating the $180/mo to get you hooked on having one. If I said you need to start paying me $300/mo to keep your PA in the future, would you still think you need one? I'd think not, and so the market for PAs would crash, because they're simply too expensive to be a consumer product. The market for LLMs, similarly, might not be "hundreds of millions" of people, but specialized cases where the cost is worth it, e.g. as a force multiplier in engineering.
But are they generating revenue for the company?
They're generating revenue, just not profits. And as much as I've beaten the same drum for a long time, I'm starting to think that maybe the bears are thinking on the wrong timescale and are far too idealistic about the stands individuals are willing to take, and that could have mildly chilling consequences.
The interest from decision-rights holders in integrating them into just about everything hasn't really waned, and while I don't have data, I suspect it is still accelerating. And then there is another aspect that isn't really talked about much in this context: as people use LLMs, do they become dependent upon them, and if so, how does that dependency look over time, are they conscious of it, do they see it as a problem, etc.? Because there is a world in which that dependency curve supports the demand curve: LLMs reach some level of usefulness that isn't great but is nonzero, yet are still indispensable to people who just need to get work done, and demand becomes far less price-elastic than people expect.
My guess is that factored into the calculation to lose money now is the bet that skills atrophy on the part of their customers will turn into a form of lock-in, while political connections with amenable parties will lead to further regulatory capture and they'll finally have their very inefficient money-printing machine.
Completely unquantified and likely completely off-base shower thought from moi. Would love to see something that realistically refutes it. But I haven't thought of a good argument against it yet.
From the article:
Though quantum computing in principle could give some leverage here
So the author doesn't understand quantum computing?
I would conjecture that this is another manifestation of the NP-completeness wall
So the author doesn't understand algorithmic complexity?
This article smells like BS to me.
It’s not the only thing they get wrong. Transformer models don’t have an internal state. And they are wrong about how transformers “solved scaling”: it’s not because they can be trained in an “unsupervised manner”, it’s because they can be easily parallelized.
The article feels quite naive to say the least, as if it was written in 2023.
None of the technical criticisms actually matter, because there are plenty of people successfully using LLMs and agents to do impactful things that were not possible before. Look at the people and business impact. That is the best indicator.
I think the spirit of this is correct, but the history of AI is confused. Some things I noticed:
I mean, transformers have tradeoffs. But, I wouldn't call where we're at a failure whatsoever.
Years ago, I was an unwilling witness in a patent suit between two large companies. I worked with an attorney to prepare how to testify to minimize my involvement. The attorney's advice was concise:
Most of the time, the questions relate to what was known when and by whom. For my testimony, most of the questions were around whatever was common knowledge at the time of the invention that was patented. I truthfully answered most of the questions by saying, "I don't know what other people knew, I only knew what I knew."
I've really internalized the delineation between what I know and what other people know. When I see something that's an opinion piece or expresses an opinion that's different than mine, I'm open to having my mind changed by people's different experiences.
I think a lot of the negative content tagged "vibecoding" is experience based. For a while, I read a lot of the links because I wanted to challenge myself with experiences different from my own. After a while, I stopped, because (as I've noted in other threads), what I was seeing wasn't new. I decided to read this article because the headline leaves no room for interpretation.
I have read, enjoyed, and learned from Taranis' other writings. The headers of their recent data-centers-in-space piece enumerated physics-based problems that need to be solved. In absolute terms, you can't change the laws of physics. I don't have any clever perspectives or experience that make me think I could solve those problems.
I have experiences that lead me to judge LLMs as successful and, necessarily, reject the first claim of the headline. If it's helpful, I'll enumerate them. Facts, as presented (both in this article and my other readings), have not persuaded me otherwise. I try to challenge myself in good faith and be genuine in these kinds of discussions.
I don't know what would constitute an AI winter. If helpful, I'd put some thought to creating something testable.
LMK if either of my offers are of interest.
Just to be clear an AI winter doesn't mean that we throw the baby out with the bath-water. It simply means we've reached peak progression for this line of development in the AI space. It doesn't mean that nothing useful came out of it, and it's not saying there aren't valid use cases for AI; sure, it makes sense for some people, and some have found it useful in specific domains and use cases.
What it's ultimately conjecturing is that the bill has come due for the hype a bit, and that from the perspective of resource requirements and the efficacy of the overall engineering approach to AI, we have hit the limit, or are close to the limit, of AI progression along this line, and we don't have anything lined up to break that winter as of yet. That's what a winter is: baseline progression goes cold (it doesn't mean the technology dies). There will still be further innovation around the level that's been achieved; it just means that, at a baseline, we can't really make these things any better, and we should stop pretending that we can with the approach being used.
Personally I feel that AI is coasting a bit on the sentiment that it's an emerging technology. But, as a slight tangent, for the web, an entrenched technology, we have very concrete studies on things like how long you can delay a page load before someone gets ticked off and leaves your site; even small slow-downs or inconveniences will cause people to abandon a site or application out of frustration. Once the forgiveness phase of AI is over, effectively when the winter kicks in, and people don't see much progression in terms of the quality of the results, people will start to look harder at the results they're actually getting and the failure rates of the results and potentially start to walk away if in their specific domain, the failure rates are higher than they're comfortable dealing with.
I don't think the point here is to convince people to abandon AI, just to prepare for this line of AI/LLM development to stall out in terms of progression in the very near future, and, if you're heavily invested or financially exposed to the AI segment of the market, to maybe think about limiting that exposure based on the current approach's concrete, mathematical limitations.
Just to be clear an AI winter doesn't mean that we throw the baby out with the bath-water.
I really wish it did.
Once the forgiveness phase of AI is over, effectively when the winter kicks in, and people don't see much progression in terms of the quality of the results, people will start to look harder at the results they're actually getting and the failure rates of the results and potentially start to walk away if in their specific domain, the failure rates are higher than they're comfortable dealing with.
This is very well put and I agree. I hope that with this phase excesses such as replacing search or even browsers as a whole with a generative model will be thrown in the dustbin of history. Transformers/diffusion models are genuinely useful as a technology, they allow a novel form of extrapolation from data that was not possible before, however most use cases in the past couple of years have been assembled from an overly generous interpretation of their capabilities.
I think this might suffer a bit from programmer bias, where the expectation is that the models keep getting better until they can really fully substitute humans (seeming less likely as the months go on and the bill keeps tallying).
I think a better analogy might be the deep learning boom of the early 2010s, where we reached superhuman vision/classification performance and yet the plumbing (i.e. for self-driving cars or better QA in factories) took years before being rolled out. LLMs aren't even being trained for information extraction (and, by default, perform much worse than small fine-tuned models), but I expect the real revolution will be around unstructured data. Contrary to the engineer's bias, the world does not run on APIs and optimized interchange protocols. It's a lot of Excel and Word and (maybe, if you're lucky) some Visual Basic.
The dream of SQL, natural language queries on huge unwieldy datasets, might finally be fulfilled.
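For a flavor of what that could look like, here's a minimal sketch of natural-language querying over a plain SQLite table. `ask_llm` is a hypothetical placeholder for whatever completion API you'd actually call, not a specific vendor's SDK; the point is that the model only drafts SQL, and the database still does the real work, so the output stays inspectable.

```python
# Sketch: natural-language question -> LLM-drafted SQL -> normal database query.
# `ask_llm` is a hypothetical stand-in for an LLM call; wire up your own client.
import sqlite3

def ask_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its text output."""
    raise NotImplementedError

def nl_query(conn: sqlite3.Connection, question: str, schema: str) -> list:
    sql = ask_llm(
        f"Schema:\n{schema}\n\nWrite a single SQLite SELECT statement answering: {question}"
    )
    # Review/validate the generated SQL before trusting it; it can be wrong.
    return conn.execute(sql).fetchall()
```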
the huge research breakthrough was figuring out that, by starting with essentially random coefficients (weights and biases) in the linear algebra, and during training back-propagating errors, these weights and biases could eventually converge on something that worked. Exactly why this works is still somewhat mysterious, though progress has been made.
This just isn’t true. Neural networks were being randomly initialized and errors backpropagated to train them before the sequence transformer architecture was even imagined.
This article strikes me as journalistic slop by someone who half understood what they researched.
It does… recently the web has become an echo chamber, since it seems everybody writes with GPT nowadays.
That’s like looking at steam engines and doubting what you see…
Unusually enough I have academic degrees in both economics and computer science and I lived through the 2000s bubble and 2008 too.
The scenario is much different from the 2000s; the rise in market cap is much more strongly backed by fundamentals. The 2000s were much worse in terms of capital deployment and cash flows.
And I find it ever more difficult to assert that LLMs are not useful at so many tasks.
On token economics, prices have fallen so fast that last year's best models are actually cheap now.
You're saying a $180 billion investment (if I'm adding correctly) in datacenters containing technology with a 3-year depreciation period (with luck) is backed by fundamentals?
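(The arithmetic behind my question, for what it's worth; the $180B is my rough tally from above and the 3-year useful life is the assumption stated there.)

```python
# Straight-line depreciation of the datacenter build-out, using the rough figures above.
CAPEX = 180e9            # dollars, a rough tally (an assumption, not an audited number)
USEFUL_LIFE_YEARS = 3    # assumed depreciation period for the hardware

annual_writeoff = CAPEX / USEFUL_LIFE_YEARS
print(f"~${annual_writeoff / 1e9:.0f}B per year has to be earned back just to cover depreciation")
# -> ~$60B per year
```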
Good question. But yes, compare it to the 2000s.
We have cash flow now in many companies in the ecosystem. AI is already useful, whether it is intelligent or not
What I think can happen is instability because the economics of software changed too much too fast… it changed buy/build radically, so too many arrangements in firms that reflect the supply chain will now be very inefficient. Changing contracts at that level means companies will also change — consolidations, disruption, reconfiguration of supply chains.
And this is so different and so fundamentally tangible as a phenomenon… you can observe that the time to complete so many tasks has dropped greatly. This means the production functions have changed.
And $180bn today is not that much. Evaporate $180bn from NVIDIA and it still has the greatest market cap ever. If you focus on the $180bn, you miss the point: compare it to other macro figures and look not only at the figure but at its impact. It’s not like the $180bn is useless after 3 years, so in fact it’s much less like $180bn evaporating; its importance drops to much less than 3% of the total value generated…
It’s easy to quote a large figure like that, but, sincerely, it shows naïveté and a bit of a knee-jerk reaction, an emotional stance that is common nowadays…
The actual impact seems to be very unclear at this point. If you have concrete evidence of such a huge change in production, I would be quite interested to see it.
Also, this $180B isn't just financial engineering "market cap", it's actual engineering buying actual physical objects — concrete, steel, silicon, power plants, etc. that cannot be repurposed for something else if this bet goes wrong. I've seen it compared to the investment in railroads, but railroads were useful for a century.
It isn't about whether LLMs are useful — they're obviously of greater than zero utility — it's about how useful.
Sure. DM me. I am actually starting to write on that.
From my standpoint, research has become much cheaper to do; not basic research, but applied R&D. LLMs are great at discovering and gluing together cutting-edge solutions that were previously wrapped in very bad engineering.
So I’ve seen processes with lead times over a day dropping to seconds, and the impact was not cost reduction: it changed the product so much that it opened new revenues. In this case, it was the automation of a previously very time-consuming analysis that their service provider had quoted three years to deliver.
This is, to me, a case where reducing the cost of a process went so far as to change the economics around it.
And on top of $180bn in capex there’s $400bn in cross-holdings that no one is sure how to consolidate.
So I’ve seen processes with lead times over a day dropping to seconds, and the impact was not cost reduction: it changed the product so much that it opened new revenues. In this case, it was the automation of a previously very time-consuming analysis that their service provider had quoted three years to deliver.
Assuming that the quality of the result is the same, what prevented this automation from happening years ago, before the ML and LLM craze?
That’s a very good question too… it also applies to ML in general. Neural networks are old theory by now, and GPUs have existed for at least two decades… so why did neural networks start to outperform SVMs at some point?
I started using GPT at 3.5… I would use it for research and writing short snippets. Back then it took me two weeks to extract a portion of an Elixir Phoenix server into a Python AWS Lambda function.
Now I can do it in under one hour. It is much faster now.
I saw milestones like: function definitions. Classes. Files. Now it’s getting close to full apps.
Maybe it has come to a level where it can feed back into developer tooling and improve the tooling itself so much that it stops depending on AI? I use my own testing framework now. I added more coverage criteria and model-based testing, and I can use it across my projects. It sped things up!
Like using AI to bootstrap non AI tools… vibe code it until it stabilizes as a component.
Of course, this is not something someone who’s read two Python tutorials will be able to do; therein lies the danger. But as I see it, it’s become ever safer for the prepared professional.
While it would be convenient for me if the AI winter came since I don't find doing LLM research interesting, I don't really buy the arguments OP makes. It's basically a rehash of stochastic parrot + hallucinations bad. They do not make a convincing enough argument that these combined mean that we cannot improve on existing AI designs. I also agree with other commenters that the NP-complete claim is strange and doesn't really make sense (I still need to write that blog post about how it doesn't make sense to throw up your hands just because your problems are NP-complete or even undecidable...).
Their major argument is about using AI for coding and I have a counterargument: make verifiers for program correctness. This is not easy, nor is it easy to specify verification conditions, but it is possible and there is active research in guiding LLMs using such constructions. I.e. the problem of "incorrect" code is being worked on by creating guardrails and verifiers to ensure that LLMs produce the right answer. If I were to take the badly-abused NP-completeness analogy and abuse it further, having a good verifier allows us to turn our machine from being in "P" (i.e. an LLM needs to never hallucinate and write the correct program on the first try) to being in "NP" (i.e. an LLM just needs to have the possibility of writing the correct program).
Note that all of this is nothing new, and in fact we can devise algorithms that, when given a verifier, are guaranteed to find a program satisfying it (should one exist). The issue is that most prior art involved literally enumerating all possible programs. Empirically, LLMs encode much stronger heuristics (to the point where there is no enumeration involved). So you just need to believe that these heuristics don't eliminate most good programs to believe that LLMs can be reliable for code gen.
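To make the "NP" framing concrete, here's a minimal generate-and-verify sketch. The `propose_program` function is a placeholder for an LLM call, the verifier is a property-based test for a toy sorting spec, and I'm assuming by convention that the candidate defines a function named `solve`; none of this is a real library, just the shape of the loop.

```python
# Generate-and-verify loop: the LLM only needs to *sometimes* emit a correct
# program, because the verifier filters candidates. All names are illustrative.
import random

def propose_program(spec: str) -> str:
    """Placeholder: ask an LLM for candidate source code implementing `spec`."""
    raise NotImplementedError

def verifier(fn) -> bool:
    """Property check for the toy spec: `fn` must sort lists of ints correctly."""
    for _ in range(1000):
        xs = [random.randint(-100, 100) for _ in range(random.randint(0, 20))]
        if fn(list(xs)) != sorted(xs):
            return False
    return True

def generate_until_verified(spec: str, max_attempts: int = 10) -> str:
    for _ in range(max_attempts):
        source = propose_program(spec)
        namespace: dict = {}
        exec(source, namespace)              # in practice, sandbox untrusted code
        if verifier(namespace["solve"]):     # convention: candidate defines solve()
            return source
    raise RuntimeError("no verified candidate found")
```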
I have been waiting for almost two years now to see LLMs make strides in theorem proving (e.g. Lean). Perhaps they already have and I haven't heard of it? But I have a hard time imagining that they somehow take off in regular programming (where checking correctness is not easy) before they take off in theorem proving.
It would be rich if the LLMs' shitty code is what finally propels formal verification into the realm of being doable by "blue-collar programmers". I wouldn't hold my breath though!
The old joke of the postdoc in Homotopy Type Theory cursing emacs because proof general was freezing again. Sometimes, worse really is better.
A blog post from Galois was posted to lobsters about that a few weeks ago. Just googling “LLMs and lean” turns up a bunch too.
Incidentally, if anyone has resources on how to check that a program and a model match up, I’m all ears. I recently had LLMs make me some TLA+ models, but frankly it’s not clear whether those models were really meaningful, because that was my first contact with TLA+.
For some reason, this reminds me of the following joke:
A novice was trying to fix a broken Lisp machine by turning the power off and on.
Knight, seeing what the student was doing, spoke sternly: “You cannot fix a machine by just power-cycling it with no understanding of what is going wrong.”
Knight turned the machine off and on.
The machine worked.
Please can the bubble pop already? I’m tired of LLMs, and more than that, I’m tired of the people who use them. The sooner everyone involved in running the grift loses their money, the better.
Hi, Theo. I’m Dan. I fall into the category of people who use them that you’re tired of. I’m not a grifter. Just an old coder who finds them useful. I’m not in a position where I’ll lose my money as a result of the success or failures of the AI companies. I’ve raised three kids. I volunteer in my community. I try to be a good person. I don’t think I’ve done anything that should automatically put me into a category of people you should feel negatively towards.
For the person who flagged my comment as spam, I submit that my response to Theo is a good faith attempt at an "intergroup contact theory" interaction. The family next door have drastically different political views. They're good people, raising good kids. I remind myself of them when I'm tempted to make sweeping statements about who supports whom politically and why. There's a lot of space - and a lot of people - between the extreme positions on AI. I think it'd be constructive if we don't choose a side and make people who disagree with us "other" and dismiss them out of hand.
My worry with LLMs is that attitudes like this are actually a very small but loud minority. That would imply that everybody is already using them effectively and I'm not ahead of the curve in any significant way. https://www.stratechi.com/adoption-curves/
LLMs are a miraculous technology. I would compare them to the discovery of compilers and high-level languages. From a technological point of view, the question is not whether they will succeed or fail. That already happened. LLMs work, they have succeeded. Saying that they could fail technically is like saying that the motor engine failed.
The only way they could fail is if they were financially unviable. But I don't see how that could ever be: the technology is improving all the time, and they're already providing value that's at least 10x what they're priced at. (Please don't tell the foundational AI companies that.)
Perhaps the article was just ragebait.
LLMs are a miraculous technology.
To me, the only miraculous thing about LLMs is how many people seem to truly believe that they are useful. I think that people find them useful through a phenomenon similar to the one that makes us see faces in clouds.
I am expecting, after this hype is gone, that this episode will have taught us more about humans than about technology.
Claiming that they're not useful at all is either naive or an argument in bad faith. There are various tasks (especially in the realm of language processing) that were hard/impossible before LLMs and are now easy.
You could argue that the costs are not proportional to the benefits, but saying there are no benefits is clearly false. Note that this is also not the same as saying there are no downsides (there, also, clearly are).
That's not what OP said though, they said that people seem to truly believe that they are useful. Not that they aren't useful in any context whatsoever.
LLM providers and companies who benefit from the tech aren't asking us to evaluate LLMs from a language processing perspective. They aren't asking my grandmother if she would like some newfangled NLP or machine translation for her next startup.
Instead, they ask us to - as OP put it - "find faces in the clouds". Things like "automating email replies" or "generating music" or "vibecoding". They're telling people to "ask questions" of it even though it is full of garbage, see it as a dating partner, etc. They're using it to generate social media content to intermingle with ads so that you continue to watch. These things are not useful when they are a step removed from the human systems that require them.
In this analogy, we're the meteorologists that know properties of clouds, how they function, and the rain they provide as a benefit, and we can forecast them. However, there's a HUGE industry in trying to sell people cloud forecasting software and to make it more palatable, selling it as something to help people see faces in the clouds. Maybe entertaining, but not useful, and as we're seeing, highly costly in lots of other ways.
I highly doubt that the average joe is going to see quality of life improvements from LLMs the way an NLP engineer will.
the only miraculous thing about LLMs is how many people seem to truly believe that they are useful
Are lawyers and HRs useful?
The only way they could fail is if they were financially unviable.
Are "compilers" a success or a failure?
Nobody seems to be making money nowadays by selling compilers.
That's because no-one thought of renting access to compilers by the month.
Now it's totally technically feasible. Do you think it's a viable business opportunity?
No, because a smaller proportion of code written nowadays is dependent on a standalone compiler.
Basically every "we give you a UI to your database and generate SQL" BI tool is, in part, a SaaS compiler.
Even if all cloud providers of LLMs collapse we’ll still be able to use local models with gpus.
During the last "AI winter" I could continue to run the classic "Eliza" at any time. Nobody claims that all the current demos will become unreadable or unusable, just that the novel applications for AI from transformers have peaked.