LLMs are cheap
24 points by jsnell
My favorite way to illustrate this point is to highlight how much it would cost to use a vision LLM to generate descriptions of all 70,000 photos in my personal photo library.
With Gemini 1.5 Flash 8B - the cheapest Gemini vision model - that cost for all 70,000 photos comes to approximately $1.70. That’s not a typo: it really would cost less than $2.
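As a sanity check, the arithmetic behind a figure like that can be sketched in a few lines. The per-token prices and per-image token count below are my assumptions for illustration (roughly Flash-8B-era pricing), not official numbers:

```python
# Rough cost estimate for captioning a photo library with a cheap vision model.
# All prices and token counts here are illustrative assumptions.
PHOTOS = 70_000
TOKENS_PER_IMAGE = 258      # assumed fixed input token cost per image
OUTPUT_TOKENS = 100         # assumed short description per photo
INPUT_PRICE = 0.0375 / 1e6  # assumed $ per input token
OUTPUT_PRICE = 0.15 / 1e6   # assumed $ per output token

input_cost = PHOTOS * TOKENS_PER_IMAGE * INPUT_PRICE
output_cost = PHOTOS * OUTPUT_TOKENS * OUTPUT_PRICE
total = input_cost + output_cost
print(f"${total:.2f}")  # lands in the ~$1.70 ballpark
```

Even if the assumed numbers are off by a factor of two in either direction, the total stays in single-digit dollars.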
I’ve heard from someone I trust at Google that they aren’t operating their Gemini models at a net loss per prompt processed.
It’s rare for me to find any prompt that costs more than a cent to run against the models I frequently use. Most of the API prompts I run cost 1/10th of a cent or less.
I’ve published a bunch of notes on llm-pricing, and I also maintain this pricing calculator tool.
I think it’s totally there for personal use, but one place where I don’t think it’s quite there yet is scaled use (e.g. enthusiast or prosumer).
For example, if I wanted to use a reasonably intelligent model to do audio analysis (beyond what STT models can do, like finding sarcasm, background noise, etc.), my cheapest option that works well is probably something on the order of Gemini 2.5 Flash without thinking. If I wanted to analyze something like a 1M hour audio dataset to build audio intelligence models, that cost is astronomical (32 tokens/sec of audio input, plus output). Not to mention how much rate limiting you’d likely hit, so you then have to pay for the more “premium” tokens from the Vertex AI API >.<. That’s not to say other methods are much cheaper, but it’s not the “universal accessibility for any use case” that something like plain web APIs offers now.
Edit: However, I have found with rough calculations that Gemini 2.5 Flash is about half the price of Deepgram Nova 3, ignoring output tokens (which isn’t totally unfair, considering that with Deepgram you’re paying for silent audio, but with Gemini you’re not paying output tokens for silent audio).
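To see why the dataset-scale cost mentioned above is astronomical, here is a back-of-the-envelope sketch. The 32 tokens/sec rate comes from the thread; the per-million-token price is an assumption I'm using for illustration:

```python
# Back-of-envelope cost of feeding 1M hours of audio through an LLM.
# The 32 tokens/sec rate is from the thread; the price is an assumption.
HOURS = 1_000_000
TOKENS_PER_SEC = 32
INPUT_PRICE_PER_M = 1.00  # assumed $ per million audio input tokens

total_tokens = HOURS * 3600 * TOKENS_PER_SEC
cost = total_tokens / 1e6 * INPUT_PRICE_PER_M
print(f"{total_tokens/1e9:.0f}B tokens -> ${cost:,.0f}")
```

Roughly 115 billion input tokens, so a six-figure bill before output tokens or rate-limit workarounds even enter the picture.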
Audio can be very cheap too. Gemini 1.5 Flash 8B is good enough for basic transcription and costs $0.075 per million input tokens. A million input tokens at 32 tokens per second is enough for 31,250 seconds of audio which is about 8.5 hours. So that’s 8.5 hours of transcription for about 10 cents (accounting for output token costs as well).
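The transcription arithmetic in that comment can be checked directly. The input rate and price are as stated above; the output-token density is my assumption (speech runs around 150 words/minute):

```python
# Check the ~8.5-hours-of-transcription-for-~10-cents claim.
TOKENS_PER_SEC = 32          # audio input rate, per the thread
INPUT_PRICE = 0.075 / 1e6    # $ per input token, per the comment
OUTPUT_PRICE = 0.30 / 1e6    # assumed $ per output token
OUT_TOKENS_PER_SEC = 3       # assumed transcript density (~150 words/min)

seconds = 1_000_000 / TOKENS_PER_SEC   # audio covered by 1M input tokens
hours = seconds / 3600
input_cost = 1_000_000 * INPUT_PRICE
output_cost = seconds * OUT_TOKENS_PER_SEC * OUTPUT_PRICE
print(f"{hours:.1f} h for ${input_cost + output_cost:.3f}")
```

About 8.7 hours of audio for roughly 10 cents, consistent with the comment's estimate.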
I think the key point is that folks assume they need to use frontier models when using an older much cheaper model might be good enough for many tasks.
“The LLM API prices must be subsidized to grab market share – i.e. the prices might be low, but the costs are high.” I don’t think they are, for a few reasons. I’d instead assume APIs are typically profitable on a unit basis. I have not found any credible analysis suggesting otherwise.
Well, I don’t know about Plus users, but Sam Altman said that the $200/month plan is making them lose money due to video generation: https://www.heise.de/en/news/Crazy-OpenAI-makes-losses-with-ChatGPT-Pro-10227139.html Of course, that doesn’t mean Plus is unprofitable as well.
“But OpenAI made a loss, and they don’t expect to make a profit for years!” That’s because a huge proportion of their usage is not monetized at all, despite the usage pattern being ideal for it. OpenAI reportedly made a loss of $5B in 2024. They also reportedly have 500M MAUs. To reach break-even, they’d just need to monetize those free users for an average of $10/year, or under $1/month. A $1 ARPU for a service like this would be pitifully low.
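The break-even math in that paragraph is simple division, using the figures as reported in the thread:

```python
# Break-even ARPU from the reported figures in the thread.
annual_loss = 5e9        # reported 2024 loss, $
monthly_users = 500e6    # reported monthly active users
arpu_per_year = annual_loss / monthly_users
arpu_per_month = arpu_per_year / 12
print(f"${arpu_per_year:.2f}/year, ${arpu_per_month:.2f}/month")
```

That comes out to $10/user/year, or about $0.83/user/month, which is the number the argument turns on.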
That seems very optimistic: to earn $5B at $10/year you need 500 million paying users. It looks like that would be more than all weekly active users https://www.reuters.com/technology/artificial-intelligence/openais-weekly-active-users-surpass-400-million-2025-02-20/ . That’s ignoring any taxes or cuts on the payments, of course. Then it needs to also be profitable, not just break even. Considering there are only 20 million Plus users, convincing ~500 million more to pay for the product won’t be that easy, I believe. Yes, it’s really cheap if you live in a western country, but I doubt all users do. For some, $10/year is a significant expense.
The article is specifically about the cost structure of APIs, not consumer subscriptions. Those are indeed more likely to be sold at a loss currently, due to the adverse selection effects that all-you-can-eat plans will fundamentally have. The market dynamics that make selling to consumers at a loss attractive don’t apply to pay-per-token plans though, so they’re a good way to figure out the ceiling on what LLM inference can actually cost to produce.
But note that there’s a lot of other ways to estimate the inference costs, and they all arrive at results in a similar range.
That seems very optimistic, to earn $5B with $10/year you need 500 million paying users.
No, you just need 500M monetized users. Google’s search revenue is $200B/year, and nobody is paying for search. YouTube’s advertising revenue is $35B/year (that is just ads, to people who aren’t subscribing to YT Premium). Facebook’s various services make a combined $160B/year from ads. And they have billions of users, maybe 5x what OpenAI has, not tens of billions.
It feels like you’ve bought into the narrative where LLMs are so expensive that they can’t possibly be ad-supported, leaving subscriptions as the only consumer option. But as the article shows, LLMs already have far cheaper unit costs than services that are wildly profitable when monetized only by ads.
Okay, thanks for the clarification. I didn’t think about monetization by means other than subscriptions when I wrote my comment. Yeah, that is plausible if you integrate ads.
When we think “monetized by ads” in this context, we should be thinking about Golden Gate Claude, not blockable web ads.
People put way too much weight on that one hype-filled Sam Altman tweet from back in January, just a week or two after they added their $200/month plan.
I expect OpenAI’s revenue from personal users to eventually be dwarfed by its revenue from companies, both buying access for their staff and paying for API usage.
I disagree. Currently OpenAI’s revenue is (unlike all their competitors) dominated by B2C, and I think this will continue, and it is their strategy for this to continue. There are obvious ways to do this, like ads and AI companions.
Their B2B revenue is growing. This story from yesterday said:
The company announced earlier this month that it has three million paying business users, up from the two million it reported in February.
They’re pushing hard on B2B, and they have some help, too. From the ferd.ca story posted here recently:
investors in the industry already have divided up companies in two categories, pre-AI and post-AI, and they are asking “what are you going to do to not be beaten by the post-AI companies?” […] Adoption may be forced to keep board members, investors, and analysts happy, regardless of what customers may be needing.
This is what “too big to fail” looks like before it gets to the taxpayer-funded-bailout stage.
I’m getting the feeling that many AI companies are eyeing the US government as the next sugar daddy to get their financial fix from, specifically national security / defense which has less cost control built in.
My understanding is while both their B2C and B2B revenue is growing, their B2C revenue is growing faster. They are dominant in B2C, but have real competition in B2B, like Anthropic.
New story today: OpenAI hits $10 billion in annual recurring revenue fueled by ChatGPT growth.
The figure includes sales from the company’s consumer products, ChatGPT business products and its application programming interface, or API. […]
The company announced earlier this month that it has 3 million paying business users, up from the 2 million it reported in February.
As I said in a comment on that post – to no reply at all…
There is more to cost, and cheapness, than the direct financial cost.
The ecological cost is far more important.
Arguably, the impacts on employment, careers, and creative works (including outright theft of them), and the other damage done by inexpensive LLM bots, are in the short term, and in terms of impacts on mere human lives, even more imminent and pressing.
I’m confident there is a direct correlation between the ecological cost and the price that models charge for inference.
If you’re concerned about the environmental impact of these things (and everyone should be) drops in price should be a cause for celebration, because they almost always directly reflect the models themselves becoming more efficient to run.
Unfortunately, those efficiency gains are likely to make things worse in the global view, courtesy of Jevons’ Paradox. The more efficient these things get, the more potential use cases become viable, so the more they get used - so we’ll very likely end up using more power and water for them in total.
Footnote: Since Jevons’ original observation about coal-fired steam engines is a bit hard to relate to, my favourite modernized example for people who aren’t software nerds is display technology. Old CRT screens were horribly inefficient - they were large, clunky and absolutely guzzled power. Modern LCDs and OLEDs are slim, flat and use much less power, so that seems great … except we’re now using powered screens in a lot of contexts that would be unthinkable in the CRT era. If I visit the local fast food joint, there’s a row of large LCD monitors, most of which simply display static price lists and pictures of food. 20 years ago, those would have been paper posters or cardboard signage. The large ads in the urban scenery now are huge RGB LED displays (with whirring cooling fans); just 5 years ago they were large posters behind plexiglass. Bus stops have very large LCDs that display a route map and timetable which only changes twice a year - just two years ago, they were paper. Our displays are much more power-efficient than they’ve ever been, but at the same time we’re using much more power on displays than ever.
That’s an even better example than the usual one about adding lanes to a congested highway, because the results are so much clearer. People will simply not believe you when you tell them adding lanes will not solve the traffic problem.
I’m confident there is a direct correlation between the ecological cost and the price that models charge for inference.
Just to clarify: you don’t believe power generation has externalities that are not reflected in the cost? And you also don’t believe that any major AI services are being run at a loss to gain market share?
you don’t believe power generation has externalities that are not reflected in the cost?
That’s not what I said, and it’s not what I believe.
I believe there is a direct correlation between the cost of the APIs and the amount of energy being used for inference to serve those prompts.
Sources I trust have told me that neither Gemini nor AWS Nova run prompts at a loss in terms of energy used to serve them, and those are the cheapest large vendors right now.
OpenAI just dropped the cost of their o3 model by 80% and credited “engineers optimizing inferencing”. I believe them.
Obviously all of these companies are operating at gigantic losses right now when you include research and staffing and training and marketing and their generous free tiers.
I do not believe they lose money on the API calls that they actually charge for.
I’m confident there is a direct correlation between the ecological cost and the price that models charge for inference.
That is very naïve. There isn’t even a direct link between what it costs LLM vendors to run services and what it costs to use them.
What makes you so confident there?
You may not remember my name, but I remember yours, and we’ve not merely talked on Lobsters before but I also read your blogs, and indeed, see you cited as a proponent and advocate of LLM bot usage.
As one of the tiny handful of total LLM-bot-skeptics, I am very familiar with your arguments going back over a year now, and what I said is entirely consistent with my general take of your position.
ALL the LLM-bot vendors are operating at stupendous losses.
The clearest commentator I know on this is Ed Zitron.
Examples:
https://www.wheresyoured.at/core-incompetency/
https://www.wheresyoured.at/openai-is-a-systemic-risk-to-the-tech-industry-2/
I won’t deny for a second that most[1] of these vendors are running at huge losses. They are spending billions of dollars on staffing and research and training costs.
That doesn’t mean that they are selling individual prompts for less than the cost of the energy needed to serve them.
I’ve heard from sources I trust that neither Google Gemini nor AWS Nova (the two cheapest mainstream providers) lose money on inference. Sure, they’re not covering their enormous research costs, but they aren’t selling execution of a prompt for less than the unit cost of executing that prompt.
I don’t have any insider information on that for OpenAI or Anthropic but, given that they charge more than Gemini and Nova, my educated guess is that they’re not selling prompt execution for less than it costs them on a per-unit basis either.
[1] I say most here because there are some vendors which I expect could be operating at a profit: the vendors that sell access to open weight models trained by other people. There’s a booming industry of dedicated API providers running different Llama models right now. They’re fiercely competitive on price and desperately grabbing for market share from each other so it may well be they all operate at an overall loss too, but that’s not guaranteed.
OpenAI’s expenses are quite small compared to their user-base. The only reason they’re making a loss is that they’re giving away so much free inference to so many consumers.
But because inference is so cheap (really!), the level of monetization they would need to be profitable is tiny. Like, we’re not talking of every user needing to buy a $20/month subscription. We’re talking about ads shown to their free users getting 1/10th of the rates of Facebook or Google.
Getting your information from people like Zitron will just lead to getting blindsided repeatedly. The entire industry isn’t about to collapse any moment now, it’s just a fantasy made up by Zitron because that’s what his readership wants to read.
[[citation needed]]
Which part? Do you accept that OpenAI lost $5B in 2024? That they have 500M weekly active users? That they do not monetize most of those users in any way? That Google’s search ad revenue is $200B, and Meta’s is $160B?
Like, I can get you a reference on any of those, but it will be pretty tedious if you don’t tell which of these facts you don’t believe.
But from those facts it is simple math. The exact cost structure doesn’t even matter, because all the costs are already included in that $5B loss.
As usual on Lobsters, when one criticises any aspect of AI – doesn’t matter what – the boosters start questioning one’s methods of enumerating angels on pins and the dance moves they are pulling.
There are no angels. Angels do not exist, and neither does the “AI” in “generative AI”.
But as far as I can discern, this is now a religious position for the supporters, just as it was for cryptocurrency enthusiasts a few years ago.
Any attempt to puncture the belief is met with whataboutism.
True statements which many people simply cannot accept:
Billions of people can be wrong and are every day. International movements built on lies have persisted for millennia. Similarly vast cash injections are not evidence of validity.
A nice report to read, honestly, as somebody who is not well versed in how these services price their APIs and who doesn’t use LLMs in general.
Since most of the stuff I personally see these days involve Cursor/Windsurf/aider/agentic stuff, where people send/analyze whole codebases, does the same token per query efficiency reasoning apply there as well?
For uses like that the token counts would be much higher, but also search would not be an appropriate comparison, and the latter is a problem. Most uses of LLMs for coding have no “classical” alternative available at all for us to compare to.
(But yes, at least I find those uses to still be absurdly cheap. Feeding my entire app’s source code to Gemini Flash and getting it to do an architecture review + detailed refactoring plan to fix the worst deficiencies will cost like a cent.)
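For a rough sense of that "like a cent" claim, assume a mid-sized codebase and cheap Flash-tier prices. Both the token counts and the prices below are my assumptions, not quoted figures:

```python
# Rough cost of an LLM architecture review over a whole codebase.
# Token counts and prices are illustrative assumptions.
CODEBASE_TOKENS = 100_000      # assumed size of the app's source
REVIEW_OUTPUT_TOKENS = 10_000  # assumed length of the review + plan
INPUT_PRICE = 0.10 / 1e6       # assumed $ per input token, Flash-tier
OUTPUT_PRICE = 0.40 / 1e6      # assumed $ per output token

cost = CODEBASE_TOKENS * INPUT_PRICE + REVIEW_OUTPUT_TOKENS * OUTPUT_PRICE
print(f"${cost:.3f}")  # on the order of a cent
```

Under these assumptions the whole review costs about 1.4 cents; even a codebase several times larger stays well under a dime per pass.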
Using Claude code I can pretty easily run up a bill of $1/hour. This is starting to become real money.