LLMs are cheap
24 points by jsnell
My favorite way to illustrate this point is to highlight how much it would cost to use a vision LLM to generate descriptions of all 70,000 photos in my personal photo library.
With Gemini 1.5 Flash 8B - the cheapest Gemini vision model - that cost for all 70,000 photos comes to approximately $1.70. That’s not a typo: it really would cost less than $2.
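As a sanity check, the arithmetic behind a figure like that can be sketched in a few lines. The per-token prices and per-image token count below are my assumptions for illustration (roughly Flash-8B-era pricing), not official numbers:

```python
# Rough cost estimate for captioning a photo library with a cheap vision model.
# All prices and token counts here are illustrative assumptions.
PHOTOS = 70_000
TOKENS_PER_IMAGE = 258      # assumed fixed input token cost per image
OUTPUT_TOKENS = 100         # assumed short description per photo
INPUT_PRICE = 0.0375 / 1e6  # assumed $ per input token
OUTPUT_PRICE = 0.15 / 1e6   # assumed $ per output token

input_cost = PHOTOS * TOKENS_PER_IMAGE * INPUT_PRICE
output_cost = PHOTOS * OUTPUT_TOKENS * OUTPUT_PRICE
total = input_cost + output_cost
print(f"${total:.2f}")  # lands in the ~$1.70 ballpark
```

Even if the assumed numbers are off by a factor of two in either direction, the total stays in single-digit dollars.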
I’ve heard from someone I trust at Google that they aren’t operating their Gemini models at a net loss per prompt processed.
It’s rare for me to find any prompt that costs more than a cent to run against the models I frequently use. Most of the API prompts I run cost 1/10th of a cent or less.
I’ve published a bunch of notes on llm-pricing, and I also maintain this pricing calculator tool.
I think it’s totally there for personal use, but one place where I don’t think it’s quite there yet is scaled use (e.g. enthusiast or prosumer).
For example, if I wanted to use a reasonably intelligent model to do audio analysis (beyond what STT models can do, like finding sarcasm, background noise, etc.), my cheapest option that works well is probably something on the order of Gemini 2.5 Flash without thinking. If I wanted to analyze something like a 1M hour audio dataset to build audio intelligence models, that cost is astronomical (32 tokens/sec of audio input, plus output). Not to mention how much rate limiting you’d likely hit, so you then have to pay for the more “premium” tokens from the Vertex AI API >.<. That’s not to say other methods are much cheaper, but it’s not the “universal accessibility for any use case” that something like plain web APIs offers now.
Edit: However, I have found with rough calculations that Gemini 2.5 Flash is about half the price of Deepgram Nova 3, ignoring output tokens (which isn’t totally unfair, considering that with Deepgram you’re paying for silent audio, but with Gemini you’re not paying output tokens for silent audio).
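To see why the dataset-scale cost mentioned above is astronomical, here is a back-of-the-envelope sketch. The 32 tokens/sec rate comes from the thread; the per-million-token price is an assumption I'm using for illustration:

```python
# Back-of-envelope cost of feeding 1M hours of audio through an LLM.
# The 32 tokens/sec rate is from the thread; the price is an assumption.
HOURS = 1_000_000
TOKENS_PER_SEC = 32
INPUT_PRICE_PER_M = 1.00  # assumed $ per million audio input tokens

total_tokens = HOURS * 3600 * TOKENS_PER_SEC
cost = total_tokens / 1e6 * INPUT_PRICE_PER_M
print(f"{total_tokens/1e9:.0f}B tokens -> ${cost:,.0f}")
```

Roughly 115 billion input tokens, so a six-figure bill before output tokens or rate-limit workarounds even enter the picture.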
Audio can be very cheap too. Gemini 1.5 Flash 8B is good enough for basic transcription and costs $0.075 per million input tokens. A million input tokens at 32 tokens per second is enough for 31,250 seconds of audio which is about 8.5 hours. So that’s 8.5 hours of transcription for about 10 cents (accounting for output token costs as well).
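The transcription arithmetic in that comment can be checked directly. The input rate and price are as stated above; the output-token density is my assumption (speech runs around 150 words/minute):

```python
# Check the ~8.5-hours-of-transcription-for-~10-cents claim.
TOKENS_PER_SEC = 32          # audio input rate, per the thread
INPUT_PRICE = 0.075 / 1e6    # $ per input token, per the comment
OUTPUT_PRICE = 0.30 / 1e6    # assumed $ per output token
OUT_TOKENS_PER_SEC = 3       # assumed transcript density (~150 words/min)

seconds = 1_000_000 / TOKENS_PER_SEC   # audio covered by 1M input tokens
hours = seconds / 3600
input_cost = 1_000_000 * INPUT_PRICE
output_cost = seconds * OUT_TOKENS_PER_SEC * OUTPUT_PRICE
print(f"{hours:.1f} h for ${input_cost + output_cost:.3f}")
```

About 8.7 hours of audio for roughly 10 cents, consistent with the comment's estimate.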
I think the key point is that folks assume they need to use frontier models when using an older much cheaper model might be good enough for many tasks.
“The LLM API prices must be subsidized to grab market share – i.e. the prices might be low, but the costs are high.” I don’t think they are, for a few reasons. I’d instead assume APIs are typically profitable on a unit basis. I have not found any credible analysis suggesting otherwise.
Well, I don’t know about Plus users, but Sam Altman said that the $200/month plan is making them lose money due to video generation: https://www.heise.de/en/news/Crazy-OpenAI-makes-losses-with-ChatGPT-Pro-10227139.html Of course, that doesn’t mean Plus is unprofitable as well.
“But OpenAI made a loss, and they don’t expect to make a profit for years!” That’s because a huge proportion of their usage is not monetized at all, despite the usage pattern being ideal for it. OpenAI reportedly made a loss of $5B in 2024. They also reportedly have 500M MAUs. To reach break-even, they’d just need to monetize those free users for an average of $10/year, or under $1/month. A $1 ARPU for a service like this would be pitifully low.
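The break-even math in that paragraph is simple division, using the figures as reported in the thread:

```python
# Break-even ARPU from the reported figures in the thread.
annual_loss = 5e9        # reported 2024 loss, $
monthly_users = 500e6    # reported monthly active users
arpu_per_year = annual_loss / monthly_users
arpu_per_month = arpu_per_year / 12
print(f"${arpu_per_year:.2f}/year, ${arpu_per_month:.2f}/month")
```

That comes out to $10/user/year, or about $0.83/user/month, which is the number the argument turns on.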
That seems very optimistic: to earn $5B at $10/year you need 500 million paying users. It looks like that would be more than all weekly active users https://www.reuters.com/technology/artificial-intelligence/openais-weekly-active-users-surpass-400-million-2025-02-20/ . That’s ignoring any taxes or cuts on the payments, of course. Then it needs to also be profitable, not just break even. Considering there are only 20 million Plus users, convincing ~500 million more to pay for the product won’t be that easy, I believe. Yes, it’s really cheap if you live in a western country, but I doubt all users do. For some, $10/year is a significant expense.
The article is specifically about the cost structure of APIs, not consumer subscriptions. Those are indeed more likely to be sold at a loss currently, due to the adverse selection effects that all-you-can-eat plans will fundamentally have. The market dynamics that make selling to consumers at a loss attractive don’t apply to pay-per-token plans though, so they’re a good way to figure out the ceiling on what LLM inference can actually cost to produce.
But note that there’s a lot of other ways to estimate the inference costs, and they all arrive at results in a similar range.
That seems very optimistic, to earn $5B with $10/year you need 500 million paying users.
No, you just need 500M monetized users. Google’s search revenue is $200B/year, and nobody is paying for search. YouTube’s advertising revenue is $35B/year (that is just ads, to people who aren’t subscribing to YT Premium). Facebook’s various services make a combined $160B/year from ads. And they have billions of users, maybe 5x what OpenAI has, not tens of billions.
It feels like you’ve bought into the narrative where LLMs are so expensive that they can’t possibly be ad-supported, leaving subscriptions as the only consumer option. But as the article shows, LLMs already have far cheaper unit costs than services that are wildly profitable when monetized only by ads.
Okay, thanks for the clarification. I didn’t think about monetization by means other than subscriptions when I wrote my comment. Yeah, that is plausible if you integrate ads.
When we think “monetized by ads” in this context, we should be thinking about Golden Gate Claude, not blockable web ads.
People put way too much weight on that one hype-filled Sam Altman tweet from back in January, just a week or two after they added their $200/month plan.
I expect OpenAI’s revenue from personal users to eventually be dwarfed by its revenue from companies, both buying access for their staff and paying for API usage.
I disagree. Currently OpenAI’s revenue is (unlike all their competitors) dominated by B2C, and I think this will continue, and it is their strategy for this to continue. There are obvious ways to do this, like ads and AI companions.
Their B2B revenue is growing. This story from yesterday said:
The company announced earlier this month that it has three million paying business users, up from the two million it reported in February.
They’re pushing hard on B2B, and they have some help, too. From the ferd.ca story posted here recently:
investors in the industry already have divided up companies in two categories, pre-AI and post-AI, and they are asking “what are you going to do to not be beaten by the post-AI companies?” […] Adoption may be forced to keep board members, investors, and analysts happy, regardless of what customers may be needing.
This is what “too big to fail” looks like before it gets to the taxpayer-funded-bailout stage.
I’m getting the feeling that many AI companies are eyeing the US government as the next sugar daddy to get their financial fix from, specifically national security / defense which has less cost control built in.
My understanding is while both their B2C and B2B revenue is growing, their B2C revenue is growing faster. They are dominant in B2C, but have real competition in B2B, like Anthropic.
New story today: OpenAI hits $10 billion in annual recurring revenue fueled by ChatGPT growth.
The figure includes sales from the company’s consumer products, ChatGPT business products and its application programming interface, or API. […]
The company announced earlier this month that it has 3 million paying business users, up from the 2 million it reported in February.
As I said in a comment on that post – to no reply at all…
There is more to cost, and cheapness, than the direct financial cost.
The ecological cost is far more important.
Arguably, the impacts on employment, careers, and creative works (including outright theft of them), and the other damage done by inexpensive LLM bots, are in the short term, and in terms of impacts on mere human lives, even more imminent and pressing.
I’m confident there is a direct correlation between the ecological cost and the price that models charge for inference.
If you’re concerned about the environmental impact of these things (and everyone should be) drops in price should be a cause for celebration, because they almost always directly reflect the models themselves becoming more efficient to run.
Unfortunately, those efficiency gains are likely to make things worse in the global view, courtesy of Jevons’ Paradox. The more efficient these things get, the more potential use cases become viable, so the more they get used - so we’ll very likely end up using more power and water for them in total.
Footnote: Since Jevons’ original observation about coal-fired steam engines is a bit hard to relate to, my favourite modernized example for people who aren’t software nerds is display technology. Old CRT screens were horribly inefficient - they were large, clunky and absolutely guzzled power. Modern LCDs and OLEDs are slim, flat and use much less power, so that seems great … except we’re now using powered screens in a lot of contexts that would be unthinkable in the CRT era. If I visit the local fast food joint, there’s a row of large LCD monitors, most of which simply display static price lists and pictures of food. 20 years ago, those would have been paper posters or cardboard signage. The large ads in the urban scenery now are huge RGB LED displays (with whirring cooling fans); just 5 years ago they were large posters behind plexiglass. Bus stops have very large LCDs that display a route map and timetable which only changes twice a year - just two years ago, they were paper. Our displays are much more power-efficient than they’ve ever been, but at the same time we’re using much more power on displays than ever.
That’s an even better example than the usual one about adding lanes to a congested highway, because the results are so much clearer. People will simply not believe you when you tell them adding lanes will not solve the traffic problem.
I’m confident there is a direct correlation between the ecological cost and the price that models charge for inference.
Just to clarify: you don’t believe power generation has externalities that are not reflected in the cost? And you also don’t believe that any major AI services are being run at a loss to gain market share?
you don’t believe power generation has externalities that are not reflected in the cost?
That’s not what I said, and it’s not what I believe.
I believe there is a direct correlation between the cost of the APIs and the amount of energy being used for inference to serve those prompts.
Sources I trust have told me that neither Gemini nor AWS Nova run prompts at a loss in terms of energy used to serve them, and those are the cheapest large vendors right now.
OpenAI just dropped the cost of their o3 model by 80% and credited “engineers optimizing inferencing”. I believe them.
Obviously all of these companies are operating at gigantic losses right now when you include research and staffing and training and marketing and their generous free tiers.
I do not believe they lose money on the API calls that they actually charge for.
I’m confident there is a direct correlation between the ecological cost and the price that models charge for inference.
That is very naïve. There isn’t even a direct link between what it costs LLM vendors to run services and what it costs to use them.
What makes you so confident there?
You may not remember my name, but I remember yours, and we’ve not merely talked on Lobsters before but I also read your blogs, and indeed, see you cited as a proponent and advocate of LLM bot usage.
As one of the tiny handful of total LLM-bot-skeptics, I am very familiar with your arguments going back over a year now, and what I said is entirely consistent with my general take of your position.
ALL the LLM-bot vendors are operating at stupendous losses.
The clearest commentator I know on this is Ed Zitron.
Examples:
https://www.wheresyoured.at/core-incompetency/
https://www.wheresyoured.at/openai-is-a-systemic-risk-to-the-tech-industry-2/
I won’t deny for a second that most[1] of these vendors are running at huge losses. They are spending billions of dollars on staffing and research and training costs.
That doesn’t mean that they are selling individual prompts for less than the cost of the energy needed to serve them.
I’ve heard from sources I trust that neither Google Gemini nor AWS Nova (the two cheapest mainstream providers) lose money on inference. Sure, they’re not covering their enormous research costs, but they aren’t selling execution of a prompt for less than the unit cost of executing that prompt.
I don’t have any insider information on that for OpenAI or Anthropic but, given that they charge more than Gemini and Nova, my educated guess is that they’re not selling prompt execution for less than it costs them on a per-unit basis either.
[1] I say most here because there are some vendors which I expect could be operating at a profit: the vendors that sell access to open weight models trained by other people. There’s a booming industry of dedicated API providers running different Llama models right now. They’re fiercely competitive on price and desperately grabbing for market share from each other so it may well be they all operate at an overall loss too, but that’s not guaranteed.
OpenAI’s expenses are quite small compared to their user-base. The only reason they’re making a loss is that they’re giving away so much free inference to so many consumers.
But because inference is so cheap (really!), the level of monetization they would need to be profitable is tiny. Like, we’re not talking of every user needing to buy a $20/month subscription. We’re talking about ads shown to their free users getting 1/10th of the rates of Facebook or Google.
Getting your information from people like Zitron will just lead to getting blindsided repeatedly. The entire industry isn’t about to collapse any moment now, it’s just a fantasy made up by Zitron because that’s what his readership wants to read.
[[citation needed]]
Which part? Do you accept that OpenAI lost $5B in 2024? That they have 500M weekly active users? That they do not monetize most of those users in any way? That Google’s search ad revenue is $200B, and Meta’s is $160B?
Like, I can get you a reference on any of those, but it will be pretty tedious if you don’t tell which of these facts you don’t believe.
But from those facts it is simple math. The exact cost structure doesn’t even matter, because all the costs are already included in that $5B loss.
As usual on Lobsters, when one criticises any aspect of AI – doesn’t matter what – the boosters start questioning one’s methods of enumerating angels on pins and the dance moves they are pulling.
There are no angels. Angels do not exist, and neither does the “AI” in “generative AI”.
But as far as I can discern, this is now a religious position for the supporters, just as it was for cryptocurrency enthusiasts a few years ago.
Any attempt to puncture the belief is met with whataboutism.
True statements which many people simply cannot accept:
Billions of people can be wrong and are every day. International movements built on lies have persisted for millennia. Similarly vast cash injections are not evidence of validity.
A nice report to read, honestly, as somebody who is not well versed in how these services price their APIs and who doesn’t use LLMs in general.
Since most of the stuff I personally see these days involve Cursor/Windsurf/aider/agentic stuff, where people send/analyze whole codebases, does the same token per query efficiency reasoning apply there as well?
For uses like that the token counts would be much higher, but also search would not be an appropriate comparison, and the latter is a problem. Most uses of LLMs for coding have no “classical” alternative available at all for us to compare to.
(But yes, at least I find those uses to still be absurdly cheap. Feeding my entire app’s source code to Gemini Flash and getting it to do an architecture review + detailed refactoring plan to fix the worst deficiencies will cost like a cent.)
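For a rough sense of that "like a cent" claim, assume a mid-sized codebase and cheap Flash-tier prices. Both the token counts and the prices below are my assumptions, not quoted figures:

```python
# Rough cost of an LLM architecture review over a whole codebase.
# Token counts and prices are illustrative assumptions.
CODEBASE_TOKENS = 100_000      # assumed size of the app's source
REVIEW_OUTPUT_TOKENS = 10_000  # assumed length of the review + plan
INPUT_PRICE = 0.10 / 1e6       # assumed $ per input token, Flash-tier
OUTPUT_PRICE = 0.40 / 1e6      # assumed $ per output token

cost = CODEBASE_TOKENS * INPUT_PRICE + REVIEW_OUTPUT_TOKENS * OUTPUT_PRICE
print(f"${cost:.3f}")  # on the order of a cent
```

Under these assumptions the whole review costs about 1.4 cents; even a codebase several times larger stays well under a dime per pass.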
Using Claude code I can pretty easily run up a bill of $1/hour. This is starting to become real money.