Anthropic blocks third-party tools using Claude Code OAuth tokens
33 points by adthal
I liked Armin's take on this:
Given how Anthropic's API actually works, the underlying unit economics are very dependent on how effective cache utilization is or how the agent, the harness actually drives the loop. So I can totally imagine that when they look at their dashboards, they see wildly differing unit economics from Claude Code where they have a lot of control over it vs what other harnesses are doing.
I am having difficulty buying this. As a disclaimer, I have never used Claude Code because I can't afford it. The cost difference between the Pro and Max plans for agentic coding cannot be explained by just the caching aspect, right? To me it looks like vendor lock-in, and I can't complain about it because it's Anthropic's product and it's up to them to decide the terms of use. But if something like caching were the real barrier to allowing their models' open use, they should be transparent about it.
I haven't worked at these LLM companies but I have worked on a lot of cloud services before, and I think there is a widely spread misconception that any of these pricing plans closely correspond to the underlying costs.
It seems impossible for the pricing of either the API, or the Claude Code product, to closely correspond to Anthropic's underlying costs. Most of their cost is going to be per-cluster, not per-token: a cluster at 70% utilization costs nearly as much to run as one at 80%. Additional tokens will be very cheap for Anthropic at off-peak times and very expensive at peak times. Also, this is going to happen in a way that is hard to predict, for both Anthropic and external users. It isn't easy to produce a "market rate" for tokens, and it isn't easy to buy them that way either.
The most logical way to price any cloud service like this is to try to get customers to commit to a fixed, consistent amount of usage. This is why AWS wants you to buy reserved instances so badly - because they spin up new capacity on units of clusters, not on units of VM-hours.
So naturally Anthropic would love to sell its product in a unit of "you commit to buy 1/10,000th of this cluster". Unfortunately this is completely incomprehensible from the outside so they can't do it. Selling the Claude Code product per-seat is kind of an approximation to that. But a "Pro" product that people are always using at 2 pm eastern time could very well end up being just about as expensive as a "Max" product that people are always using at 2 pm eastern time.
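The intuition above can be made concrete with a toy model. All the numbers here are invented for illustration; the point is just that when cost is per-cluster, the effective price of a token is dominated by average utilization, not by the token itself:

```python
# Toy model of per-cluster economics (all numbers invented for illustration).
# A cluster costs roughly the same to run whether it is 30% or 90% utilized,
# so the effective cost per token depends almost entirely on utilization.

CLUSTER_COST_PER_HOUR = 400.0                  # hypothetical fixed cost of one cluster
TOKENS_PER_HOUR_AT_FULL_LOAD = 2_000_000_000   # hypothetical peak throughput

def cost_per_million_tokens(utilization: float) -> float:
    """Effective cost per 1M tokens when the cluster runs at `utilization`."""
    tokens = TOKENS_PER_HOUR_AT_FULL_LOAD * utilization
    return CLUSTER_COST_PER_HOUR / (tokens / 1_000_000)

for u in (0.3, 0.7, 0.9):
    print(f"{u:.0%} utilized -> ${cost_per_million_tokens(u):.3f} per 1M tokens")
```

Under this (made-up) model, tokens served at 30% utilization cost three times as much as tokens served at 90%, which is why a provider would much rather sell you a predictable slice of a cluster than spot tokens.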
That's a really useful observation, thanks.
Cluster utilization helps explain the batch pricing discounts offered by OpenAI, Google (Gemini) and Anthropic alike. They tend to offer a 50% discount in exchange for your batch requests taking up to an hour or longer to process.
Where do you initiate batch requests for these providers? I've not seen this before.
For Google what I’ve seen is this page: https://ai.google.dev/gemini-api/docs/batch-api
This is where I first heard of the concept, I’m not sure how to do it with the other providers.
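Anthropic has an equivalent in its Message Batches API. Here's a sketch of the request shape as I understand it from their public docs - treat the field names, endpoint, and model string as things to verify rather than gospel. Each request carries a `custom_id` so you can match up results when the batch eventually completes:

```python
import json

# Sketch of a batch request body in the shape of Anthropic's Message Batches
# API (field names and endpoint recalled from their public docs -- verify
# before relying on this). The trade: results may take an hour or more, in
# exchange for roughly half-price tokens.

batch_body = {
    "requests": [
        {
            "custom_id": f"doc-{i}",           # your key for matching results
            "params": {
                "model": "claude-sonnet-4-5",  # placeholder model name
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": f"Summarize document {i}."}
                ],
            },
        }
        for i in range(3)
    ]
}

# You would POST this (with your API key headers) to
# https://api.anthropic.com/v1/messages/batches and poll until it finishes.
print(json.dumps(batch_body, indent=2)[:200])
```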
I wonder how much of this is about load shifting. The other aspect of your example is that increasing utilization at off hours adds relatively little to the overall cost. So if the Max plans raise the weekly limit proportionally more than the five-hour limit, they effectively push the extra usage out of US business hours.
Anthropic have three main paid plans: $20/month, $100/month and $200/month. The $100/month one gets you 5x the usage of the $20/month one, while the $200/month one gets you 20x - that's 4x the usage of the $100/month plan for only 2x the price, so it's twice as cost-efficient per dollar (for heavy users).
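The per-dollar arithmetic on those tiers is easy to check (usage multiples are relative to the $20/month plan, as stated above):

```python
# Back-of-envelope cost efficiency of the three plan tiers.
plans = {
    "Pro ($20)":  {"price": 20,  "usage_multiple": 1},
    "Max ($100)": {"price": 100, "usage_multiple": 5},
    "Max ($200)": {"price": 200, "usage_multiple": 20},
}

# Usage units per dollar for each plan.
efficiency = {name: p["usage_multiple"] / p["price"] for name, p in plans.items()}

for name, e in efficiency.items():
    print(f"{name}: {e:.3f} usage units per dollar")
```

The $20 and $100 plans come out identical per dollar; only the $200 plan is actually cheaper per unit of usage, and only if you use enough to hit the lower tiers' caps.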
You can attach Claude Code to your Anthropic plan, at which point you are subject to their five-hourly and weekly usage limits. Here's what /usage in Claude Code looks like for me right now (my five-hourly limit just reset):
Current session
0% used
Resets 3:59pm (America/Los_Angeles)
Current week (all models)
█████████████████████ 42% used
Resets Jan 14, 11:59pm (America/Los_Angeles)
You can also sign up for an API key and use Claude Code with that instead, at which point you will be billed for your exact token usage.
For heavy users, pay-via-API ends up a whole lot more expensive than the $200/month plan.
There's an unofficial tool you can run using npx ccusage@latest which attempts to calculate your token cost based on your local logs if you were to have used the API instead. Mine estimates that just on last Friday I would have spent $54.10 running Claude Code if I had used the API - but those numbers don't include Claude Code for Web and I'm a very heavy user of that tool.
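The core of what a tool like ccusage does is simple: tally per-model token counts from local logs and price them at the published API rates. A minimal sketch, with placeholder prices (check Anthropic's pricing page for real numbers, and note real tools also account for cache reads/writes):

```python
# Hypothetical (input, output) prices in USD per million tokens.
PRICES_PER_MTOK = {
    "claude-opus":   (15.00, 75.00),
    "claude-sonnet": (3.00, 15.00),
}

def estimate_cost(usage: dict) -> float:
    """usage maps model -> {"input": n, "output": n} token counts."""
    total = 0.0
    for model, tokens in usage.items():
        inp, out = PRICES_PER_MTOK[model]
        total += tokens["input"] / 1e6 * inp + tokens["output"] / 1e6 * out
    return round(total, 2)

print(estimate_cost({
    "claude-opus":   {"input": 2_000_000, "output": 300_000},
    "claude-sonnet": {"input": 5_000_000, "output": 1_000_000},
}))
```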
If you want to try Claude Code out I have some week long guest passes I haven't used yet, message me on Lobste.rs and I can send you one.
For heavy users, pay-via-API ends up a whole lot more expensive than the $200/month plan.
To put some numbers on this, I'm on the $200/month plan, and ccusage reported that my first week (where I hit the cap) would have been $1440.73 had I been paying for it via API calls.
Does CC allow some kind of metered billing when you hit the cap on that top plan? And if so, is that still better than the API-based pricing?
Yes they call it "extra usage" and you can choose to switch to API rates (or else stop using it for a few hours).
Apologies in advance for the tone of my question, I can't find a way to not make it sound slightly aggressive (but I assure you in comes from genuine curiosity and not animosity): do you feel your AI usage that week was worth $1440? (Not necessarily in economic terms/things sold to customers, it could also be personal enjoyment or anything else really.)
Yes. Agentic coding has reinvigorated my love for software development. I really enjoy developing this way.
It's hard to imagine having the personal budget to actually spend that much money, but ever since I crossed some certain threshold with these tools, I have been doing things like "working on side projects outside of work" again. If I had to spend $1440, I'd be trying to figure out if I could run a local model and get the same results.
I'd buy that if Anthropic would make a more technical change here. If it's about cache utilisation, they could change the usage rules and limits to depend on the cache utilisation. It's not hard for a billion dollar company. Sacrifice a little bit of telemetry and save a lot of developer goodwill. But this is a different choice.
Yeah, it's definitely a business decision to have their "all-ish you can eat" plan work only with their own products. I do think that their justification that there are usage patterns designed into that plan which they can't predict with third-party applications likely has an element of truth to it though.
Perhaps, but I'm pretty sure that non-cached tokens "cost" more of your five-hourly/weekly usage limit. Unscientific, but when I come back to my desk with many Claude Code instances open that say "cache expired", the usage chart really jumps up.
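That would track with how Anthropic prices prompt caching: cache reads are billed at a small fraction of the normal input rate (roughly 10%, with cache writes at a modest premium - check their docs for the current multipliers, which are assumed here). If the usage limits are charged in proportion to cost, an expired cache makes re-sending the same context many times "heavier":

```python
# Illustrative cache-pricing arithmetic. Multipliers (0.10 for reads, 1.25
# for writes) are assumptions based on Anthropic's published caching docs.
BASE_INPUT = 3.00                 # hypothetical $/MTok input price
CACHE_WRITE = BASE_INPUT * 1.25   # writing a fresh prefix into the cache
CACHE_READ = BASE_INPUT * 0.10    # re-reading an already-cached prefix

def turn_cost(context_mtok: float, cached: bool) -> float:
    """Cost of re-sending `context_mtok` million tokens of context."""
    return context_mtok * (CACHE_READ if cached else CACHE_WRITE)

print(turn_cost(0.1, cached=True))    # warm cache
print(turn_cost(0.1, cached=False))   # cache expired: ~12.5x more
```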
I'm sitting here chuckling at the idea that Anthropic don't like people violating their ToS, while the AI industry as a whole cheerfully ignores terms of service, reasonable behaviour, and the law while training their models on copyrighted books, open source code, Websites that prohibit scraping in their own ToS, etc.
From a business perspective it makes sense they are busting this loophole. Claude Code subscription is a subscription to its "harness" not to the token usage. Anthropic benefits from data collection and a user base.
For me the craziest part is that, even though Anthropic is making a billion in revenue from Claude Code, they might still be losing money if you consider how much it costs to run OpenCode with a pay-as-you-go approach. Maybe it's a small peek at how much these tools will cost once the profitability switch needs to be flipped.
I don't think Anthropic lose money overall on those $200/month subscriptions. There might be a few edge-case users who manage to eke more than $200/month worth of raw inference costs out of them, but the 5 hour rate limits they have in place look to me like they're designed to rein in the amount of damage even those users can cause.
If this is to be believed, why would using other AI harnesses on a pay-per-token subscription yield such monetary differences? Also, why wouldn't they take a loss here? They're competing with OpenAI, Google, Amazon, Anysphere to capture this market.
why would using other AI harnesses on a pay-per-token subscription yield such monetary differences?
Every time you compact you start uncached. Pi’s branching feature is uncached, so are many versions of sub agents. You can see this somewhat by looking at the session report of different agents and how many uncached tokens they need.
Many agents had little incentive to optimize caching. OpenCode only landed it for Anthropic a few months ago for instance.
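A toy simulation of why this matters (invented numbers, and a deliberately oversimplified compaction model where the rewritten summary is half the conversation): a well-behaved agent only pays "fresh" input for each new message, because the growing prefix stays cached, but compacting rewrites the prefix and the entire compacted conversation is uncached again:

```python
def uncached_tokens(turn_sizes, compact_at=None):
    """Total fresh (uncached) input tokens across an agent loop.

    turn_sizes: new tokens added per turn; compact_at: turn index at which
    the conversation is rewritten into a half-size summary (invalidating
    the cache). A toy model -- real harnesses vary a lot here.
    """
    total_fresh = 0
    conversation = 0                 # tokens currently in the cached prefix
    for i, new in enumerate(turn_sizes):
        if i == compact_at:
            conversation = conversation // 2   # compacted summary...
            total_fresh += conversation        # ...is entirely uncached
        total_fresh += new                     # new message is always fresh
        conversation += new
    return total_fresh

print(uncached_tokens([1000] * 6))                # no compaction
print(uncached_tokens([1000] * 6, compact_at=3))  # one compaction mid-run
```

Even in this tiny example, a single compaction adds 25% more uncached input; harnesses that branch or respawn sub-agents from scratch pay that penalty repeatedly.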
There's taking a loss, and then there's selling dollars for 80 cents - incentivizing your customers to use you more because you are giving them money from your pocket isn't a great growth hack!
If you're going to literally give money away you need to have some kind of an artificial cap or a single customer could bankrupt you.
incentivizing your customers to use you more because you are giving them money from your pocket isn't a great growth hack!
Simon, you're a smart guy and I feel I might be missing the plot when I explain this to you. But in what world is this not a great model? This is the classic SF tech playbook used to conquer a market. Just to name Uber as an example of a company that took years of losses before it became profitable. Why wouldn't Anthropic, which raised a Series E, be taking losses while competing with the infinite money of Google and others?
Obviously they are taking overall losses to gain market share in a competitive and hyped new market like LLMs. Training better models than anyone else is absurdly expensive!
I don't think they have designed the revenue-generating part of their business with unit economics such that being successful at gaining paid customers causes their runway to run out faster than if they gained popularity at a slower rate.
Uber were operating a two-sided marketplace, where they needed to attract drivers in order to attract customers. This made it worthwhile for them to spend a few years losing money on every ride, because it was an investment in growing that marketplace.
(Apparently Uber lost money per ride on average in new markets but claimed to be "contribution margin profitable to the tune of 8–9%" in the largest cities.)
Anthropic don't need to subsidize payments to their GPUs in order to convince them to stick around.
And hey, maybe I'm wrong. I think that if any of the credible major labs had designed their paid pricing such that their unit economics were net-negative that would be a very notable story in its own right.
To further support my intuition that Anthropic are unlikely to be losing money on a per-token basis: their API pricing is significantly higher than OpenAI and Gemini, to the point that it's a commercial disadvantage for them in terms of signing up those all-important API customers. Opus 4.5 is around 2x the price of GPT 5.2!
Another hint as to the unit economics of the industry overall is that OpenAI managed to cut the price of their o3 model by 80% last year with no reduction in quality, thanks to "engineers optimizing inferencing". I saw this as yet more evidence that pricing is tied to fundamental costs, not exclusively to market-grabbing strategies.
Anthropic is comparatively little known outside tech circles compared to OpenAI, Google and so on. Their long-term survival as a (large) business likely depends on them being known as a market leader with Claude Code and leveraging their first-mover advantage as much as possible.
The business goal of the subscriptions is to grow userbase, increase reputation, brand recognition, etc - ensure people want to give their money to Anthropic well into the future. Other harnesses treat LLM providers as switchable commodities - the exact opposite to this goal.
On the other hand, I can see their point of view, because these people were costing them a lot of money! Then again, is the discounted usage those people were getting really worth what Anthropic charges, given the environmental cost of running the AI?
Isn't it a normal part of OAuth that a token can be meant for a particular "audience" to use on the user's behalf, and that if some other service is given that same token it should reject it? It's up to each service handling the token to enforce, true. The token belongs to the user, true. I'm just not surprised the token issuer expects the token to be used for that sole purpose.
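Right - the check the resource server performs is essentially "is my identifier in this token's `aud` claim?". A minimal sketch, assuming the access token is a JWT (an assumption; opaque tokens get the same check server-side via introspection), and with signature verification deliberately omitted - a real server must verify the signature first:

```python
import base64
import json

# Hypothetical expected audience value for illustration.
EXPECTED_AUD = "https://api.anthropic.com"

def decode_claims(jwt: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (demo only)."""
    payload_b64 = jwt.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)   # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def audience_ok(jwt: str) -> bool:
    """Reject tokens that were not issued for this service."""
    aud = decode_claims(jwt).get("aud")
    auds = aud if isinstance(aud, list) else [aud]
    return EXPECTED_AUD in auds

# Build fake, unsigned tokens just to exercise the check:
def fake_jwt(claims: dict) -> str:
    seg = lambda obj: base64.urlsafe_b64encode(
        json.dumps(obj).encode()).decode().rstrip("=")
    return f"{seg({'alg': 'none'})}.{seg(claims)}."

print(audience_ok(fake_jwt({"aud": "https://api.anthropic.com"})))  # True
print(audience_ok(fake_jwt({"aud": "https://other.example"})))      # False
```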
Yes, the OAuth token is bound to a client ID (The audience is who the token is sent to, i.e. the Claude API).
A traditional server side web app has a client secret used to request tokens and can keep the tokens server side, so the client can prevent others from using its tokens.
Claude Code and other native apps are “public clients”. Public clients have a client ID but don’t get a secret. For a native-app flow like Claude Code's, you either redirect to a localhost server (the authorization code flow with PKCE) or just give the user a code to enter (the device flow). You can’t really prevent someone from impersonating the client, but the client ID is still only meant for one app to use.
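The PKCE piece is what substitutes for the missing client secret: the app generates a one-time random verifier, sends only its SHA-256 hash with the authorization request, and reveals the verifier at token exchange to prove it started the flow. A sketch of the derivation per RFC 7636 (the `S256` method):

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code_verifier and its S256 code_challenge (RFC 7636)."""
    # 32 random bytes -> 43-char base64url string, no padding.
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).decode().rstrip("=")
    # Challenge is the base64url-encoded SHA-256 of the verifier.
    challenge = base64.urlsafe_b64encode(
        hashlib.sha256(verifier.encode()).digest()
    ).decode().rstrip("=")
    return verifier, challenge

verifier, challenge = make_pkce_pair()
print(len(verifier), len(challenge))   # 43 43
```

This stops a token-interception attack, but as the comment above notes, it does nothing to stop another app from simply running the same public flow with the same client ID - that's a policy matter for the issuer, not something the protocol enforces.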