Anthropic's cheap new model runs agents like the flagship did, and the price of capable AI fell again
Anthropic shipped Claude Sonnet 5, flagship-grade agentic coding and tool use at mid-tier pricing with a 1M context window, the same week Etched hit a $5B valuation on $1B of booked inference orders. Why the price of capable AI is collapsing from both ends, software and silicon.
Anthropic just cut the price of a capable agent this week, and it did it to its own flagship.
It's Wednesday, July first. Here's the rundown: what Sonnet 5 actually changes, the silicon cutting costs from the other side, and where the money goes once the model stops being scarce.
Sonnet 5 shipped, and the tell is what Anthropic took out. It drops the temperature, top-p, and top-k sampling knobs. How the model decodes is fixed now, not a dial you turn.
Around that quiet change is a loud one. Sonnet 5 runs the agentic coding and tool use that needed the flagship six months ago, at mid-tier pricing, with a million-token window. Every team running agents just watched that work get cheaper without touching their code.
And it's the whole week landing on the invoice. Nine days ago an open Chinese model caught Claude at a tenth of the cost. Yesterday Anthropic pulled its flagship back from an export freeze.
Today it undercuts that same flagship with a model most teams will find good enough, and Google cut too, shipped its cheapest image model the same morning. Capable AI is getting cheaper from every direction, and the labs are doing the cutting to themselves before an open model does it for them.
There's a hardware half to this. Etched, a startup whose chip runs transformers and nothing else, raised at five billion dollars on a billion in booked inference orders.
Which is a gamble that the architecture holds still long enough to bake into silicon. Grant the risk. Then look at the billion already booked. Enough serious buyers think inference has stopped changing shape that they're prepaying for chips that assume it.
So what does Anthropic do about all this? Look at what it shipped next to Sonnet 5.
Claude Science. A standalone product for computational biology and drug discovery, sold as a tool for a job, not raw model access. That's a different business than renting the smartest tokens by the million.
As the price of a token slides toward the open-model floor, a finished result a lab will pay for is the ground that keeps your margin off that floor.
For anyone building: re-price your agent traffic against Sonnet 5 before you assume you need the flagship. On most coding and tool-use work, the cheaper model clears the bar now, and the flagship only earns its premium on the genuinely hard reasoning.
To the tape. We're watching Nvidia, Etched, Alphabet, and Anthropic. The inference-ASIC wave got its first hard number this week, and it's the first real pressure on the part of Nvidia's business that isn't training.
Highest conviction is a long on Anthropic, of all things. Sonnet 5, Claude Science, and Fable 5's return read as one strategy: cut the token price, move upmarket into products. The falsifier is if that price cut just compresses their margin and the outcome products don't pick up the slack.
The tape is the desk's scorecard, not advice.
Quick break — two from the desk.
One we know well: vote dot direct. If you're on an H O A or a board, it runs your elections digitally — secure, verifiable, no paper, no clipboard in the lobby. Point your council to vote dot direct.
And if this is your ten minutes of A I for the day, get the written edition too. The full wire, free, every morning — leave your email at nextbig dot dev.
Our call: within four months, at least one widely used agent framework or coding tool moves its default model down from a flagship to a mid-tier model in the Sonnet 5 class, because the cheaper one is now good enough for most of the work.
What proves us wrong: if by November first no major agent tool has changed its default, and they're all still pointing new users at the flagship. These defaults are public, so it settles cleanly.
A team spending four figures a month on flagship agent calls can move most of that traffic to Sonnet 5 this week and read the savings on next month's bill. That's the rundown.
The most telling thing in the Sonnet 5 release notes is a subtraction. Anthropic's new mid-tier model drops the temperature, top-p, and top-k sampling controls that every prior version exposed; the decoding is fixed now, not a knob you turn. Around that quiet change sits a loud one. Sonnet 5 runs the agentic coding and multi-step tool use that needed the flagship six months ago, at mid-tier pricing, inside a million-token context window. Every team running agents in production just had the cost of that work cut without touching a line of their own code.
This is the whole week's arc landing on the invoice. Nine days ago an open Chinese model caught Claude on a working security eval at a tenth of the cost. Yesterday Anthropic pulled its flagship back from an export freeze and reopened the capacity builders had lost. Today it undercuts that same flagship with a model most teams will find good enough, and Google cut alongside it, shipping Nano Banana 2 Lite as its cheapest image model the same morning. Capable AI is getting cheaper from every direction at once, and the labs are now doing the cutting to themselves before an open model does it for them.
The same compression is running one layer down, in the silicon. Etched, a startup whose chip does nothing but run transformer models, reached a $5 billion valuation this week on $1 billion of booked inference orders. Grant the risk honestly: a fixed-function chip presumes the architecture holds still long enough to bake into hardware, and if the shape of the models moves, that silicon turns into scrap. Now weigh the $1 billion already on the books against it. Enough serious buyers are convinced inference has stopped changing shape that they are prepaying for chips that assume it.
Stack the two together and Anthropic's other moves this week read like a company that has already run this math. It shipped Claude Science next to Sonnet 5, a standalone product for computational biology and drug discovery, sold as a tool aimed at a job rather than as raw access to a model. That is a different business than renting the smartest tokens by the million. As the price of a capable token slides toward the open-model floor, a finished result a research lab will pay for is the ground that keeps a lab's margin off that floor.
For anyone building on this, the practical read is short. Re-price your agent workloads against Sonnet 5 before you assume the flagship is required; on most tool-use and coding jobs the cheaper model now clears the bar, and the flagship earns its premium only on the genuinely hard reasoning. Keep the vendor fallback you wired in during last week's crunch, because the same forces cutting prices are the ones that make any single model swappable. And watch where the labs aim their new products, because Claude Science is a tell about where they think the money sits once the model itself is no longer scarce. A team spending four figures a month on flagship agent calls can move most of that traffic to Sonnet 5 this week and read the savings on next month's bill.
Claude Sonnet 5 lands flagship-grade agents at mid-tier pricing
Anthropic shipped Sonnet 5 as the cheap way to run agents: agentic coding and tool use that recently needed a flagship, now at mid-tier rates and pitched directly against Opus, GPT-5.5, and Gemini Pro for production workloads. The efficiency claims are contested and worth checking against your own traffic. The move that is not in doubt is the pricing: the same class of work costs materially less this week than last.
Sonnet 5 drops the sampling knobs and adds a million-token window
Simon Willison's teardown flags the quiet change: Sonnet 5 removes the temperature, top-p, and top-k controls entirely and ships a 1M context window. Losing the sampling knobs trades a little control for more predictable behavior, the same direction Opus 4.8 took. For agent builders the bigger unlock is the context: a million tokens holds a large codebase or a long tool trace in one call.
Google ships Nano Banana 2 Lite as its cheapest, fastest image model
Gemini 3.1 Flash Lite Image went live in the API tuned for speed and volume, with Arena Elo scores near the full model at a fraction of the cost. Read it next to Sonnet 5 on the same day: the frontier labs are competing on price across every modality now, not just at the top of the benchmark. The low-cost tier is where the volume, and the pressure, is moving.
Etched hits a $5B valuation on $1B of booked inference-chip orders
Etched builds an ASIC that runs transformer models and nothing else, trading flexibility for cost. A billion dollars in booked orders is the real signal here: it is customers prepaying for silicon that assumes the architecture stays fixed. The bull case is that inference has stopped changing shape and fixed-function chips will undercut GPUs on cost per token. The risk is the same sentence read the other way.
Qualcomm's datacenter play: stack the DRAM on top of the accelerator
Qualcomm is pitching near-memory "high-bandwidth compute", burying the compute under the memory to cut the data movement that dominates inference cost, shipping in AI250 Dragonfly racks next year. It is another angle on the same problem Etched is attacking from silicon and Sonnet 5 from software: the cost of serving a token is where the fight has moved. Whether it beats a GPU in practice is a next-year question.
Claude Science: Anthropic starts selling outcomes, not just tokens
Alongside Sonnet 5, Anthropic shipped Claude Science, a standalone product for research and drug discovery that extends the Claude Code model into lab work, available to all paid subscribers. It matters as a business signal more than a product launch: as the price of a raw token falls toward the open-model floor, a finished tool aimed at a specific job is what a lab can still charge a premium for.
Yesterday: Fable 5 came back when Commerce lifted the export freeze
The flagship's return that led our June 30 edition had a cause the release notes soft-pedaled: the Commerce Department lifted export controls on Fable 5 and Mythos 5, and the model flowed back to Claude, AWS, Google Cloud, and Azure. The capacity builders lost was policy, not physics, and policy is exactly the kind of supply that can be pulled again.
Monday: an open Chinese model caught Claude at a tenth of the cost
Semgrep's cyber eval put Zhipu's open GLM 5.2 level with Claude for far less money. Sonnet 5's cheaper agent tier is the direct answer to that pressure. Read the week in order and it is one story: the price of capable AI is being reset downward, and this time the closed labs are doing it to themselves.
The price of a capable agent dropped hard this week, and the labs are the ones swinging the axe. Sonnet 5 delivers flagship-grade agentic coding and tool use at mid-tier pricing with a million-token window, Google shipped its cheapest image model the same day, and Etched raised at five billion on a billion in booked orders for chips that serve tokens cheaper than a GPU. The compression is hitting software and silicon at once, and it points one way: raw capability is on its way to commodity, and every lab can see it coming. Watch what they build around the edges. Claude Science, sold as a finished tool for a real job rather than as model access, is Anthropic telling you where the margin goes once the token itself is cheap. For builders the move is concrete this week: re-price your agent traffic against Sonnet 5, keep the fallback you wired in during the crunch, and put the flagship only on the work that genuinely needs it. Most of it no longer does.
Within four months, at least one widely used agent framework or AI coding tool changes its default or recommended model from a flagship to a mid-tier model in the Sonnet 5 class, on the stated grounds that the cheaper model is now good enough for most of the work.
Sonnet 5 delivers flagship-grade agentic coding and tool use at mid-tier pricing, and the same price compression is hitting every lab at once. Framework and tool maintainers pay the model bill their users complain about, so they move quickly when a cheaper model clears the quality bar. The default model in these tools is a public, dated fact in their docs and configs, which makes the shift easy to verify the moment it happens.
If, by November 1, 2026, no widely used agent framework or AI coding tool has moved its default or recommended model down from a flagship to a mid-tier model, and they still point new users at the flagship by default, the call is wrong.
Google shipping its cheapest image model the same day Anthropic cut its agent tier is the pattern we have flagged all week: the vendor that owns its silicon can push the cost floor down across every modality and absorb a price war others have to fund.
Owning the TPU stack means Google can meet open-model and rival pricing without renting someone else's accelerators, and it already ships an open Gemma line for the price-sensitive end. Nano Banana 2 Lite is that advantage showing up as product.
A billion dollars in booked orders is the strongest evidence yet that fixed-function transformer inference is a real category, not a thesis. We watch it because the same fact carries the whole risk: the chip assumes the architecture holds still.
Prepaid orders at this scale mean serious buyers expect transformer inference to stay stable long enough to amortize custom silicon. If that holds, Etched undercuts GPUs where it counts. If the models move, the advantage evaporates.
Sonnet 5, Claude Science, and Fable 5's return in one week read as a coherent strategy, not three unrelated launches. Anthropic is cutting the price of the token while moving upmarket into finished products a lab pays for, which is exactly the play once weights stop being the moat.
When raw capability commoditizes, a closed lab's durable margin comes from selling outcomes and reliability rather than tokens. Claude Science is the first clear outcome product; Sonnet 5 defends the volume tier against open models underneath it.
The inference-ASIC wave got its first hard number this week: Etched at $5B on $1B booked, plus Qualcomm's near-memory pitch. Training stays Nvidia's; inference is where fixed-function silicon and near-memory designs credibly undercut GPUs on cost per token, and inference is the larger long-run market.
Nvidia's moat is strongest where flexibility matters, which is training. As inference volume dwarfs training and the models stop changing shape, buyers optimizing cost per token have a real reason to route serving onto cheaper fixed-function hardware.