On Independence Day, the story is a benchmark: frontier AI running cheap on hardware that isn't Nvidia's
On the Fourth of July, the most on-theme story on a half-size wire is a benchmark: GLM 5.2 running competitively on AMD, with performance per dollar still improving. A reflective close to a week that made compute independence, freedom from any single model or chip vendor, concrete for builders.
It's the Fourth of July, and the most on-theme story on the wire is a benchmark.
Independence Day, a half-size wire. Here's the rundown: how a cost-per-token number on AMD closes out the week, and what compute independence actually means for what you build.
A post going around this weekend runs GLM 5.2, the open Chinese model that anchored the whole week, on AMD hardware. It measures the one unit that decides where inference lives: performance per dollar. And that number keeps improving.
The holiday is about independence. The benchmark is about a narrower kind of it: freedom from any single company's chips, and any single company's model.
Read it as the receipt for the week. Nine days ago an open model caught Claude. Since then the price of capable AI fell from every side. Sonnet 5 cut it in software. Etched raised five billion to cut it in silicon. Kimi and GLM walked into the tools developers use. A movement formed around running the models yourself.
The AMD result closes the loop. Frontier-class inference now runs cheaply on hardware that isn't Nvidia's, which was the piece the whole thesis was missing.
Independence is the honest word, and it's worth saying which kind. Not independence from AI. The week was the opposite, AI getting common enough to run anywhere. This is independence from lock-in.
And it isn't free. Self-hosting trades a monthly invoice for the work of running the thing yourself. What you buy with it is the standing ability to move between models and between chips without asking anyone's permission.
And the open frontier is multipolar now. China's GLM and Kimi. Europe's Mistral shipped Leanstral 1.5 this weekend. The US labs' own cheap tiers. No single vendor sits astride all of it.
For anyone building, take this as the week's real instruction. Assume any model or chip you depend on can be throttled, repriced, restricted, or discontinued, because a version of each happened this month.
Design for exit. Put a portability layer between your code and any one vendor. Keep a second model configured. Know what it takes to move your inference onto hardware you own.
This weekend's AMD benchmark is the proof that the last step on that list costs less than it did on Monday.
To the tape. We moved AMD to a long on this, up from yesterday's watch. Open models running at competitive cost per token on AMD is the independence trade, the one hardware name that gains as inference decouples from Nvidia.
We're watching Nvidia, low conviction. Training is theirs to lose and they won't lose it soon, but inference is the exposed flank. And Mistral on watch, keeping Europe in the open-model game so no single government can gate the supply.
The tape is the desk's scorecard, not advice.
Quick break — two from the desk.
One we know well: vote dot direct. If you're on an H O A or a board, it runs your elections digitally — secure, verifiable, no paper, no clipboard in the lobby. Point your council to vote dot direct.
And if this is your ten minutes of A I for the day, get the written edition too. The full wire, free, every morning — leave your email at nextbig dot dev.
Our call: within nine months, at least one major cloud or lab publicly reports serving a top open model on AMD at a lower cost per token than the equivalent Nvidia setup, and AMD's inference share visibly rises on the back of it.
What proves us wrong: if by April fourth next year no cloud or lab has reported that AMD cost win, and AMD's inference share hasn't moved.
On a holiday about independence, the most useful kind this week is the plain ability to move. Keep a second model wired in. That's the rundown, and that's the week.
It is the Fourth of July, the wire is half its usual size, and the most on-theme story on it is a benchmark. A post going around this weekend runs GLM 5.2, the open Chinese model that has anchored this whole week, on AMD hardware, and measures the result in the one unit that finally decides these things: performance per dollar. That number keeps improving, which is another way of saying the cost per token keeps falling. The holiday is about independence. The benchmark is about a narrower kind of it: freedom from any single company's chips, and any single company's model.
Read the weekend number as the receipt for the week. Nine days ago an open model caught Claude on a security eval. Since then the price of capable AI fell from every side at once. Sonnet 5 cut it in software, Etched raised $5 billion to cut it in silicon, Kimi and GLM walked into the tools developers already use, and a movement formed around running the models yourself. The AMD result closes the loop: frontier-class inference now runs cheaply on hardware that is not Nvidia's, which is the piece the whole thesis was missing.
Independence is the honest word for what changed this week, and it is worth saying which kind. Not independence from AI; the week was the opposite of that, AI getting common enough to run anywhere. Independence from lock-in: the standing ability to move between models and between chips without asking anyone's permission. It carries a cost: self-hosting means trading a monthly invoice for the work of running the thing. What it buys is optionality, and the open frontier that supplies it now spans three continents: China's GLM and Kimi, Europe's Mistral shipping Leanstral 1.5 this weekend, and the US labs' own cheap tiers. No single vendor sits astride all of it.
For anyone building, take this as the week's real instruction, holiday or not. Assume any single model or chip you depend on can be throttled, repriced, restricted, or discontinued, because a version of each happened this month: Anthropic rationed capacity, Commerce froze and then freed Fable 5, an export line moved and moved back. Design for exit. Put a portability layer between your code and any one vendor, keep a second model configured, and write down today what it would take to move your inference onto hardware you own. This weekend's AMD benchmark is the proof that the last step on that list costs less than it did on Monday.
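For readers who want the portability layer and the second model in concrete form, here is a minimal sketch, not a recommended implementation. It assumes every backend can be reduced to a function that takes a prompt and returns text. The names `complete`, `hosted_model`, and `local_model` are hypothetical stand-ins, and a real version would wrap a vendor SDK or a self-hosted server behind the same interface.

```python
from typing import Callable, Sequence

# Hypothetical interface: a backend is any callable from prompt to text.
Backend = Callable[[str], str]


class AllBackendsFailed(RuntimeError):
    """Raised when every configured backend raised an error."""


def complete(prompt: str, backends: Sequence[tuple[str, Backend]]) -> tuple[str, str]:
    """Try each (name, backend) pair in order.

    Returns (name, text) from the first backend that succeeds.
    """
    errors: list[str] = []
    for name, backend in backends:
        try:
            return name, backend(prompt)
        except Exception as exc:  # broad on purpose in a sketch; narrow it in real code
            errors.append(f"{name}: {exc}")
    raise AllBackendsFailed("; ".join(errors))


# Stand-in backends for illustration only.
def hosted_model(prompt: str) -> str:
    # Simulates a vendor that is throttling requests.
    raise ConnectionError("rate limited")


def local_model(prompt: str) -> str:
    return f"local answer to: {prompt}"


name, text = complete("hello", [("hosted", hosted_model), ("local", local_model)])
print(name, text)
```

The point of the design is that application code calls `complete` and never imports a vendor directly, so swapping or reordering backends is a configuration change rather than a rewrite.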
GLM 5.2 on AMD: the cost per token keeps falling
The weekend's benchmark runs the open GLM 5.2 on AMD hardware and reports steadily improving performance per dollar, the metric that decides where inference actually lives. It is the quiet capstone to the week: capable models were already cheap and portable, and now the silicon under them does not have to be Nvidia's. Verify it against your own workload before you re-plan a cluster, but the direction is not subtle.
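Checking the number against your own workload comes down to simple arithmetic: divide what a setup costs per hour by the tokens it sustains per hour. The sketch below assumes you have measured throughput yourself and can estimate an all-in hourly cost. The function name and the input values are placeholders for illustration, not figures from the benchmark.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Dollars to generate one million tokens at a sustained throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Placeholder inputs, NOT benchmark results: substitute your own measured
# throughput and your own all-in hourly cost for each setup you compare.
setup_a = cost_per_million_tokens(hourly_cost_usd=2.00, tokens_per_second=1000)
setup_b = cost_per_million_tokens(hourly_cost_usd=3.00, tokens_per_second=1200)
print(f"A: ${setup_a:.3f} per million tokens, B: ${setup_b:.3f} per million tokens")
```

Run it with the same prompts, batch sizes, and context lengths on both setups, because throughput measured under different loads is not comparable.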
Mistral ships Leanstral 1.5, and the open frontier stays multipolar
Europe's Mistral released Leanstral 1.5, a compact open model aimed at formal proofs, under the banner of proof abundance for all. On its own it is a niche release. In the week's context it is the third continent heard from: alongside China's GLM and Kimi and the US labs' cheap tiers, the open supply of capable models now comes from everywhere, which is exactly what makes it hard for any one government or vendor to gate.
Yesterday: running the models yourself became a cause
A "Right to Local Intelligence" manifesto trended beside working guides to run frontier models locally, with privacy law tightening underneath it. Today's AMD number is the hardware half of that argument: local intelligence needs affordable silicon to run on, and the affordable silicon just got more capable.
The week that started it: an open model caught Claude
Nine days ago Semgrep's cyber eval put Zhipu's open GLM 5.2 level with Claude at a fraction of the cost, and we conceded a missed short on the tape. Every edition since has been one story told in installments: capability stopped being scarce, and the whole stack, models, tools, and now chips, reorganized around that fact.
On a half-size holiday wire, the most fitting story is a cost-per-token benchmark: GLM 5.2 running competitively on AMD, with performance per dollar still improving. It is the receipt for the whole week. Nine days ago an open model caught Claude; since then the price of capable AI dropped in software with Sonnet 5, in silicon with Etched, and in distribution as open models walked into Copilot, while a movement formed around running them yourself. Today's number adds the last piece: the hardware underneath does not have to be Nvidia's either. The honest word for all of it is independence, the narrow kind that means freedom from lock-in, not from AI. It has a cost, and most teams will keep calling an API because it is easier. But the option to move between models and between chips is real now in a way it was not a month ago. For builders, the instruction is plain: design for exit, keep a second model wired in, and know what moving your inference off any one vendor would take. The last step got cheaper this weekend.
Within nine months, at least one major cloud or AI lab publicly reports serving a top open model on AMD hardware at a lower cost per token than the equivalent Nvidia deployment, and AMD's share of AI inference visibly rises on the back of it.
This weekend's benchmark shows GLM 5.2 hitting competitive performance per dollar on AMD, and the whole month pushed inference toward cost-sensitive, portable, open models. When the model is open and the buyer optimizes cost per token, the software lock-in that protects Nvidia most on training counts for least. Inference is the larger market and the one now in motion, so a public cost win on AMD is the kind of proof point that moves procurement.
If, by April 4, 2027, no major cloud or lab has publicly reported serving a top open model on AMD at a lower cost per token than the equivalent Nvidia deployment, and AMD's inference share has not visibly moved, the call is wrong.
We put AMD on watch yesterday on the on-prem thesis; this weekend's GLM-on-AMD benchmark is the confirming data, so we move to long. Open frontier-class models running at competitive cost per token on AMD make it the independence trade: the one hardware name that gains as inference decouples from Nvidia's stack.
Nvidia's strongest lock-in is on training, where software maturity matters most. Inference on open models is the opposite case: the buyer optimizes cost per token and cares less about the ecosystem, which is the ground AMD can take, and it is the larger long-run market.
Training stays Nvidia's to lose, and it will not lose it soon. Inference is the exposed flank: a credible AMD cost-per-token result on open models, on top of the inference-ASIC wave, is the first serious pressure on the part of the business the valuation leans on hardest.
The bull case is that inference volume lifts all accelerator demand. The offset is that open-model inference is where buyers shop purely on price, and that is precisely where AMD and fixed-function silicon are aiming.
Leanstral 1.5 keeps Europe in the open-model game and the open frontier multipolar. The monetization worry is the same one that dogs every open-weights lab, but the strategic value is real: open supply from a third continent is that much harder for any one government to gate.
Mistral's releases matter less as revenue than as insurance for the whole open ecosystem: if the open supply of capable models comes from China, the US, and Europe at once, no single export regime or vendor can choke it.