GLM-5.2 takes the open-weights crown at 51, and pays for it in tokens
GLM-5.2 leads open weights at 51, but burns 43k tokens a task. Plus Epic open-sources Lore VCS and a code-graph MCP cuts agent reads 10x.
GLM-5.2 is the top open-weights model in the world this morning, and it bought that crown one expensive token at a time.
It's Thursday, June 18, 2026. Here's the rundown.
A new open-weights leader with an invoice problem, the DOJ deciding xAI's turbines are a national priority, and a code-graph MCP claiming ten-x fewer tokens. Then The Call.
Z.ai's GLM-5.2 scored fifty-one on the Artificial Analysis Intelligence Index, eleven points above GLM-5.1 at the same size. Seven hundred fifty-three billion total parameters, forty billion active, shipped under MIT.
It clears MiniMax-M3 and DeepSeek V4 Pro, both at forty-four, and sits second on the WebDev arena behind only Claude Fable 5.
And before anyone screenshots the leaderboard, look at how it got there. Same parameter class as 5.1. They didn't build a bigger model. They told it to think longer.
The numbers back you up. GLM-5.2 burns forty-three thousand output tokens per task, thirty-seven thousand of it reasoning. GLM-5.1 spent twenty-six thousand.
That's the whole eleven-point jump. Test-time compute is the cheap way to climb a benchmark and the expensive way to run a product.
Run the bill. Pricing holds at GLM-5.1 levels, four dollars forty per million output tokens. At forty-three thousand tokens a task, that's about nineteen cents per task in output alone.
A model scoring forty-four at twenty-four thousand tokens answers for a little over half that at the same per-token rate, and for less if its rate is lower. So the rank and the invoice point in opposite directions.
So when do you reach for it?
When hard reasoning is the constraint and volume is modest. Pick it for the answer you can't get cheaper. Don't pick it for a loop running millions of times a day.
At that scale you cap the reasoning budget or you stay on a leaner open model, because the marginal correct answer is not worth eighty percent more tokens.
Fifty-one is fourth overall. Claude Fable 5 is at sixty, Opus 4.8 at fifty-six, GPT-5.5 at x-high reasoning at fifty-five.
And that's the honest read. The open top is now one long reasoning trace from the closed frontier, not a generation behind. The gap is compute you pay for on every call.
So the thing to watch isn't the score.
It's token efficiency. A fifty-one that costs forty-three thousand tokens is a worse deal for most serving stacks than a forty-seven at twenty-five thousand.
Whoever ships the first open model that holds this intelligence while halving the reasoning spend wins production. The screenshot is free. The serving is not.
The Justice Department is now framing xAI's unpermitted gas turbines as a matter of national, economic, and energy security.
Which is the legal cover for keeping them running. The federal government would rather fight the air-quality case than slow the compute.
The subtext for every operator is the same. Training and inference now lean on behind-the-meter generation faster than permitting can move.
Power is the binding constraint, not silicon, and the rules are bending around that fact in real time. If you're planning capacity for next year, plan for generation you control, because the grid interconnect queue won't save you.
Separately, Washington declined to blacklist DeepSeek even while flagging more than a hundred firms as security risks.
So DeepSeek weights stay legally usable in the US for now. But the model is one policy memo from off-limits. Don't architect a stack around a single Chinese provider you can't swap out in a week.
There's a new MCP server called codebase-memory-mcp. It indexes a repo into a persistent SQLite graph of functions, calls, and routes, so an agent traverses the graph instead of reading files.
The landing page says ninety-nine percent fewer tokens, a hundred-twenty-x best case. The preprint is more honest. Ten-x, with eighty-three percent answer quality across thirty-one repos.
It full-indexes the Linux kernel, twenty-eight million lines, in three minutes and answers in under a millisecond.
That part's real, and it matters. Move structural lookups off file reads and your context window stops getting eaten by grep. The catch: type resolution covers about a dozen languages well, and the rest fall back to text.
Also making the rounds again, a gist arguing you should stop using JWTs for sessions.
Old argument, fresh casualties. Tokens you can't revoke, an algorithm field that's a footgun. If you reached for JWTs because a tutorial said so, default to server-side sessions instead.
And the HTTP QUERY method is now RFC ten-thousand-eight. A safe, idempotent way to send a body with query semantics.
It fixes the GET-with-body mess. Framework support trickles in over the next year. Useful the moment your search API outgrows the URL.
A WordPress VIP survey of two thousand people found sixty percent of US consumers are put off by brands touting AI, and eighty-six percent still want original sources.
Here's the line buried in the same report. One product lifted sales of its top tiers just by removing the word AI and keeping the feature.
So the advice writes itself.
Ship the capability, drop the label. And pair it with the other number today: only sixteen percent of Americans expect AI to help society. That's one trust signal, not two.
Charity Majors made the engineering case in the same beat. Agents amplify whatever rigor a team already has.
Which means sloppy testing and weak observability get worse once code generation speeds up. Invest in review, CI, and rollback before you scale agent throughput. Velocity without guardrails is just faster incidents.
Epic open-sourced Lore under MIT, a content-addressable version control system in Rust built for code plus multi-gigabyte binary assets.
This is the tech that ran inside Unreal. Merkle-chained revisions, content dedup, chunked uploads so editing a few kilobytes of a huge file re-uploads only that. It's aimed straight at Perforce's per-seat cost.
Caveat, it's pre-1.0, and the desktop client most people touch ships as a closed binary.
So open-source the engine, keep the client proprietary. Worth tracking for small studios, not worth migrating onto this week.
And Wolfram Language hit version fifteen with an AI assistant folded into the language itself.
The pairing is the interesting part. A symbolic engine that can check the model's arithmetic. That's exactly the gap most agent stacks paper over.
Adam, a YC W25 company, open-sourced CADAM, an AI system for parametric CAD generation.
Hard domain. Wrong geometry is unforgiving and the training data is scarce. If they make it useful, the moat is real, because nobody else has the data.
And Lago, the open-source metering layer under usage-based pricing. Consumption tracking, subscriptions, revenue analytics.
This is the plumbing every AI product needs once it bills per token instead of per seat. With reasoning spend swinging the way GLM-5.2 just showed, accurate metering is the difference between margin and a surprise.
GrapheneOS has been ported to Android 17, with official releases close. Meanwhile, Volkswagen's app now refuses to run on GrapheneOS.
That's integrity attestation locking out hardened devices. If you lean on Play Integrity, know you're quietly excluding privacy-conscious users. The blocklist is a product decision, not a security necessity.
And a cautionary one, a service holding user images behind a surprise five-dollar paywall.
Own your storage or assume the host monetizes your lock-in. Budget for egress and an exit before you wire anything to a free tier.
Quick break — two from the desk.
One we know well: vote dot direct. If you're on an H O A or a board, it runs your elections digitally — secure, verifiable, no paper, no clipboard in the lobby. Point your council to vote dot direct.
And if this is your ten minutes of A I for the day, get the written edition too. The full wire, free, every morning — leave your email at nextbig dot dev.
MicroUI is a tiny immediate-mode UI library in ANSI C, zero dependencies.
Eclipse Zenoh unifies pub-sub, geo-distributed storage, and queries in one stack.
Stop Killing Games failed to secure EU law despite one-point-three million signatures.
A French physicist and media star lost his doctorate after a plagiarism probe.
And Bubbles.town is a Hacker News built for independent blogs.
Our call: an open-weights model clears fifty-five on the Intelligence Index, matching GPT-5.5 at x-high reasoning, by September thirtieth, erasing the gap at the very top.
Proven wrong if no open model hits fifty-five on v4.1 or its successor by then. The gap from fifty-one to fifty-five is a longer reasoning trace, not a bigger model. Settles September thirtieth.
Z.ai's GLM-5.2 is now the top open-weights model on the Artificial Analysis Intelligence Index, scoring 51 on v4.1. That is 11 points above GLM-5.1 at the same size, 753B total parameters with 40B active, shipped under MIT. It clears MiniMax-M3 (44) and DeepSeek V4 Pro (44), and it sits second on the Code Arena WebDev leaderboard behind only Claude Fable 5. The gains are concentrated in reasoning: CritPt scientific reasoning jumped to 21%, Humanity's Last Exam to 40%, AA-LCR to 71%.
The move underneath is plain. This is a same-size MoE refresh, not a bigger model. The 11-point jump comes mostly from spending far more at inference time. GLM-5.2 burns 43k output tokens per Intelligence Index task, 37k of it reasoning, up from 26k on GLM-5.1 and well above MiniMax-M3 (24k) and Kimi K2.6 (35k). Same parameter class, more thinking, higher score. The headline buys you intelligence by buying you tokens.
Run the cost. First-party pricing holds at GLM-5.1 levels: $1.40/M input, $4.40/M output, $0.26/M cache hits. At 43k output tokens a task, that is roughly $0.19 per task in output alone, before you count input. A model scoring 44 at 24k output tokens answers for a little over half that at the same per-token rate, and for less if its output price is lower. If you are running an agent loop at scale, the leaderboard rank and the invoice point in opposite directions. Pick GLM-5.2 when answer quality on hard reasoning is the constraint and volume is modest. Stay on a leaner open model, or cap GLM-5.2's reasoning budget, when you are pushing millions of tasks a day and the marginal correct answer is not worth 80% more tokens.
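The per-task arithmetic is easy to reproduce. The sketch below is a minimal illustration, not anyone's billing code: `output_cost_usd` is a hypothetical helper, and because the leaner model's output price is not given here, the comparison assumes the same $4.40/M rate for both models.

```python
# Output-only cost per task, using the figures quoted in this issue.
# Assumption: both models are billed at GLM-5.2's $4.40 per million output
# tokens, because the leaner model's own price is not stated.

USD_PER_MILLION_OUTPUT = 4.40
GLM_52_TOKENS_PER_TASK = 43_000
LEANER_TOKENS_PER_TASK = 24_000


def output_cost_usd(output_tokens: int, usd_per_million: float) -> float:
    """Return the output-token cost of one task in US dollars."""
    return output_tokens * usd_per_million / 1_000_000


glm_cost = output_cost_usd(GLM_52_TOKENS_PER_TASK, USD_PER_MILLION_OUTPUT)
leaner_cost = output_cost_usd(LEANER_TOKENS_PER_TASK, USD_PER_MILLION_OUTPUT)

# Share of GLM-5.2's per-task cost that the leaner model pays at the same rate.
cost_ratio = leaner_cost / glm_cost

# Extra tokens GLM-5.2 spends relative to the leaner model.
extra_token_share = GLM_52_TOKENS_PER_TASK / LEANER_TOKENS_PER_TASK - 1

if __name__ == "__main__":
    print(f"GLM-5.2 output cost per task: ${glm_cost:.4f}")
    print(f"Leaner model output cost per task: ${leaner_cost:.4f}")
    print(f"Leaner / GLM-5.2 cost ratio: {cost_ratio:.2f}")
    print(f"Extra tokens spent by GLM-5.2: {extra_token_share:.0%}")
```

At the same rate the leaner model lands at about 56% of GLM-5.2's output cost, and GLM-5.2 spends about 79% more tokens. The ratio only drops below one half if the leaner model is also cheaper per token.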
The ranking context matters too. Fifty-one is fourth overall, behind Claude Fable 5 (60), Claude Opus 4.8 (56), and GPT-5.5 at xhigh reasoning (55). The open-weights top is now a long reasoning trace away from the closed frontier, and closing it costs compute you pay for on every call. That is the trade the whole field is making. Test-time compute is the cheap way to climb a benchmark and the expensive way to run a product.
For the next two quarters, watch token efficiency become the number that actually matters in open weights. A score of 51 that costs 43k tokens is a worse deal for most serving stacks than a 47 at 25k. Whoever ships the first open model that holds GLM-5.2's intelligence while halving its reasoning spend wins the production market, not the leaderboard screenshot.
DOJ calls xAI's unpermitted gas turbines an energy security matter
The Justice Department is framing xAI's unpermitted gas turbines as a question of national, economic, and energy security, which is the legal cover for keeping them running. The subtext for every operator: frontier training and inference now lean on behind-the-meter generation faster than permitting can move, and the federal government would rather fight the air-quality fight than slow the compute. Power, not silicon, is the binding constraint, and the rules are bending around it.
US holds off blacklisting DeepSeek as 100+ firms get flagged
Washington declined to add DeepSeek to its export blacklist even as it deemed more than 100 firms security risks. For builders, the read is that DeepSeek weights stay legally usable in the US for now, but the model sits one policy memo from being off limits. Do not architect a production stack around a single Chinese open-weights provider you cannot swap out in a week.
Swapping a homelab to a Broadcom SFP+ module for 10Gb Ethernet
A practical writeup on moving a 10Gb/s link to a Broadcom SFP+ transceiver, the kind of detail that decides whether your local model-serving box actually saturates the network. Worth a skim if you are wiring a small inference cluster and tired of mystery link drops on cheap optics.
codebase-memory-mcp turns a repo into a graph agents query instead of grep
This MCP server indexes a codebase into a persistent SQLite knowledge graph of functions, calls, routes, and cross-service links, so an agent answers structural questions by traversing the graph rather than reading files. The marketing says 99% fewer tokens, the landing page shows a 120x best case (412k tokens down to 3.4k across five questions), and the preprint reports a more honest 10x with 83% answer quality across 31 repos. It full-indexes the Linux kernel, 28M LOC, in three minutes and answers in under 1ms. Tree-sitter covers 158 languages, but the type-resolving Hybrid LSP covers only about a dozen; the rest fall back to text. Single static binary, solo maintainer, 3,757 stars.
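To make the graph idea concrete, here is a minimal sketch of a call graph in SQLite, queried with a recursive CTE. It is not codebase-memory-mcp's actual schema or API: the table names, column names, sample functions, and the `transitive_callers` helper are all hypothetical and exist only to show why a structural question can be answered without opening a single file.

```python
import sqlite3


def build_demo_graph() -> sqlite3.Connection:
    """Create a tiny in-memory call graph. Schema and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE functions (id INTEGER PRIMARY KEY, name TEXT UNIQUE, file TEXT);
        CREATE TABLE calls (caller INTEGER, callee INTEGER);
        """
    )
    functions = [
        (1, "handle_request", "api/routes.py"),
        (2, "load_user", "services/users.py"),
        (3, "query_db", "db/client.py"),
        (4, "render_profile", "api/views.py"),
    ]
    # Each edge means: caller -> callee.
    calls = [(1, 2), (2, 3), (4, 2)]
    conn.executemany("INSERT INTO functions VALUES (?, ?, ?)", functions)
    conn.executemany("INSERT INTO calls VALUES (?, ?)", calls)
    return conn


def transitive_callers(conn: sqlite3.Connection, function_name: str) -> list[str]:
    """Return every function that reaches function_name, directly or indirectly."""
    rows = conn.execute(
        """
        WITH RECURSIVE upstream(id) AS (
            SELECT id FROM functions WHERE name = ?
            UNION  -- UNION (not UNION ALL) drops repeats, so call cycles terminate
            SELECT calls.caller
            FROM calls JOIN upstream ON calls.callee = upstream.id
        )
        SELECT name FROM functions
        WHERE id IN (SELECT id FROM upstream) AND name != ?
        ORDER BY name
        """,
        (function_name, function_name),
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    graph = build_demo_graph()
    print(transitive_callers(graph, "query_db"))
```

The agent gets back a short list of names instead of the contents of four files, which is the token saving in miniature. A real index also needs the parsing step that fills these tables, which this sketch skips.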
Stop Using JWTs makes the rounds again
A gist arguing JWTs are the wrong default for sessions hit 344 points: stateless tokens you cannot revoke, footgun algorithm fields, and storage you cannot invalidate. If you are reaching for JWTs because a tutorial said so, read this first, then default to server-side sessions unless you have a real cross-service reason. The argument is old, the mistakes are not.
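For contrast, here is what the server-side default looks like at its smallest. This is a sketch under stated assumptions, not a production session layer: `SessionStore` is a hypothetical class, the store is an in-process dict where a real deployment would use a shared database or cache, and the one-hour TTL is arbitrary. The point is the `revoke` method, which a purely stateless token has no equivalent for.

```python
import secrets
import time
from typing import Optional


class SessionStore:
    """In-memory server-side sessions keyed by an opaque random ID (hypothetical)."""

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        self.ttl_seconds = ttl_seconds
        self._sessions: dict[str, tuple[str, float]] = {}  # id -> (user, expiry)

    def create(self, user_id: str, now: Optional[float] = None) -> str:
        """Start a session and return the opaque ID to set as a cookie."""
        now = time.time() if now is None else now
        session_id = secrets.token_urlsafe(32)  # carries no claims, only randomness
        self._sessions[session_id] = (user_id, now + self.ttl_seconds)
        return session_id

    def lookup(self, session_id: str, now: Optional[float] = None) -> Optional[str]:
        """Return the user for a live session, or None if unknown or expired."""
        now = time.time() if now is None else now
        record = self._sessions.get(session_id)
        if record is None:
            return None
        user_id, expires_at = record
        if now >= expires_at:
            del self._sessions[session_id]  # expired sessions are dropped on access
            return None
        return user_id

    def revoke(self, session_id: str) -> None:
        """Invalidate a session immediately, for logout or a compromised account."""
        self._sessions.pop(session_id, None)
```

Because the cookie is only a lookup key, there is no algorithm field to misconfigure and logout takes effect on the next request. The cost is one store lookup per request.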
RFC 10008 standardizes the HTTP QUERY method
The HTTP QUERY method is now an RFC: a safe, idempotent way to send a request body with query semantics, fixing the long-standing mess of GET-with-body and overloaded POST for search endpoints. Expect framework and client support to trickle in over the next year. Useful the moment your search API outgrows what fits in a URL.
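You do not have to wait for framework support to try it. The sketch below uses only the Python standard library: `http.client` sends whatever method string it is given, and `http.server` dispatches a request to a `do_<METHOD>` handler, so `do_QUERY` is enough. Everything else is an assumption for illustration: the `/search` path, the JSON body shape, the `CATALOG` data, and the choice of `application/json` as the content type.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

CATALOG = ["glm-5.2", "glm-5.1", "minimax-m3"]  # hypothetical search corpus


class SearchHandler(BaseHTTPRequestHandler):
    def do_QUERY(self) -> None:
        """Handle QUERY: read the search criteria from the body, change nothing."""
        length = int(self.headers.get("Content-Length", 0))
        query = json.loads(self.rfile.read(length))
        hits = [name for name in CATALOG if name.startswith(query["prefix"])]
        payload = json.dumps({"hits": hits}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args) -> None:
        pass  # keep the demo quiet


def run_query(prefix: str) -> dict:
    """Start a throwaway local server, send one QUERY request, return the JSON."""
    server = HTTPServer(("127.0.0.1", 0), SearchHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request(
            "QUERY",
            "/search",
            body=json.dumps({"prefix": prefix}),
            headers={"Content-Type": "application/json"},
        )
        result = json.loads(conn.getresponse().read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_query("glm"))
```

The handler only reads, which is what lets the method be treated as safe and idempotent. Proxies and frameworks that do not recognize QUERY yet may still reject it, so keep a POST fallback until your stack catches up.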
Continue ships its open-source coding agent
The Continue project is back in the feed as an open-source coding agent, the self-hosted alternative for teams that will not pipe their codebase through a vendor. Pair it with a graph-index MCP like the one above and you get an agent that reasons about structure without burning a context window on file reads.
HTTP requests with no curl, using Bash /dev/tcp
A reminder that Bash can open raw TCP sockets through /dev/tcp, letting you make HTTP requests in a container that ships no curl or wget. Handy for minimal images and locked-down CI where adding a binary is a fight. Not a tool, a trick worth filing away.
60% of US consumers say the word AI in messaging is a turnoff
A WordPress VIP survey of 2,000 people found 60% of US consumers are put off by brands touting AI, and 86% still want original sources. The builder twist buried in the same report: 60% of enterprises saw traffic from AI answer engines rise, and one product lifted sales of its top tiers by removing the word AI while keeping the feature. Ship the capability, drop the label.
Tim Ferriss says AI is gutting how-to nonfiction sales
Self-help units fell 26.3% year over year in Q1 2026, and Ferriss reports his own catalog down about 45% in the second half of 2025, blaming LLMs that substitute for lookup-table books. It is one author plus aggregate BookScan data, and he names confounders like Amazon stocking and post-TikTok reversion. Still a real signal for anyone whose product is packaged prescriptive knowledge: the chatbot is the new substitute good.
GPT-NL pitches a sovereign Dutch language model
TNO's GPT-NL is a state-backed model trained for the Netherlands, the latest in a run of sovereign-AI projects betting that data residency and language coverage beat raw capability for public-sector buyers. The economics are the question: a national model competing against frontier APIs needs a captive market to justify the compute. Watch whether it ships weights or stays a procurement story.
Only 16% of Americans expect AI to help society
A new study puts public optimism about AI at 16%, the kind of number that shapes regulation and enterprise risk appetite well before it shapes model releases. If you sell to consumers, this is the trust gap you are pricing against. Pair it with the AI-as-turnoff survey and the message is one signal, not two.
The case that AI demands more engineering discipline, not less
Charity Majors argues that agents amplify whatever rigor your team already has, so sloppy testing and weak observability get worse, not better, once code generation speeds up. The practical takeaway: invest in review, CI, and rollback before you scale agent throughput. Velocity without guardrails is just faster incidents.
Epic open-sources Lore, a Rust VCS aimed at Perforce
Epic released Lore under MIT, a content-addressable version control system in Rust built for code plus multi-GB binary assets, with Merkle-chained revisions, content-level dedup, and chunked uploads so editing a few KB of a huge file re-uploads only that. It is the same tech that ran inside Unreal as Unreal Revision Control, now opened to remove Perforce's per-seat cost for smaller studios. Caveat: it is pre-1.0, and the desktop client most users will touch ships as a closed binary with proprietary dependencies.
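The dedup claim is easier to judge with the mechanism in front of you. This is a minimal sketch of content-addressed chunk storage, not Lore's implementation: the `ChunkStore` class is hypothetical, and the fixed-size chunks and SHA-256 hashing are assumptions, since the chunking and hashing schemes are not described here.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; absurdly small so the example is easy to follow


class ChunkStore:
    """Stores chunks under the hash of their content (hypothetical sketch)."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def put(self, data: bytes) -> tuple[list[str], int]:
        """Store data; return its manifest and how many chunks were newly stored."""
        manifest = []
        newly_stored = 0
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # identical content is stored only once
                self.chunks[digest] = chunk
                newly_stored += 1
            manifest.append(digest)
        return manifest, newly_stored

    def get(self, manifest: list[str]) -> bytes:
        """Rebuild a file from its ordered list of chunk hashes."""
        return b"".join(self.chunks[digest] for digest in manifest)


if __name__ == "__main__":
    store = ChunkStore()
    original = b"aaaabbbbccccdddd"
    edited = b"aaaabbXbccccdddd"  # one byte changed inside the second chunk
    _, first_upload = store.put(original)
    manifest, second_upload = store.put(edited)
    print(first_upload, second_upload)
    print(store.get(manifest) == edited)
```

The first upload stores four chunks and the edited version stores one, because three of its chunks already exist under their hashes. Fixed-size chunks only handle in-place edits well; inserting bytes shifts every later boundary, which is why real systems often use content-defined chunking instead.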
Wolfram Language and Mathematica hit version 15 with built-in AI
Version 15 folds an AI assistant directly into the Wolfram Language alongside symbolic music and new core functions. The interesting bit for builders is the pairing: a symbolic engine that can check the LLM's arithmetic, which is exactly the gap most agent stacks paper over. Worth a look if you do anything computational where wrong answers are expensive.
OpenMontage turns a coding agent into a video studio
An open-source agentic video production system with 12 pipelines, 52 tools, and 500+ agent skills, wiring your coding assistant into rendering and editing work. Early and ambitious, but it is a clean example of the MCP-tool pattern spreading well past code into media production. Useful to study even if you never cut a video.
Adam (YC W25) launches open-source AI CAD
Adam open-sourced CADAM, an AI system for parametric CAD generation, going after a domain where wrong geometry is unforgiving and the training data is scarce. The bet is that generative CAD becomes useful enough to anchor a paid layer on top of open weights. Hard problem, real moat if it works.
Lago is the open-source metering layer under usage-based pricing
Lago handles consumption tracking, subscriptions, and revenue analytics, the plumbing every AI product needs once it bills per token or per call instead of per seat. As model costs swing with reasoning spend, accurate metering is the difference between margin and a surprise. The open-source option matters when your usage data is the business.
GrapheneOS ports to Android 17 with releases coming
GrapheneOS has been ported to Android 17 and official releases are near, keeping the hardened mobile OS current with the base platform. For anyone shipping apps to privacy-strict users, this is the build to test against. The fast port also signals a healthier upstream relationship than the project's recent friction suggested.
Volkswagen starts blocking GrapheneOS users from its app
VW's app now refuses to run on GrapheneOS, the latest case of integrity-attestation APIs locking out hardened or rooted devices. If you build mobile apps and lean on Play Integrity, know that you are quietly excluding a privacy-conscious slice of users. The blocklist is a product decision, not a security necessity.
Want your images back? That will be $5
A writeup on a service holding user images behind a surprise paywall, a clean cautionary tale about depending on third-party media hosts with no export path. The lesson is old and keeps biting: own your storage or assume the host will monetize your lock-in. Budget for egress and exit before you wire anything to a free tier.
Token economics is the real spec now. GLM-5.2 climbed 11 points by spending 43k output tokens a task, while a graph-index MCP cuts agent reads by 10x or more. If you run agents in production this week, instrument cost per correct answer, not benchmark rank, then cap reasoning budgets and move structural lookups to a code graph before you scale throughput.
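If you want a starting point for that instrumentation, here is a minimal sketch. The `TaskResult` record and `cost_per_correct_answer` helper are hypothetical, the sample run is made up for illustration, and only output tokens are counted, so treat the result as a floor on real spend.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """One agent task: what it spent and whether the answer was right."""
    output_tokens: int
    correct: bool


def cost_per_correct_answer(
    results: list[TaskResult], usd_per_million_output: float
) -> float:
    """Total output spend divided by the number of correct answers.

    Failed tasks still cost money, so their tokens stay in the numerator.
    Returns infinity when nothing was answered correctly.
    """
    total_tokens = sum(result.output_tokens for result in results)
    total_cost = total_tokens * usd_per_million_output / 1_000_000
    correct_count = sum(1 for result in results if result.correct)
    if correct_count == 0:
        return float("inf")
    return total_cost / correct_count


if __name__ == "__main__":
    # Hypothetical run: four tasks at 43k output tokens each, two of them correct.
    sample = [
        TaskResult(43_000, True),
        TaskResult(43_000, False),
        TaskResult(43_000, True),
        TaskResult(43_000, False),
    ]
    print(f"${cost_per_correct_answer(sample, 4.40):.4f} per correct answer")
```

Track this number per model and per reasoning budget. A cap that trims tokens faster than it trims accuracy shows up as a lower cost per correct answer, which is the comparison a leaderboard rank cannot make for you.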
An open-weights model clears 55 on the Artificial Analysis Intelligence Index, matching GPT-5.5 at xhigh reasoning, by September 30, 2026, erasing the closed-versus-open gap at the very top.
GLM-5.2 jumped 11 points at the same parameter size in one release by spending more test-time compute, and the open-weights field is shipping MoE refreshes on a fast cadence. Consensus still assumes open models trail the frontier by roughly six months, but the gap from 51 to 55 is a longer reasoning trace, not a bigger model.
No open-weights model scores 55 or higher on the Artificial Analysis Intelligence Index v4.1 (or its successor) by September 30, 2026.