GLM-5.2 takes the open-weights crown at 51, and pays for it in tokens

The Rundown No. 118 · Audio Edition · 10 min

Oday

GLM-5.2 is the top open-weights model in the world this morning, and it bought that crown one expensive token at a time.

Shannon

It's Thursday, June 18, 2026. Here's the rundown.

Shannon

A new open-weights leader with an invoice problem, the DOJ deciding xAI's turbines are a national priority, and a code-graph MCP claiming ten-x fewer tokens. Then The Call.

Oday

Z.ai's GLM-5.2 scored fifty-one on the Artificial Analysis Intelligence Index, eleven points above GLM-5.1 at the same size. Seven hundred fifty-three billion total parameters, forty billion active, shipped under MIT.

Oday

It clears MiniMax-M3 and DeepSeek V4 Pro, both at forty-four, and sits second on the WebDev arena behind only Claude Fable 5.

Shannon

And before anyone screenshots the leaderboard, look at how it got there. Same parameter class as 5.1. They didn't build a bigger model. They told it to think longer.

Oday

The numbers back you up. GLM-5.2 burns forty-three thousand output tokens per task, thirty-seven thousand of it reasoning. GLM-5.1 spent twenty-six thousand.

Shannon

That's the whole eleven-point jump. Test-time compute is the cheap way to climb a benchmark and the expensive way to run a product.

Oday

Run the bill. Pricing holds at GLM-5.1 levels, four dollars forty per million output tokens. At forty-three thousand tokens a task, that's about nineteen cents per task in output alone.

Shannon

A model scoring forty-four at twenty-four thousand tokens answers for under half that. So the rank and the invoice point in opposite directions.

Oday

So when do you reach for it?

Shannon

When hard reasoning is the constraint and volume is modest. Pick it for the answer you can't get cheaper. Don't pick it for a loop running millions of times a day.

Shannon

At that scale you cap the reasoning budget or you stay on a leaner open model, because the marginal correct answer is not worth eighty percent more tokens.

Oday

Fifty-one is fourth overall. Claude Fable 5 is at sixty, Opus 4.8 at fifty-six, GPT-5.5 with xhigh reasoning at fifty-five.

Shannon

And that's the honest read. The open top is now one long reasoning trace from the closed frontier, not a generation behind. The gap is compute you pay for on every call.

Oday

So the thing to watch isn't the score.

Shannon

It's token efficiency. A fifty-one that costs forty-three thousand tokens is a worse deal for most serving stacks than a forty-seven at twenty-five thousand.

Shannon

Whoever ships the first open model that holds this intelligence while halving the reasoning spend wins production. The screenshot is free. The serving is not.

Oday

The Justice Department is now framing xAI's unpermitted gas turbines as a matter of national, economic, and energy security.

Shannon

Which is the legal cover for keeping them running. The federal government would rather fight the air-quality case than slow the compute.

Oday

The subtext for every operator is the same. Training and inference now lean on behind-the-meter generation faster than permitting can move.

Shannon

Power is the binding constraint, not silicon, and the rules are bending around that fact in real time. If you're planning capacity for next year, plan for generation you control, because the grid interconnect queue won't save you.

Oday

Separately, Washington declined to blacklist DeepSeek even while flagging more than a hundred firms as security risks.

Shannon

So DeepSeek weights stay legally usable in the US for now. But the model is one policy memo from off-limits. Don't architect a stack around a single Chinese provider you can't swap out in a week.

Oday

There's a new MCP server called codebase-memory-mcp. It indexes a repo into a persistent SQLite graph of functions, calls, and routes, so an agent traverses the graph instead of reading files.

Shannon

The landing page says ninety-nine percent fewer tokens, a hundred-twenty-x best case. The preprint is more honest. Ten-x, with eighty-three percent answer quality across thirty-one repos.

Oday

It full-indexes the Linux kernel, twenty-eight million lines, in three minutes and answers in under a millisecond.

Shannon

That part's real, and it matters. Move structural lookups off file reads and your context window stops getting eaten by grep. The catch is that type resolution covers about a dozen languages well; the rest fall back to text.

Oday

Also making the rounds again, a gist arguing you should stop using JWTs for sessions.

Shannon

Old argument, fresh casualties. Tokens you can't revoke, an algorithm field that's a footgun. If you reached for JWTs because a tutorial said so, default to server-side sessions instead.

Oday

And the HTTP QUERY method is now RFC ten-thousand-eight. A safe, idempotent way to send a body with query semantics.

Shannon

It fixes the GET-with-body mess. Framework support trickles in over the next year. Useful the moment your search API outgrows the URL.

Oday

A WordPress VIP survey of two thousand people found sixty percent of US consumers are put off by brands touting AI, and eighty-six percent still want original sources.

Shannon

Here's the line buried in the same report. One product lifted sales of its top tiers just by removing the word AI and keeping the feature.

Oday

So the advice writes itself.

Shannon

Ship the capability, drop the label. And pair it with the other number today: only sixteen percent of Americans expect AI to help society. That's one trust signal, not two.

Oday

Charity Majors made the engineering case in the same beat. Agents amplify whatever rigor a team already has.

Shannon

Which means sloppy testing and weak observability get worse once code generation speeds up. Invest in review, CI, and rollback before you scale agent throughput. Velocity without guardrails is just faster incidents.

Oday

Epic open-sourced Lore under MIT, a content-addressable version control system in Rust built for code plus multi-gigabyte binary assets.

Shannon

This is the tech that ran inside Unreal. Merkle-chained revisions, content dedup, chunked uploads so editing a few kilobytes of a huge file re-uploads only that. It's aimed straight at Perforce's per-seat cost.

Oday

Caveat, it's pre-1.0, and the desktop client most people touch ships as a closed binary.

Shannon

So open-source the engine, keep the client proprietary. Worth tracking for small studios, not worth migrating onto this week.

Oday

And Wolfram Language hit version fifteen with an AI assistant folded into the language itself.

Shannon

The pairing is the interesting part. A symbolic engine that can check the model's arithmetic. That's exactly the gap most agent stacks paper over.

Oday

Adam, a YC W25 company, open-sourced CADAM, an AI system for parametric CAD generation.

Shannon

Hard domain. Wrong geometry is unforgiving and the training data is scarce. If they make it useful, the moat is real, because nobody else has the data.

Oday

And Lago, the open-source metering layer under usage-based pricing. Consumption tracking, subscriptions, revenue analytics.

Shannon

This is the plumbing every AI product needs once it bills per token instead of per seat. With reasoning spend swinging the way GLM-5.2 just showed, accurate metering is the difference between margin and a surprise.

Oday

GrapheneOS has been ported to Android 17, with official releases close. Meanwhile, Volkswagen's app now refuses to run on GrapheneOS.

Shannon

That's integrity attestation locking out hardened devices. If you lean on Play Integrity, know you're quietly excluding privacy-conscious users. The blocklist is a product decision, not a security necessity.

Oday

And a cautionary one, a service holding user images behind a surprise five-dollar paywall.

Shannon

Own your storage or assume the host monetizes your lock-in. Budget for egress and an exit before you wire anything to a free tier.

Oday

Quick break — two from the desk.

Shannon

One we know well: vote.direct. If you're on an HOA or a board, it runs your elections digitally: secure, verifiable, no paper, no clipboard in the lobby. Point your council to vote.direct.

Oday

And if this is your ten minutes of AI for the day, get the written edition too. The full wire, free, every morning. Leave your email at nextbig.dev.

Oday

MicroUI is a tiny immediate-mode UI library in ANSI C, zero dependencies.

Shannon

Eclipse Zenoh unifies pub-sub, geo-distributed storage, and queries in one stack.

Oday

Stop Killing Games failed to secure EU law despite one-point-three million signatures.

Shannon

A French physicist and media star lost his doctorate after a plagiarism probe.

Oday

And Bubbles.town is a Hacker News built for independent blogs.

Oday

Our call: an open-weights model reaches fifty-five on the Intelligence Index, matching GPT-5.5 at xhigh reasoning, by September thirtieth, erasing the gap at the very top.

Shannon

Proven wrong if no open model hits fifty-five on v4.1 or its successor by then. The gap from fifty-one to fifty-five is a longer reasoning trace, not a bigger model. Settles September thirtieth.

The Big Story

GLM-5.2 takes the open-weights crown at 51, and pays for it in tokens

Z.ai's GLM-5.2 is now the top open-weights model on the Artificial Analysis Intelligence Index, scoring 51 on v4.1. That is 11 points above GLM-5.1 at the same size, 753B total parameters with 40B active, shipped under MIT. It clears MiniMax-M3 (44) and DeepSeek V4 Pro (44), and it sits second on the Code Arena WebDev leaderboard behind only Claude Fable 5. The gains are concentrated in reasoning: CritPt scientific reasoning jumped to 21%, Humanity's Last Exam to 40%, AA-LCR to 71%.

The move underneath is plain. This is a same-size MoE refresh, not a bigger model. The 11-point jump comes mostly from spending far more at inference time. GLM-5.2 burns 43k output tokens per Intelligence Index task, 37k of it reasoning, up from 26k on GLM-5.1 and well above MiniMax-M3 (24k) and Kimi K2.6 (35k). Same weights class, more thinking, higher score. The headline buys you intelligence by buying you tokens.

Run the cost. First-party pricing holds at GLM-5.1 levels: $1.40/M input, $4.40/M output, $0.26/M cache hits. At 43k output tokens a task, that is roughly $0.19 per task in output alone, before you count input. A model scoring 44 at 24k output tokens uses about 44% fewer tokens, so it answers for roughly half that at the same rate, and for less if its per-token price is lower. If you are running an agent loop at scale, the leaderboard rank and the invoice point in opposite directions. Pick GLM-5.2 when answer quality on hard reasoning is the constraint and volume is modest. Stay on a leaner open model, or cap GLM-5.2's reasoning budget, when you are pushing millions of tasks a day and the marginal correct answer is not worth 80% more tokens.
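The per-task arithmetic is easy to check. A minimal sketch, assuming the leaner model is billed at GLM-5.2's output rate (the source does not give its price), so the comparison isolates token count. The function name is illustrative, not from any SDK.

```python
def output_cost_per_task(output_tokens: int, usd_per_million_output: float) -> float:
    """Dollar cost of one task's output tokens at a given per-million rate."""
    return output_tokens / 1_000_000 * usd_per_million_output


GLM_52_OUTPUT_PRICE = 4.40  # USD per million output tokens, as quoted in the text

glm_52 = output_cost_per_task(43_000, GLM_52_OUTPUT_PRICE)
# Assumption: the 24k-token model is priced at the same output rate.
leaner = output_cost_per_task(24_000, GLM_52_OUTPUT_PRICE)
extra_tokens = 43_000 / 24_000 - 1  # fractional increase in output tokens

print(f"GLM-5.2 output cost per task: ${glm_52:.3f}")
print(f"24k-token model at the same rate: ${leaner:.3f}")
print(f"extra output tokens: {extra_tokens:.0%}")
```

At an equal rate the leaner model lands near 56% of GLM-5.2's output bill, so the gap widens only as far as its per-token price is also lower. Input and cache-hit charges are left out here, as they are in the $0.19 figure.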

The ranking context matters too. Fifty-one is fourth overall, behind Claude Fable 5 (60), Claude Opus 4.8 (56), and GPT-5.5 at xhigh reasoning (55). The open-weights top is now a long reasoning trace away from the closed frontier, and closing it costs compute you pay for on every call. That is the trade the whole field is making. Test-time compute is the cheap way to climb a benchmark and the expensive way to run a product.

For the next two quarters, watch token efficiency become the number that actually matters in open weights. A score of 51 that costs 43k tokens is a worse deal for most serving stacks than a 47 at 25k. Whoever ships the first open model that holds GLM-5.2's intelligence while halving its reasoning spend wins the production market, not the leaderboard screenshot.

@newsycombinator Read source 1,224 engagement

Compute & Infrastructure

DOJ calls xAI's unpermitted gas turbines an energy security matter

The Justice Department is framing xAI's unpermitted gas turbines as a question of national, economic, and energy security, which is the legal cover for keeping them running. The subtext for every operator: frontier training and inference now lean on behind-the-meter generation faster than permitting can move, and the federal government would rather fight the air-quality fight than slow the compute. Power, not silicon, is the binding constraint, and the rules are bending around it.

@newsycombinator Read source 149 engagement

US holds off blacklisting DeepSeek as 100+ firms get flagged

Washington declined to add DeepSeek to its export blacklist even as it deemed more than 100 firms security risks. For builders, the read is that DeepSeek weights stay legally usable in the US for now, but the model sits one policy memo from being off limits. Do not architect a production stack around a single Chinese open-weights provider you cannot swap out in a week.

@newsycombinator Read source 159 engagement

Swapping a homelab to a Broadcom SFP+ module for 10Gb Ethernet

A practical writeup on moving a 10Gb/s link to a Broadcom SFP+ transceiver, the kind of detail that decides whether your local model-serving box actually saturates the network. Worth a skim if you are wiring a small inference cluster and tired of mystery link drops on cheap optics.

@newsycombinator Read source 364 engagement

Developer Tools

codebase-memory-mcp turns a repo into a graph agents query instead of grep

This MCP server indexes a codebase into a persistent SQLite knowledge graph of functions, calls, routes, and cross-service links, so an agent answers structural questions by traversing the graph rather than reading files. The marketing says 99% fewer tokens, the landing page shows a 120x best case (412k tokens down to 3.4k across five questions), and the preprint reports a more honest 10x with 83% answer quality across 31 repos. It full-indexes the Linux kernel, 28M LOC, in three minutes and answers in under 1ms. Tree-sitter covers 158 languages, but type-resolving Hybrid LSP covers only about a dozen; the rest fall back to text. Single static binary, solo maintainer, 3,757 stars.
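To make the graph-instead-of-grep idea concrete, here is a minimal sketch of a call graph in SQLite with a recursive query for transitive callers. The schema, table names, and sample functions are all hypothetical and are not the project's actual layout. It uses an in-memory database for brevity where the real tool persists to disk.

```python
import sqlite3


def build_graph(conn: sqlite3.Connection) -> None:
    """Create a toy call graph: functions are nodes, calls are edges."""
    conn.executescript("""
        CREATE TABLE functions (id INTEGER PRIMARY KEY, name TEXT UNIQUE, file TEXT);
        CREATE TABLE calls (caller INTEGER REFERENCES functions(id),
                            callee INTEGER REFERENCES functions(id));
    """)
    funcs = [("handle_request", "server.py"), ("parse_body", "server.py"),
             ("validate", "schema.py"), ("log_error", "util.py")]
    conn.executemany("INSERT INTO functions (name, file) VALUES (?, ?)", funcs)
    edges = [("handle_request", "parse_body"), ("parse_body", "validate"),
             ("validate", "log_error")]
    # Resolve names to ids while inserting each edge.
    conn.executemany("""
        INSERT INTO calls (caller, callee)
        SELECT a.id, b.id FROM functions a, functions b
        WHERE a.name = ? AND b.name = ?
    """, edges)


def transitive_callers(conn: sqlite3.Connection, name: str) -> list[str]:
    """Walk call edges upward from `name` without reading any source file."""
    rows = conn.execute("""
        WITH RECURSIVE up(id) AS (
            SELECT id FROM functions WHERE name = ?
            UNION
            SELECT c.caller FROM calls c JOIN up ON c.callee = up.id
        )
        SELECT f.name FROM functions f JOIN up ON f.id = up.id
        WHERE f.name != ?
        ORDER BY f.name
    """, (name, name)).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
build_graph(conn)
callers = transitive_callers(conn, "log_error")
print(callers)
```

The answer to "what reaches log_error?" comes back as a few short rows rather than the contents of three files. That is where the token savings in this design come from.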

@github Read source 3,590 engagement

Stop Using JWTs makes the rounds again

A gist arguing JWTs are the wrong default for sessions hit 344 points: stateless tokens you cannot revoke, footgun algorithm fields, and storage you cannot invalidate. If you are reaching for JWTs because a tutorial said so, read this first, then default to server-side sessions unless you have a real cross-service reason. The argument is old, the mistakes are not.

@newsycombinator Read source 740 engagement

RFC 10008 standardizes the HTTP QUERY method

The HTTP QUERY method is now an RFC: a safe, idempotent way to send a request body with query semantics, fixing the long-standing mess of GET-with-body and overloaded POST for search endpoints. Expect framework and client support to trickle in over the next year. Useful the moment your search API outgrows what fits in a URL.
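Until frameworks add first-class support, the method can be exercised with Python's standard library alone. `http.client` accepts arbitrary method tokens, and `http.server` dispatches a request to `do_<METHOD>`. The sketch below is a local round trip, not a reference implementation of the RFC. The `/search` path, the JSON payload shape, and the handler name are hypothetical.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class SearchHandler(BaseHTTPRequestHandler):
    def do_QUERY(self):
        # Read the query from the request body instead of the URL.
        length = int(self.headers.get("Content-Length", 0))
        query = json.loads(self.rfile.read(length))
        body = json.dumps({"echo": query}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def send_query(host: str, port: int, path: str, query: dict):
    """Send a QUERY request with a JSON body and return (status, parsed JSON)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    conn.request("QUERY", path, body=json.dumps(query).encode(),
                 headers={"Content-Type": "application/json"})
    resp = conn.getresponse()
    data = json.loads(resp.read())
    conn.close()
    return resp.status, data


server = HTTPServer(("127.0.0.1", 0), SearchHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
status, data = send_query("127.0.0.1", server.server_port, "/search",
                          {"term": "mcp", "limit": 5})
server.shutdown()
server.server_close()
print(status, data)
```

Proxies, caches, and routers in between may still reject or mishandle an unfamiliar method, which is worth testing before a search endpoint depends on it.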

@newsycombinator Read source 431 engagement

Continue ships its open-source coding agent

The Continue project is back in the feed as an open-source coding agent, the self-hosted alternative for teams that will not pipe their codebase through a vendor. Pair it with a graph-index MCP like the one above and you get an agent that reasons about structure without burning a context window on file reads.

@github Read source 190 engagement

HTTP requests with no curl, using Bash /dev/tcp

A reminder that Bash can open raw TCP sockets through /dev/tcp, letting you make HTTP requests in a container that ships no curl or wget. Handy for minimal images and locked-down CI where adding a binary is a fight. Not a tool, a trick worth filing away.

@newsycombinator Read source 709 engagement

AI & Models

60% of US consumers say the word AI in messaging is a turnoff

A WordPress VIP survey of 2,000 people found 60% of US consumers are put off by brands touting AI, and 86% still want original sources. The builder twist buried in the same report: 60% of enterprises saw traffic from AI answer engines rise, and one product lifted sales of its top tiers by removing the word AI while keeping the feature. Ship the capability, drop the label.

@newsycombinator Read source 1,647 engagement

Tim Ferriss says AI is gutting how-to nonfiction sales

Self-help units fell 26.3% year over year in Q1 2026, and Ferriss reports his own catalog down about 45% in the second half of 2025, blaming LLMs that substitute for lookup-table books. It is one author plus aggregate BookScan data, and he names confounders like Amazon stocking and post-TikTok reversion. Still a real signal for anyone whose product is packaged prescriptive knowledge: the chatbot is the new substitute good.

@newsycombinator Read source 790 engagement

GPT-NL pitches a sovereign Dutch language model

TNO's GPT-NL is a state-backed model trained for the Netherlands, the latest in a run of sovereign-AI projects betting that data residency and language coverage beat raw capability for public-sector buyers. The economics are the question: a national model competing against frontier APIs needs a captive market to justify the compute. Watch whether it ships weights or stays a procurement story.

@newsycombinator Read source 528 engagement

Only 16% of Americans expect AI to help society

A new study puts public optimism about AI at 16%, the kind of number that shapes regulation and enterprise risk appetite well before it shapes model releases. If you sell to consumers, this is the trust gap you are pricing against. Pair it with the AI-as-turnoff survey and the message is one signal, not two.

@newsycombinator Read source 193 engagement

The case that AI demands more engineering discipline, not less

Charity Majors argues that agents amplify whatever rigor your team already has, so sloppy testing and weak observability get worse, not better, once code generation speeds up. The practical takeaway: invest in review, CI, and rollback before you scale agent throughput. Velocity without guardrails is just faster incidents.

@newsycombinator Read source 361 engagement

Launches & Releases

Epic open-sources Lore, a Rust VCS aimed at Perforce

Epic released Lore under MIT, a content-addressable version control system in Rust built for code plus multi-GB binary assets, with Merkle-chained revisions, content-level dedup, and chunked uploads so editing a few KB of a huge file re-uploads only that. It is the same tech that ran inside Unreal as Unreal Revision Control, now opened to remove Perforce's per-seat cost for smaller studios. Caveat: it is pre-1.0, and the desktop client most users will touch ships as a closed binary with proprietary dependencies.
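The re-upload-only-what-changed property falls out of content addressing. A minimal sketch, assuming fixed-size chunks hashed with SHA-256: this illustrates the general technique and is not Lore's actual chunking scheme, which the source does not describe. The 4-byte chunk size is deliberately tiny so the example stays readable.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical and unrealistically small, for demonstration only


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and address each one by its hash."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def chunks_to_upload(data: bytes, stored: set[str],
                     chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Indices of chunks whose hash the store does not already hold."""
    return [i for i, h in enumerate(chunk_hashes(data, chunk_size))
            if h not in stored]


old = b"AAAABBBBCCCCDDDD"
new = b"AAAABBBBCxCCDDDD"  # one byte edited in the third chunk

store = set(chunk_hashes(old))  # the server already has every chunk of the old file
print(chunks_to_upload(new, store))  # [2]
```

Fixed-size chunks only dedupe in-place edits. An insertion shifts every later boundary, which is why systems of this kind often use content-defined chunking instead.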

@newsycombinator Read source 1,058 engagement

Wolfram Language and Mathematica hit version 15 with built-in AI

Version 15 folds an AI assistant directly into the Wolfram Language alongside symbolic music and new core functions. The interesting bit for builders is the pairing: a symbolic engine that can check the LLM's arithmetic, which is exactly the gap most agent stacks paper over. Worth a look if you do anything computational where wrong answers are expensive.

@newsycombinator Read source 274 engagement

OpenMontage turns a coding agent into a video studio

An open-source agentic video production system with 12 pipelines, 52 tools, and 500+ agent skills, wiring your coding assistant into rendering and editing work. Early and ambitious, but it is a clean example of the MCP-tool pattern spreading well past code into media production. Useful to study even if you never cut a video.

@github Read source 355 engagement

Startups & Capital

Adam (YC W25) launches open-source AI CAD

Adam open-sourced CADAM, an AI system for parametric CAD generation, going after a domain where wrong geometry is unforgiving and the training data is scarce. The bet is that generative CAD becomes useful enough to anchor a paid layer on top of open weights. Hard problem, real moat if it works.

@newsycombinator Read source 87 engagement

Lago is the open-source metering layer under usage-based pricing

Lago handles consumption tracking, subscriptions, and revenue analytics, the plumbing every AI product needs once it bills per token or per call instead of per seat. As model costs swing with reasoning spend, accurate metering is the difference between margin and a surprise. The open-source option matters when your usage data is the business.

@github Read source 260 engagement

Security

GrapheneOS ports to Android 17 with releases coming

GrapheneOS has been ported to Android 17 and official releases are near, keeping the hardened mobile OS current with the base platform. For anyone shipping apps to privacy-strict users, this is the build to test against. The fast port also signals a healthier upstream relationship than the project's recent friction suggested.

@newsycombinator Read source 1,204 engagement

Volkswagen starts blocking GrapheneOS users from its app

VW's app now refuses to run on GrapheneOS, the latest case of integrity-attestation APIs locking out hardened or rooted devices. If you build mobile apps and lean on Play Integrity, know that you are quietly excluding a privacy-conscious slice of users. The blocklist is a product decision, not a security necessity.

@newsycombinator Read source 536 engagement

Want your images back? That will be $5

A writeup on a service holding user images behind a surprise paywall, a clean cautionary tale about depending on third-party media hosts with no export path. The lesson is old and keeps biting: own your storage or assume the host will monetize your lock-in. Budget for egress and exit before you wire anything to a free tier.

@newsycombinator Read source 918 engagement

Quick Hits

MicroUI is a tiny immediate-mode UI library in ANSI C, no dependencies

@newsycombinator

Eclipse Zenoh unifies pub/sub, geo-distributed storage, and queries in one stack

@github

Stop Killing Games fails to secure EU law despite 1.3M signatures

@newsycombinator

French physicist and media star loses doctorate after plagiarism probe

@newsycombinator

Bubbles.town is a Hacker News built for independent blogs

@newsycombinator

A tour of the PDP-11, the minicomputer that shaped modern computing

@newsycombinator

Ribbie.tv ships an 8-bit live gamecast for baseball

@newsycombinator

The Takeaway

Token economics is the real spec now. GLM-5.2 climbed 11 points by spending 43k output tokens a task, while a graph-index MCP cuts agent reads by 10x or more. If you run agents in production this week, instrument cost per correct answer, not benchmark rank, then cap reasoning budgets and move structural lookups to a code graph before you scale throughput.
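Cost per correct answer is a simple ratio once a run is instrumented. A minimal sketch follows. The `RunStats` fields and both sample runs are hypothetical illustrations, not measurements from the source, and only output tokens are priced.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    tasks: int                     # tasks attempted
    correct: int                   # tasks judged correct by your own evaluation
    output_tokens: int             # total output tokens across all tasks
    usd_per_million_output: float  # output price


def cost_per_correct_answer(stats: RunStats) -> float:
    """Total output spend divided by the number of correct answers."""
    if stats.correct == 0:
        return float("inf")  # nothing correct, so no finite price per answer
    total_usd = stats.output_tokens / 1_000_000 * stats.usd_per_million_output
    return total_usd / stats.correct


# Hypothetical runs of 1,000 tasks each, both priced at $4.40 per million output tokens.
heavy = RunStats(tasks=1000, correct=800, output_tokens=43_000_000,
                 usd_per_million_output=4.40)
lean = RunStats(tasks=1000, correct=700, output_tokens=24_000_000,
                usd_per_million_output=4.40)

print(f"heavy reasoner: ${cost_per_correct_answer(heavy):.4f} per correct answer")
print(f"lean model:     ${cost_per_correct_answer(lean):.4f} per correct answer")
```

With these made-up numbers the heavier model is more accurate and still costs more per correct answer. That is exactly the case where capping the reasoning budget or routing only the hard tasks to it pays off.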

The Call C-20260618

An open-weights model reaches 55 on the Artificial Analysis Intelligence Index, matching GPT-5.5 at xhigh reasoning, by September 30, 2026, erasing the closed-versus-open gap at the very top.

The case

GLM-5.2 jumped 11 points at the same parameter size in one release by spending more test-time compute, and the open-weights field is shipping MoE refreshes on a fast cadence. Consensus still assumes open models trail the frontier by roughly six months, but the gap from 51 to 55 is a longer reasoning trace, not a bigger model.

What proves us wrong

No open-weights model scores 55 or higher on the Artificial Analysis Intelligence Index v4.1 (or its successor) by September 30, 2026.

Settles by September 30, 2026
