nextbig.dev
Vancouver, B.C. · Intelligence on AI and the machines that run it
The Briefing · Thursday, June 18, 2026

GLM-5.2 takes the open-weights crown at 51, and pays for it in tokens

GLM-5.2 leads open weights at 51, but burns 43k tokens a task. Plus Epic open-sources Lore VCS and a code-graph MCP cuts agent reads 10x.

The Rundown No. 118 · Audio Edition · 10 min
The Big Story

Z.ai's GLM-5.2 is now the top open-weights model on the Artificial Analysis Intelligence Index, scoring 51 on v4.1. That is 11 points above GLM-5.1 at the same size, 753B total parameters with 40B active, shipped under MIT. It clears MiniMax-M3 (44) and DeepSeek V4 Pro (44), and it sits second on the Code Arena WebDev leaderboard behind only Claude Fable 5. The gains are concentrated in reasoning: CritPt scientific reasoning jumped to 21%, Humanity's Last Exam to 40%, AA-LCR to 71%.

The move underneath is plain. This is a same-size MoE refresh, not a bigger model. The 11-point jump comes mostly from spending far more at inference time. GLM-5.2 burns 43k output tokens per Intelligence Index task, 37k of it reasoning, up from 26k on GLM-5.1 and well above MiniMax-M3 (24k) and Kimi K2.6 (35k). Same weights class, more thinking, higher score. The headline buys you intelligence by buying you tokens.

Run the cost. First-party pricing holds at GLM-5.1 levels: $1.4/M input, $4.4/M output, $0.26/M cache hits. At 43k output tokens a task, that is roughly $0.19 per task in output alone before you count input. At the same output price, a model scoring 44 at 24k output tokens answers for about $0.11, a little over half that. If you are running an agent loop at scale, the leaderboard rank and the invoice point in opposite directions. Pick GLM-5.2 when answer quality on hard reasoning is the constraint and volume is modest. Stay on a leaner open model, or cap its reasoning budget, when you are pushing millions of tasks a day and the marginal correct answer is not worth 80% more tokens.
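The arithmetic is worth keeping as a script you can rerun when prices move. This is a minimal sketch using the briefing's $4.4/M output price and token counts. Applying that same price to the 24k-token model is an assumption, since its actual pricing is not quoted here, and the one-million-tasks-a-day volume is illustrative.

```python
# Output price from the briefing: $4.4 per million output tokens (GLM-5.2 first-party).
OUTPUT_PRICE_PER_M = 4.4


def output_cost_per_task(output_tokens: int, price_per_m: float = OUTPUT_PRICE_PER_M) -> float:
    """Dollar cost of the output tokens for a single task (input tokens excluded)."""
    return output_tokens * price_per_m / 1_000_000


glm_52 = output_cost_per_task(43_000)
# Assumption: the leaner model is billed at the same output price.
lean_model = output_cost_per_task(24_000)

# Relative token overhead of the 43k-token model versus the 24k-token one.
extra_tokens = 43_000 / 24_000 - 1

# Illustrative volume, not a figure from the briefing.
TASKS_PER_DAY = 1_000_000

print(f"GLM-5.2 output cost per task: ${glm_52:.3f}")
print(f"24k-token model at the same price: ${lean_model:.3f}")
print(f"Extra output tokens: {extra_tokens:.0%}")
print(f"Daily output bill: ${glm_52 * TASKS_PER_DAY:,.0f} vs ${lean_model * TASKS_PER_DAY:,.0f}")
```

Swap in real per-model prices before drawing conclusions. A cheaper price per token can erase or widen the gap that token counts alone suggest.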

The ranking context matters too. Fifty-one is fourth overall, behind Claude Fable 5 (60), Claude Opus 4.8 (56), and GPT-5.5 at xhigh reasoning (55). The open-weights top is now a long reasoning trace away from the closed frontier, and closing that gap costs compute you pay for on every call. That is the trade the whole field is making. Test-time compute is the cheap way to climb a benchmark and the expensive way to run a product.

For the next two quarters, watch token efficiency become the number that actually matters in open weights. A score of 51 that costs 43k tokens is a worse deal for most serving stacks than a 47 at 25k. Whoever ships the first open model that holds GLM-5.2's intelligence while halving its reasoning spend wins the production market, not the leaderboard screenshot.

Source: @newsycombinator
Compute & Infrastructure

DOJ calls xAI's unpermitted gas turbines an energy security matter

The Justice Department is framing xAI's unpermitted gas turbines as a question of national, economic, and energy security, which is the legal cover for keeping them running. The subtext for every operator: frontier training and inference now depend on behind-the-meter generation that gets built faster than permitting can move, and the federal government would rather fight the air-quality fight than slow the compute. Power, not silicon, is the binding constraint, and the rules are bending around it.

US holds off blacklisting DeepSeek as 100+ firms get flagged

Washington declined to add DeepSeek to its export blacklist even as it deemed more than 100 firms security risks. For builders, the read is that DeepSeek weights stay legally usable in the US for now, but the model sits one policy memo from being off limits. Do not architect a production stack around a single Chinese open-weights provider you cannot swap out in a week.

Swapping a homelab to a Broadcom SFP+ module for 10Gb Ethernet

A practical writeup on moving a 10Gb/s link to a Broadcom SFP+ transceiver, the kind of detail that decides whether your local model-serving box actually saturates the network. Worth a skim if you are wiring a small inference cluster and tired of mystery link drops on cheap optics.

Developer Tools

codebase-memory-mcp turns a repo into a graph agents query instead of grep

This MCP server indexes a codebase into a persistent SQLite knowledge graph of functions, calls, routes, and cross-service links, so an agent answers structural questions by traversing the graph rather than reading files. The marketing says 99% fewer tokens, the landing page shows a 120x best case (412k tokens down to 3.4k across five questions), and the preprint reports a more honest 10x with 83% answer quality across 31 repos. It full-indexes the Linux kernel, 28M LOC, in three minutes and answers in under 1ms. Tree-sitter covers 158 languages, but type-resolving Hybrid LSP covers only about a dozen; the rest fall back to text. Single static binary, solo maintainer, 3,757 stars.
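To see why a graph lookup is so much cheaper than reading files, here is a toy sketch of the general idea: symbols and call edges in SQLite, queried with a recursive CTE. The schema, table names, and sample symbols are all hypothetical and are not the project's actual layout or API.

```python
import sqlite3

# Hypothetical two-table schema: one row per symbol, one row per call edge.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE symbols (id INTEGER PRIMARY KEY, name TEXT UNIQUE, file TEXT);
    CREATE TABLE calls (
        caller INTEGER REFERENCES symbols(id),
        callee INTEGER REFERENCES symbols(id)
    );
    """
)

# Made-up sample data standing in for an indexed repo.
conn.executemany(
    "INSERT INTO symbols VALUES (?, ?, ?)",
    [
        (1, "handle_request", "api.py"),
        (2, "load_user", "users.py"),
        (3, "query_db", "db.py"),
        (4, "render", "views.py"),
    ],
)
# handle_request -> load_user -> query_db, and handle_request -> render
conn.executemany("INSERT INTO calls VALUES (?, ?)", [(1, 2), (2, 3), (1, 4)])


def transitive_callers(db: sqlite3.Connection, name: str) -> list[str]:
    """Return every function that reaches `name` through one or more calls."""
    rows = db.execute(
        """
        WITH RECURSIVE up(id) AS (
            SELECT id FROM symbols WHERE name = ?
            UNION  -- UNION (not UNION ALL) dedupes, so call cycles terminate
            SELECT c.caller FROM calls c JOIN up ON c.callee = up.id
        )
        SELECT s.name FROM symbols s JOIN up ON s.id = up.id
        WHERE s.name != ?
        ORDER BY s.name
        """,
        (name, name),
    ).fetchall()
    return [row[0] for row in rows]


print(transitive_callers(conn, "query_db"))
```

The answer to "what breaks if I change query_db" comes back as a handful of names, which is the whole token saving: the agent never loads the files to find out.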

Stop Using JWTs makes the rounds again

A gist arguing JWTs are the wrong default for sessions hit 344 points: stateless tokens you cannot revoke, footgun algorithm fields, and storage you cannot invalidate. If you are reaching for JWTs because a tutorial said so, read this first, then default to server-side sessions unless you have a real cross-service reason. The argument is old, the mistakes are not.

RFC 10008 standardizes the HTTP QUERY method

The HTTP QUERY method is now an RFC: a safe, idempotent way to send a request body with query semantics, fixing the long-standing mess of GET-with-body and overloaded POST for search endpoints. Expect framework and client support to trickle in over the next year. Useful the moment your search API outgrows what fits in a URL.
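Until frameworks catch up, you can already experiment with the method name, because Python's standard library passes arbitrary HTTP methods through on both the client and the server side. The sketch below runs a throwaway local server and sends it a QUERY request with a JSON body. The `/search` path, the `prefix` field, and the response shape are all hypothetical, and nothing here claims to cover the RFC's full requirements.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

# Made-up search corpus for the demo.
ITEMS = ["glm-5.2", "glm-5.1", "minimax-m3", "kimi-k2.6"]


class SearchHandler(BaseHTTPRequestHandler):
    # BaseHTTPRequestHandler dispatches on "do_" + method name,
    # so defining do_QUERY is enough to accept the new method.
    def do_QUERY(self):
        length = int(self.headers.get("Content-Length", 0))
        query = json.loads(self.rfile.read(length))
        hits = [item for item in ITEMS if item.startswith(query["prefix"])]
        body = json.dumps({"hits": hits}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def run_query(prefix: str):
    """Start a local server, send one QUERY request, return (status, parsed JSON)."""
    server = HTTPServer(("127.0.0.1", 0), SearchHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = HTTPConnection("127.0.0.1", server.server_port)
        conn.request(
            "QUERY",
            "/search",
            body=json.dumps({"prefix": prefix}),
            headers={"Content-Type": "application/json"},
        )
        resp = conn.getresponse()
        result = json.loads(resp.read())
        conn.close()
        return resp.status, result
    finally:
        server.shutdown()
        server.server_close()


print(run_query("glm"))
```

The search parameters travel in the body rather than the URL, so they are not bounded by URL length, while the method name still tells intermediaries the request is a read.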

Continue ships its open-source coding agent

The Continue project is back in the feed as an open-source coding agent, the self-hosted alternative for teams that will not pipe their codebase through a vendor. Pair it with a graph-index MCP like the one above and you get an agent that reasons about structure without burning a context window on file reads.

HTTP requests with no curl, using Bash /dev/tcp

A reminder that Bash can open raw TCP sockets through /dev/tcp, letting you make HTTP requests in a container that ships no curl or wget. Handy for minimal images and locked-down CI where adding a binary is a fight. Not a tool, a trick worth filing away.

AI & Models

60% of US consumers say the word AI in messaging is a turnoff

A WordPress VIP survey of 2,000 people found 60% of US consumers are put off by brands touting AI, and 86% still want original sources. The builder twist buried in the same report: 60% of enterprises saw traffic from AI answer engines rise, and one product lifted sales of its top tiers by removing the word AI while keeping the feature. Ship the capability, drop the label.

Tim Ferriss says AI is gutting how-to nonfiction sales

Self-help units fell 26.3% year over year in Q1 2026, and Ferriss reports his own catalog down about 45% in the second half of 2025, blaming LLMs that substitute for lookup-table books. It is one author plus aggregate BookScan data, and he names confounders like Amazon stocking and post-TikTok reversion. Still a real signal for anyone whose product is packaged prescriptive knowledge: the chatbot is the new substitute good.

GPT-NL pitches a sovereign Dutch language model

TNO's GPT-NL is a state-backed model trained for the Netherlands, the latest in a run of sovereign-AI projects betting that data residency and language coverage beat raw capability for public-sector buyers. The economics are the question: a national model competing against frontier APIs needs a captive market to justify the compute. Watch whether it ships weights or stays a procurement story.

Only 16% of Americans expect AI to help society

A new study puts public optimism about AI at 16%, the kind of number that shapes regulation and enterprise risk appetite well before it shapes model releases. If you sell to consumers, this is the trust gap you are pricing against. Pair it with the AI-as-turnoff survey and the message is one signal, not two.

The case that AI demands more engineering discipline, not less

Charity Majors argues that agents amplify whatever rigor your team already has, so sloppy testing and weak observability get worse, not better, once code generation speeds up. The practical takeaway: invest in review, CI, and rollback before you scale agent throughput. Velocity without guardrails is just faster incidents.

Launches & Releases

Epic open-sources Lore, a Rust VCS aimed at Perforce

Epic released Lore under MIT, a content-addressable version control system in Rust built for code plus multi-GB binary assets, with Merkle-chained revisions, content-level dedup, and chunked uploads, so editing a few KB of a huge file re-uploads only the chunks that changed. It is the same tech that ran inside Unreal as Unreal Revision Control, now opened to remove Perforce's per-seat cost for smaller studios. Caveat: it is pre-1.0, and the desktop client most users will touch ships as a closed binary with proprietary dependencies.
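Content addressing plus chunking is what makes the small-edit case cheap. Here is a toy sketch of the general technique, assuming fixed-size 64 KiB chunks and SHA-256 digests. Both are illustrative choices, the class name is hypothetical, and none of this is taken from Lore's actual format.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # assumption: fixed-size chunks, chosen for the demo


class ChunkStore:
    """Stores each unique chunk once, keyed by the hash of its content."""

    def __init__(self):
        self.blobs = {}          # digest -> chunk bytes
        self.uploaded_bytes = 0  # bytes actually sent to the store

    def put(self, data: bytes) -> list[str]:
        """Store a file and return its manifest (ordered list of chunk digests)."""
        manifest = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.blobs:  # dedup: known chunks are skipped
                self.blobs[digest] = chunk
                self.uploaded_bytes += len(chunk)
            manifest.append(digest)
        return manifest

    def get(self, manifest: list[str]) -> bytes:
        """Reassemble a file from its manifest."""
        return b"".join(self.blobs[digest] for digest in manifest)


# A 1 MiB "asset" made of 16 distinct chunks.
v1 = b"".join(bytes([k]) * CHUNK_SIZE for k in range(16))

# Edit four bytes in the middle of chunk 5.
v2 = bytearray(v1)
v2[5 * CHUNK_SIZE + 10 : 5 * CHUNK_SIZE + 14] = b"EDIT"
v2 = bytes(v2)

store = ChunkStore()
m1 = store.put(v1)
after_v1 = store.uploaded_bytes
m2 = store.put(v2)
after_v2 = store.uploaded_bytes

print(f"first upload: {after_v1} bytes, second upload: {after_v2 - after_v1} bytes")
```

Fixed-size chunks only dedupe well for in-place edits. An insertion shifts every later chunk boundary, which is why production systems often use content-defined chunking instead. The source does not say which approach Lore takes.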

Wolfram Language and Mathematica hit version 15 with built-in AI

Version 15 folds an AI assistant directly into the Wolfram Language alongside symbolic music and new core functions. The interesting bit for builders is the pairing: a symbolic engine that can check the LLM's arithmetic, which is exactly the gap most agent stacks paper over. Worth a look if you do anything computational where wrong answers are expensive.

OpenMontage turns a coding agent into a video studio

An open-source agentic video production system with 12 pipelines, 52 tools, and 500+ agent skills, wiring your coding assistant into rendering and editing work. Early and ambitious, but it is a clean example of the MCP-tool pattern spreading well past code into media production. Useful to study even if you never cut a video.

Startups & Capital

Adam (YC W25) launches open-source AI CAD

Adam open-sourced CADAM, an AI system for parametric CAD generation, going after a domain where wrong geometry is unforgiving and the training data is scarce. The bet is that generative CAD becomes useful enough to anchor a paid layer on top of open weights. Hard problem, real moat if it works.

Lago is the open-source metering layer under usage-based pricing

Lago handles consumption tracking, subscriptions, and revenue analytics, the plumbing every AI product needs once it bills per token or per call instead of per seat. As model costs swing with reasoning spend, accurate metering is the difference between margin and a surprise. The open-source option matters when your usage data is the business.

Security

GrapheneOS ports to Android 17 with releases coming

GrapheneOS has been ported to Android 17 and official releases are near, keeping the hardened mobile OS current with the base platform. For anyone shipping apps to privacy-strict users, this is the build to test against. The fast port also signals a healthier upstream relationship than the project's recent friction suggested.

Volkswagen starts blocking GrapheneOS users from its app

VW's app now refuses to run on GrapheneOS, the latest case of integrity-attestation APIs locking out hardened or rooted devices. If you build mobile apps and lean on Play Integrity, know that you are quietly excluding a privacy-conscious slice of users. The blocklist is a product decision, not a security necessity.

Want your images back? That will be $5

A writeup on a service holding user images behind a surprise paywall, a clean cautionary tale about depending on third-party media hosts with no export path. The lesson is old and keeps biting: own your storage or assume the host will monetize your lock-in. Budget for egress and exit before you wire anything to a free tier.

The Takeaway

Token economics is the real spec now. GLM-5.2 climbed 11 points by spending 43k output tokens a task, while a graph-index MCP cuts agent reads by 10x or more. If you run agents in production this week, instrument cost per correct answer, not benchmark rank, then cap reasoning budgets and move structural lookups to a code graph before you scale throughput.
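Cost per correct answer is a two-line metric once you log the right fields. The sketch below assumes you record, per configuration, how many tasks ran, how many passed your own grader, and total output tokens. The `RunStats` type and both sets of numbers are hypothetical, made up for illustration, and not benchmark results.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    name: str
    tasks: int
    correct: int               # tasks your own grader marked correct
    output_tokens: int         # total output tokens across all tasks
    price_per_m_output: float  # dollars per million output tokens

    @property
    def cost(self) -> float:
        """Total output spend in dollars."""
        return self.output_tokens * self.price_per_m_output / 1_000_000

    @property
    def cost_per_correct(self) -> float:
        """Dollars spent per correct answer; infinite if nothing was correct."""
        return self.cost / self.correct if self.correct else float("inf")


# Hypothetical logs: the same model with and without a reasoning cap.
uncapped = RunStats("uncapped", tasks=1000, correct=800,
                    output_tokens=43_000_000, price_per_m_output=4.4)
capped = RunStats("capped", tasks=1000, correct=740,
                  output_tokens=25_000_000, price_per_m_output=4.4)

for run in (uncapped, capped):
    print(f"{run.name}: {run.correct / run.tasks:.0%} correct, "
          f"${run.cost:.2f} total, ${run.cost_per_correct:.4f} per correct answer")
```

In this made-up case the capped run gives up six points of accuracy and comes out well ahead on cost per correct answer. Whether that trade is acceptable depends on what a wrong answer costs you, which is the number to put next to it.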

The Call C-20260618

An open-weights model reaches 55 on the Artificial Analysis Intelligence Index by September 30, 2026, matching GPT-5.5 at xhigh reasoning and all but erasing the closed-versus-open gap near the top.

The case

GLM-5.2 jumped 11 points at the same parameter size in one release by spending more test-time compute, and the open-weights field is shipping MoE refreshes on a fast cadence. Consensus still assumes open models trail the frontier by roughly six months, but the gap from 51 to 55 is a longer reasoning trace, not a bigger model.

What proves us wrong

No open-weights model scores 55 or higher on the Artificial Analysis Intelligence Index v4.1 (or its successor) by September 30, 2026.

Settles by September 30, 2026
