Z.ai shipped GLM-5 under MIT and is already two releases past it

The Rundown No. 119 · Audio Edition · 10 min

Oday

Z.ai shipped GLM-5 under an MIT license this week, and by the time the wire wrote it down, the lab was already two releases past it.

Shannon

It's Friday, June 19, 2026. Here's the rundown.

Shannon

One open-weight model rewriting the long-context math, a compute beat that runs from microVMs to nuclear, and a call about whose model sits inside your coding tool by September.

Oday

Start with the numbers. GLM-5 is a mixture-of-experts model, 744 billion parameters, 40 billion active, trained on 28.5 trillion tokens. It claims top spots on ArtificialAnalysis and both LMArena boards.

Oday

Then they pushed past it twice in days. The live SKU is GLM-5.2: a one million token context window, up to 131,072 tokens of output, two reasoning levels, on every Coding Plan tier right now.

Shannon

And the headline that filed plain GLM-5 is already wrong. Anyone picking a model off that string is two versions behind.

Oday

The lever underneath it is DeepSeek Sparse Attention plus an async RL stack they call slime. What does DSA actually buy you here?

Shannon

It's what turns a million-token window from a slide into something you can serve. Attention over long sequences is the line item that normally makes million-token contexts unusable in production. DSA cuts that cost.

Shannon

That's why this isn't a single-turn benchmark flex. The target is agentic coding. A model reads a whole repo, holds it, and acts over many turns.

Oday

And the 744 billion parameters sound like a hardware wall.

Shannon

Only 40 billion fire per token, so it serves far cheaper than the parameter count suggests. No marketing tax on that one. It's how MoE works.

Shannon

The catch is the obvious one. Those weights still need real GPUs. The cheap path is a managed endpoint until you can justify owning the hardware.

Oday

So if you're building agentic coding or repo-scale retrieval this week, the move is to test it.

Shannon

Test it against your current closed default, and compare cost per completed task, not benchmark scores. An open-weight one-million-window model you control changes the math on any product pushing large prompts.

Oday

The piece the wire missed is the cadence. DSA came from DeepSeek, slime is Z.ai's, both under permissive licenses, both shipping faster than anyone can write the model-selection memo.

Shannon

That's the real story. Chinese open-weight labs are setting the long-context price floor now. The squeeze lands on closed labs charging a markup for the same window.

Oday

Browser-use is running Firecracker microVMs nested inside plain EC2. A VM inside a VM, booting browsers about 30 seconds after launch, with the host reading the ready signal over vsock in under a millisecond.

Shannon

And they picked regular EC2 over the pricey metal instances on purpose, because hosts are faster to get and cheaper to keep. The honest claim is roughly three times cheaper and faster, not the sub-second figure the headline implied.

Shannon

If you run cloud browsers at scale, that's the cost pattern to copy.

Oday

Tim Cook says Apple prices go up because memory chip costs are climbing. That's the consumer echo of the DRAM and HBM crunch driving the AI buildout.

Shannon

Memory is the tax nobody escapes. If you're speccing GPU servers or edge devices in the next two quarters, budget for RAM, not just compute. Flat memory cost is no longer a safe assumption.

Oday

Switzerland's parliament lifted its ban on new nuclear plants. Won't add a megawatt this year.

Shannon

No, but it's another government treating baseload as strategic again. Power is the binding constraint on datacenter expansion. Watch whether nuclear timelines start showing up in European siting decisions.

Oday

And Ubiquiti shipped an enterprise NAS built on ZFS. Snapshots, checksums, data integrity as table stakes.

Shannon

If you keep training data or model artifacts on-prem to dodge cloud egress, price it against Synology and TrueNAS. The integration might win.

Oday

Midjourney's first hardware is a full-body ultrasound scanner. 60 seconds, built on Butterfly Network's ultrasound-on-chip silicon, 40 modules per system.

Shannon

And the founder says, quote, we're not even using any AI in this yet. It's not FDA-cleared, about a dozen people have been scanned, and they admit they haven't solved turning noisy waves into images.

Shannon

Treat it as a hardware bet, not a diffusion-model story. Midjourney's paying Butterfly 15 million up front plus 10 million a year for five years to find out.

Oday

DeepSeek Vision got resurfaced as a launch. It's a beta from April 29.

Shannon

Seven weeks old, no V4 technical report, no stable API. You can't integrate against it. The decision-relevant event was V4's price cut, not the vision toggle.

Oday

Alex Ellis makes the case that local Qwen isn't a worse Opus, it's a different tool.

Shannon

Right frame. Local wins on latency, privacy, and cost per token even when it loses on raw reasoning. Route your high-volume, well-scoped tasks to local, and save Opus for the calls that need it.

Oday

And Lightricks released LTX-2 with an official package for inference and LoRA training on an audio-video model.

Shannon

The LoRA trainer is the useful part. Adapt it to a style or subject without retraining from scratch. Self-hostable base if you're building video features.

Oday

Kilo Code passes model pricing straight through, zero markup. Open-source agent for VS Code, JetBrains, and CLI, 500-plus models at the provider's own rate, including GPT-5.5 and Claude Opus 4.7.

Shannon

The pass-through gateway is the draw, though the traction numbers fight each other: 1.5 million users on GitHub, 3 million on the marketing site. Pick whichever the marketing team needed that day.

Oday

There's also an auto flag that runs fully autonomous in CI with every permission prompt off.

Shannon

Trusted environments only. Point that at an untrusted repo and you'll learn something expensive.

Oday

gortex builds a local code graph that claims up to 50 times fewer tokens, across 257 languages, fully local.

Shannon

It serves precise graph slices instead of dumping files into context. If your agent bills are dominated by context stuffing, a graph in front of the model is the cheaper architecture. Plane is also worth a look if Jira's per-seat price is grinding you down.

Oday

And a correction. The wire called roboflow's RF-DETR a video studio. It's a real-time detection transformer on a DINOv2 backbone, accepted to ICLR 2026, 2.3 milliseconds per frame on a T4.

Shannon

Apache 2.0, six sizes. Ignore any 500-skills claim glued to that name.

Oday

A researcher documented roughly 10,000 GitHub repositories seeded with Trojan payloads. The classic trap for a developer cloning what looks like a useful tool.

Shannon

And it's worse now because agentic tools auto-pull dependencies and run code. Pin your sources, review before you run, and don't let an auto agent execute untrusted repos. That's the whole defense in one sentence.

Oday

Emacs 31 is nearing release, and a long-time user walks through what's already worth using before the stable cut.

Shannon

Useful if you live in Emacs, skippable if you don't. And SteamOS 3.8 shipped stable. A quiet point release unless you build or test on the Steam Deck.

Oday

Quick break — two from the desk.

Shannon

One we know well: vote.direct. If you're on an HOA or a board, it runs your elections digitally: secure, verifiable, no paper, no clipboard in the lobby. Point your council to vote.direct.

Oday

And if this is your ten minutes of AI for the day, get the written edition too. The full wire, free, every morning. Leave your email at nextbig.dev.

Oday

Microsoft's new Outlook takes 10 seconds to do what Classic does instantly.

Shannon

For ignoring files, .gitignore isn't your only option: try .git/info/exclude and skip-worktree.

Oday

Cornell's CS6120 advanced compilers course is free and self-guided.

Shannon

Glojure runs Clojure hosted on Go, and Kong's Insomnia now covers GraphQL, REST, WebSockets, SSE, and gRPC in one open-source client.

Oday

asynq is a distributed task queue in Go for reliable background jobs, and hospitals and universities are repurposing drugs at 90 percent lower cost.

Oday

Our call: by September 19, at least one top-five AI coding tool sets or recommends a Chinese open-weight model, GLM or DeepSeek, as a default for long-context agentic work.

Shannon

We're wrong if every top-five tool by usage still keeps a US closed model as the recommended default on that date. Settles September 19.

The Big Story

Z.ai shipped GLM-5 under MIT and is already two releases past it

Z.ai launched GLM-5, an MIT-licensed mixture-of-experts model aimed at long-horizon agentic coding, then pushed past it twice in days. GLM-5 scales to 744B parameters with 40B active, up from GLM-4.5's 355B/32B, and trains on 28.5T tokens versus 23T. It claims top spots on ArtificialAnalysis, LMArena Text, and LMArena Code. The flagship for engineering work, GLM-5.1, is a 754B MoE with a 200K window and SWE-Bench Pro leadership. The live SKU is GLM-5.2: a 1 million token context window, up to 131,072 output tokens, and two reasoning levels, available now to every GLM Coding Plan tier.

The lever underneath all of this is DeepSeek Sparse Attention plus an asynchronous RL stack called slime. DSA is what makes a 1M context window deployable instead of theoretical. It cuts the cost of attention over long sequences, which is the line item that normally makes million-token contexts unusable in production. Z.ai is not chasing single-turn benchmark wins. The target is multi-step agentic coding where a model reads a whole repo, holds it, and acts over many turns. That is exactly the workload where context length and serving cost decide whether a feature ships.
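A back-of-the-envelope sketch of why sparse attention matters at this length. It assumes dense causal attention scores every query against every earlier token, while a sparse scheme scores each query against at most a fixed budget of k selected tokens. The budget of 2,048 is an illustrative assumption, not a figure from Z.ai, and the cost of choosing which tokens to keep is ignored.

```python
def dense_pairs(n: int) -> int:
    """Query-key pairs scored by dense causal attention over n tokens."""
    return n * (n + 1) // 2


def sparse_pairs(n: int, k: int) -> int:
    """Pairs scored when each query attends to at most k earlier tokens."""
    # The first k queries have fewer than k tokens to look at, so they stay dense.
    head = min(n, k)
    return dense_pairs(head) + max(0, n - k) * k


if __name__ == "__main__":
    k = 2_048  # assumed selection budget, for illustration only
    for n in (200_000, 1_000_000):
        ratio = dense_pairs(n) / sparse_pairs(n, k)
        print(f"{n:>9,} tokens: dense scores ~{ratio:,.0f}x more pairs than sparse")
```

Under these assumptions the dense count grows with the square of the window while the sparse count grows roughly linearly, which is why the gap widens as context gets longer.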

If you are building agentic coding or repo-scale RAG, this is worth testing this week. Weights are MIT on Hugging Face and ModelScope, so you can self-host or rent. A 744B/40B-active MoE serves far cheaper than its parameter count suggests because only 40B fire per token. Against closed frontier models charging premium rates for long context, an open-weight 1M-window model you control changes the math on any product pushing large prompts. The catch is the obvious one: 744B of weights needs real hardware, and the cheap path is a managed endpoint until you can justify the GPUs.
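The split between "cheap to serve" and "needs real hardware" can be made concrete with simple arithmetic. This sketch uses the 744B and 40B figures above; the storage precisions are assumptions, since the write-up does not say how the weights are stored, and it counts weights only, not KV cache or activations.

```python
TOTAL_PARAMS = 744e9   # total parameters, from the write-up
ACTIVE_PARAMS = 40e9   # parameters active per token, from the write-up


def active_fraction(total: float, active: float) -> float:
    """Share of parameters that take part in a single token's forward pass."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active share per token: {share:.1%}")
    # Precisions below are assumptions, chosen only to show the range.
    for label, width in (("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)):
        print(f"{label}: ~{weight_memory_gb(TOTAL_PARAMS, width):,.0f} GB of weights")
```

Per-token compute tracks the active count, but every expert still has to sit in memory, so the hardware bill tracks the total.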

The signal for the next six months is the part the wire missed. The headline filed plain GLM-5 while Z.ai had already iterated to 5.1 and 5.2. Chinese open-weight labs are now setting the long-context price floor, and they are shipping faster than anyone can write the model-selection memo. DSA came from DeepSeek, slime is Z.ai's, and both land under permissive licenses. The squeeze falls on closed labs selling long context at a markup. When a 1M-window MIT model serves agentic coding at a fraction of the cost, the default-model decision inside every coding tool gets reopened.

Anyone choosing a model off the string GLM-5 is two releases behind. The number that matters is 1M, and it is already in production.

@github Read source 1,430 engagement

Compute & Infrastructure

Browser-use runs Firecracker microVMs nested inside plain EC2

The trick is a VM inside a VM: AWS already runs your EC2 instance as a VM, and browser-use runs Firecracker browser VMs inside that. They picked regular EC2 over pricey .metal instances because hosts are faster to get and cheaper to keep. Each host boots from a pre-built image and serves browsers about 30 seconds after launch, with the host reading the ready message over vsock in under a millisecond. The blog's real claim is roughly 3x cheaper and faster, not the sub-1s figure the headline implied. If you run cloud browsers at scale, this is the cost pattern to copy.
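A minimal sketch of what a host-side ready check could look like, not browser-use's actual code. It assumes the Firecracker convention in which a guest-initiated vsock connection arrives on the host as a connection to a Unix socket whose path ends in the port number. The function name, the socket path, and the `ready` payload are all hypothetical.

```python
import os
import socket


def wait_for_ready(listen_path: str, timeout_s: float = 60.0,
                   expected: bytes = b"ready") -> bool:
    """Listen on a Unix socket and return True once a peer sends `expected`."""
    if os.path.exists(listen_path):
        os.unlink(listen_path)  # clear a stale socket file from an earlier run
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        server.bind(listen_path)
        server.listen(1)
        server.settimeout(timeout_s)
        try:
            conn, _ = server.accept()
        except socket.timeout:
            return False  # the guest never connected in time
        with conn:
            conn.settimeout(timeout_s)
            received = b""
            try:
                # recv may return partial data, so read until we have enough.
                while len(received) < len(expected):
                    chunk = conn.recv(len(expected) - len(received))
                    if not chunk:
                        break  # peer closed the connection early
                    received += chunk
            except socket.timeout:
                return False
            return received == expected
    finally:
        server.close()
        if os.path.exists(listen_path):
            os.unlink(listen_path)
```

A caller would start this before booting the guest, for example `wait_for_ready("/tmp/vm1.vsock_5000")` with a hypothetical path, and hand the browser out only once it returns True.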

@newsycombinator Read source 596 engagement

Tim Cook says Apple prices rise as memory chip costs climb

Memory pricing is the tax nobody escapes. Cook flagged rising memory chip costs as a reason Apple prices go up, which is the consumer-facing echo of the DRAM and HBM crunch driving AI buildouts. If you are speccing GPU servers or edge devices in the next two quarters, budget for memory, not just compute. The squeeze hits anyone whose unit economics assume flat RAM costs.

@newsycombinator Read source 132 engagement

Swiss parliament lifts its ban on new nuclear plants

Power is the binding constraint on datacenter expansion, and Switzerland just reopened a supply lever it had shut. Lifting the new-build ban won't add a megawatt this year, but it is another data point in the pattern of governments treating baseload as strategic again. Watch whether nuclear timelines start showing up in European datacenter siting decisions.

@newsycombinator Read source 726 engagement

Ubiquiti ships an enterprise NAS built on ZFS

Ubiquiti is moving up the stack from networking into storage with a ZFS-based enterprise NAS. ZFS means snapshots, checksums, and data integrity as table stakes rather than add-ons. If you run on-prem storage for training data or model artifacts and want to stay off cloud egress bills, this is a cheaper integrated option worth pricing against Synology and TrueNAS.

@newsycombinator Read source 357 engagement

AI & Models

Midjourney's first hardware is a full-body ultrasound scanner, with no AI in it yet

The wire called this an AI imaging model. It is a 60-second full-body ultrasound device built on Butterfly Network's ultrasound-on-chip silicon, 40 modules per system, and founder David Holz says "We're not even using any AI in this yet." Midjourney pays Butterfly a $15M one-time fee plus $10M annually over five years. It is not FDA-cleared, about a dozen people have been scanned, and the company admits it still hasn't solved turning noisy waves into static images. Treat it as a hardware bet, not a diffusion-model story.

@newsycombinator Read source 1,250 engagement

DeepSeek Vision is a seven-week-old beta, not a fresh launch

The HN post resurfaced an event from April 29: a limited vision-mode beta that appears alongside Fast and Expert modes. It is DeepSeek's first multimodal move, built on V4's native multimodal architecture where image and video understanding were baked in during pre-training. There is still no V4 or Vision technical report and no stable API, so you can't integrate against it yet. The decision-relevant event was V4's price cut, not the vision toggle.

@newsycombinator Read source 704 engagement

Local Qwen is a different tool, not a worse Opus

Alex Ellis argues the right frame for local models is task fit, not a benchmark ranking against frontier models. A local Qwen that runs offline on your own hardware wins on latency, privacy, and cost per token even when it loses on raw reasoning. If you are pushing high-volume, well-scoped tasks, route them to local before paying frontier rates. Save Opus for the calls that actually need it.

@newsycombinator Read source 216 engagement

Lightricks releases LTX-2 with inference and LoRA training

LTX-2 ships an official Python package for inference and LoRA fine-tuning of an audio-video generative model. The LoRA trainer is the practical part: it lets you adapt the model to a specific style or subject without retraining from scratch. If you are building video generation features, this is a self-hostable base you can specialize cheaply.

@github Read source 235 engagement

Developer Tools

Kilo Code passes model pricing straight through with zero markup

Kilo is an open-source coding agent for VS Code, JetBrains, and CLI that exposes 500+ models, including GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro Preview, at the provider's own rate. The CLI is a fork of OpenCode, and a --auto flag runs fully autonomous in CI with all permission prompts off, intended only for trusted environments. Traction is self-reported and inconsistent across its own pages: 1.5M users and 25T tokens on GitHub versus 3M users and 40T tokens on the marketing site. The pass-through gateway is the real draw if you want to dodge agent-tool markups.

@github Read source 6,695 engagement

Plane offers an open-source alternative to Jira, Linear, and ClickUp

Plane is a self-hostable project management platform covering tasks, sprints, docs, and triage. The pitch is owning your roadmap data instead of renting it per seat. If your team is feeling Jira's price or wants project tracking inside your own infra, it is worth a trial.

@github Read source 3,050 engagement

gortex builds a local code graph that cuts agent token usage up to 50x

gortex is a code intelligence engine across 257 languages, exposed via CLI, MCP server, and API, built so AI coding agents pull only the code they need. The claim is up to 50x fewer tokens by serving precise graph slices instead of dumping files into context, and it runs 100% local. If your agent bills are dominated by context stuffing, a code graph in front of the model is the cheaper architecture.

@github Read source 520 engagement

Correction: roboflow/rf-detr is an object detector, not a video studio

The wire grafted an unrelated description onto this repo. RF-DETR is a real-time transformer for detection, segmentation, and keypoints on a DINOv2 backbone, accepted to ICLR 2026, spanning six sizes from 30.5M to 126.9M parameters. RF-DETR-N runs at 2.3ms per frame on a T4 and hits 60.1 AP on COCO at 2XL. Core models are Apache 2.0; ignore any "500+ agent skills" claim attached to this name.

@github Read source 545 engagement

Security

10,000 GitHub repositories found distributing Trojan malware

A researcher documented roughly 10k repos seeded with Trojan payloads, the kind of supply-chain trap that snares developers cloning what looks like a useful tool. With agentic coding tools auto-pulling dependencies and running code, the attack surface is wider than ever. Pin sources, review before you run, and don't let an --auto agent execute untrusted repos.

@newsycombinator Read source 520 engagement

Launches & Releases

Emacs 31 nears release with daily-driver-ready changes

A long-time user walks through the Emacs 31 features already worth using before the stable cut. Practical if you live in Emacs and want to know what changes your workflow versus what is cosmetic. Skip if you don't.

@newsycombinator Read source 622 engagement

SteamOS Linux 3.8 ships as stable

Valve pushed SteamOS 3.8 to stable. Relevant if you build or test on the Steam Deck or care about the Linux gaming stack maturing. Otherwise a quiet point release.

@newsycombinator Read source 99 engagement

Quick Hits

Microsoft's new Outlook takes 10 seconds to do what Classic does instantly

@newsycombinator

.gitignore isn't the only way to ignore files: try .git/info/exclude and skip-worktree

@newsycombinator

Cornell's CS6120 advanced compilers course is free and self-guided

@newsycombinator

Glojure runs Clojure hosted on Go

@newsycombinator

Kong's Insomnia covers GraphQL, REST, WebSockets, SSE, and gRPC in one open-source client

@github

asynq is a distributed task queue in Go for reliable background jobs

@github

Hospitals and universities are repurposing drugs at 90% lower cost

@newsycombinator

Modos color monitor pushes e-paper displays further

@newsycombinator

The Takeaway

Two stories point the same way: GLM-5.2 ships a 1M-context MIT model, and Kilo Code routes 500+ models at zero markup. The cost advantage in agentic coding is moving to open weights plus pass-through pricing. This week, run your repo-scale coding workload against a hosted GLM-5.2 endpoint and against your current closed default, then compare cost per completed task, not benchmark scores. If gortex-style code graphs cut your context tokens 10x, the open-weight path gets cheaper still.
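One way to set up that comparison is sketched below. The field names are hypothetical, and the per-million-token rates are inputs you supply from each provider's price sheet; no real prices are assumed here.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Totals from one evaluation run of a model on your own task suite."""
    tasks_attempted: int
    tasks_completed: int
    input_tokens: int
    output_tokens: int


def cost_per_completed_task(run: RunResult,
                            usd_per_m_input: float,
                            usd_per_m_output: float) -> float:
    """Total token spend divided by the number of tasks that finished."""
    spend = (run.input_tokens * usd_per_m_input
             + run.output_tokens * usd_per_m_output) / 1e6
    if run.tasks_completed == 0:
        return float("inf")  # nothing finished, so no price makes it cheap
    return spend / run.tasks_completed


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not measurements.
    run = RunResult(tasks_attempted=10, tasks_completed=5,
                    input_tokens=2_000_000, output_tokens=500_000)
    print(f"${cost_per_completed_task(run, 1.0, 4.0):.2f} per completed task")
```

Dividing by completed tasks rather than attempted ones is the point: a cheaper model that fails more often can still lose once retries and abandoned runs are counted.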

The Call C-20260619

By September 19, 2026, at least one top-five AI coding tool will set a Chinese open-weight model (GLM or DeepSeek) as a default or explicitly recommended option for long-context agentic tasks.

The case

GLM-5.2 ships a 1M-context MIT model with DSA driving long-context serving cost down, DeepSeek's V4 cut prices hard, and tools like Kilo already expose these models at provider pricing with zero markup. Consensus still assumes US closed frontier models stay the default inside coding tools. The cost and license gap on long-context agentic work is now wide enough to flip that default.

What proves us wrong

By September 19, 2026, no top-five AI coding tool (by usage) sets or recommends a Chinese open-weight model as a default for long-context or agentic coding; all keep a US closed model as the recommended default.

Settles by September 19, 2026

The Tape T-20260619

▼ Short ADBE Adobe medium conviction

Each new free or open visual-generation model drains another dollar of Firefly's pricing power, and Adobe's subscription COGS already grows faster than revenue as inference eats the margin. The multiple keeps compressing toward the only moat left, which is indemnity, not image quality.

Today's wire stacks DeepSeek's vision beta inside a mass free chat app, LTX-2 open audio-video, roboflow's Apache 2.0 RF-DETR vision models, and Midjourney's medical push, all pressing the same commoditization the desk flagged on June 12. Adobe trades roughly 43% off its 2025 peak with a rare Goldman Sell, and subscription cost grew about 13% against 11% revenue as AI inference compresses gross margin. When generation goes free, commercially-safe training is a feature, not a premium moat.

Wrong if Adobe's next print, fiscal Q3 FY26 in September, shows Digital Media net-new ARR reaccelerating and subscription gross margin stabilizing year over year. Settles September 2026

◆ Watch MU Micron Technology medium conviction

The memory squeeze is now confirmed by the largest device buyer on earth, and Micron has already taken the move. The early phase of the memory long is over heading into the June 24 print, which is what changed since the desk's June 16 WATCH.

Cook calling Apple price hikes unavoidable is the real-economy confirmation of the DRAM and NAND squeeze the desk flagged, but MU is up roughly 820% in a year, sits near its $1,110 high, and roared up on the Apple headline itself. Options price about a 17.6% swing around June 24 earnings where consensus EPS sits near $20. Confirmation this loud and this late marks a crowded trade, not an early one.

Wrong if MU prints a beat-and-raise on June 24, 2026 and closes at a fresh all-time high above $1,110 in the days after, showing the memory long still had room. Settles July 2026

◆ Watch AAPL Apple low conviction

Memory inflation is now a confirmed gross-margin headwind Apple will answer by raising prices into a soft consumer cycle, putting unit growth at risk even as the tape cheers the hikes as proof that hardware is underpriced. Timing on the demand hit is unclear, so this reads as a watch, not a short.

Apple guided fiscal Q2 gross margin to 48-49% with memory weighing more from here, and Bernstein pegs the hit near 1.5 points by year-end, while TechInsights estimates about $270 added to the next iPhone Pro. Cook's unavoidable language is an escalation from the January we'll-look-at-options framing. The bullish read treats pass-through as free and ignores elasticity if a roughly $1,299 iPhone Pro meets a stretched consumer.

Wrong if Apple's fiscal Q3 2026 report in late July guides gross margin at or above 48% with no unit-demand warning attached to memory pricing. Settles August 2026

◆ Watch BABA Alibaba low conviction

The open-weight fallback the crowd prices into Alibaba is no longer obviously Qwen. GLM-5 shipping under MIT and claiming the top of the agentic-coding boards moves the default open option toward Z.ai, which erodes the download-share narrative underpinning the Qwen-fallback read the desk has carried.

GLM-5 launched MIT-licensed and claims top spots on ArtificialAnalysis and LMArena Code, the exact long-horizon agentic workload where Qwen's download lead was the moat. DeepSeek's vision beta, live since April 29, widens the Chinese open-weight field further. Alibaba's equity never monetized Qwen directly, so the self-host-Qwen thesis was always thin, and a better free competitor makes it thinner.

Wrong if Qwen holds above 50% of open-weight model downloads on Hugging Face through Q3 2026, or a top-five coding tool sets Qwen rather than GLM or DeepSeek as its recommended open default. Settles September 2026

Desk signals from the day's verified wire — falsifiable, dated, settled in public. Analysis, not individualized investment advice.
