Z.ai shipped GLM-5 under MIT and is already two releases past it
Z.ai shipped GLM-5 under MIT and already pushed to a 1M-context GLM-5.2, reopening the default-model decision in every coding tool.
Z.ai shipped GLM-5 under an MIT license this week, and by the time the wire wrote it down, the lab was already two releases past it.
It's Friday, June 19, 2026. Here's the rundown.
One open-weight model rewriting the long-context math, a compute beat that runs from microVMs to nuclear, and a call about whose model sits inside your coding tool by September.
Start with the numbers. GLM-5 is a mixture-of-experts model, 744 billion parameters, 40 billion active, trained on 28.5 trillion tokens. It claims top spots on ArtificialAnalysis and both LMArena boards.
Then they pushed past it twice in days. The live SKU is GLM-5.2: a one million token context window, up to 131,072 tokens of output, two reasoning levels, on every Coding Plan tier right now.
And the wire's headline, which filed it as plain GLM-5, is already stale. Anyone picking a model off that string is two versions behind.
The lever underneath it is DeepSeek Sparse Attention plus an async RL stack they call slime. What does DSA actually buy you here?
It's what turns a million-token window from a slide into something you can serve. Attention over long sequences is the line item that normally makes million-token contexts unusable in production. DSA cuts that cost.
That's why this isn't a single-turn benchmark flex. The target is agentic coding. A model reads a whole repo, holds it, and acts over many turns.
And the 744 billion parameters sound like a hardware wall.
Only 40 billion fire per token, so it serves far cheaper than the parameter count suggests. That's not marketing spin; it's how MoE works.
The catch is the obvious one. Those weights still need real GPUs. The cheap path is a managed endpoint until you can justify owning the hardware.
So if you're building agentic coding or repo-scale retrieval this week, the move is to test it.
Test it against your current closed default, and compare cost per completed task, not benchmark scores. An open-weight one-million-window model you control changes the math on any product pushing large prompts.
The piece the wire missed is the cadence. DSA came from DeepSeek, slime is Z.ai's, both under permissive licenses, both shipping faster than anyone can write the model-selection memo.
That's the real story. Chinese open-weight labs are setting the long-context price floor now. The squeeze lands on closed labs charging a markup for the same window.
Browser-use is running Firecracker microVMs nested inside plain EC2. A VM inside a VM: hosts boot from a pre-built image and serve browsers about 30 seconds after launch, and the host reads the ready signal over vsock in under a millisecond.
And they picked regular EC2 over the pricey metal instances on purpose, because hosts are faster to get and cheaper to keep. The honest claim is roughly three times cheaper and faster, not the sub-second figure the headline implied.
If you run cloud browsers at scale, that's the cost pattern to copy.
Tim Cook says Apple prices go up because memory chip costs are climbing. That's the consumer echo of the DRAM and HBM crunch driving the AI buildout.
Memory is the tax nobody escapes. If you're speccing GPU servers or edge devices in the next two quarters, budget for RAM, not just compute. Flat memory cost is no longer a safe assumption.
Switzerland's parliament lifted its ban on new nuclear plants. Won't add a megawatt this year.
No, but it's another government treating baseload as strategic again. Power is the binding constraint on datacenter expansion. Watch whether nuclear timelines start showing up in European siting decisions.
And Ubiquiti shipped an enterprise NAS built on ZFS. Snapshots, checksums, data integrity as table stakes.
If you keep training data or model artifacts on-prem to dodge cloud egress, price it against Synology and TrueNAS. The integration might win.
Midjourney's first hardware is a full-body ultrasound scanner. 60 seconds, built on Butterfly Network's ultrasound-on-chip silicon, 40 modules per system.
And the founder says, quote, we're not even using any AI in this yet. It's not FDA-cleared, about a dozen people have been scanned, and they admit they haven't solved turning noisy waves into images.
Treat it as a hardware bet, not a diffusion-model story. Midjourney's paying Butterfly 15 million up front plus 10 million a year for five years to find out.
DeepSeek Vision got resurfaced as a launch. It's a beta from April 29.
Seven weeks old, no V4 technical report, no stable API. You can't integrate against it. The decision-relevant event was V4's price cut, not the vision toggle.
Alex Ellis makes the case that local Qwen isn't a worse Opus, it's a different tool.
Right frame. Local wins on latency, privacy, and cost per token even when it loses on raw reasoning. Route your high-volume, well-scoped tasks to local, and save Opus for the calls that need it.
And Lightricks released LTX-2 with an official package for inference and LoRA training on an audio-video model.
The LoRA trainer is the useful part. Adapt it to a style or subject without retraining from scratch. Self-hostable base if you're building video features.
Kilo Code passes model pricing straight through, zero markup. Open-source agent for VS Code, JetBrains, and CLI, 500-plus models at the provider's own rate, including GPT-5.5 and Claude Opus 4.7.
The pass-through gateway is the draw. The traction numbers fight each other, though: 1.5 million users on GitHub, 3 million on the marketing site. Pick whichever the marketing team needed that day.
There's also an auto flag that runs fully autonomous in CI with every permission prompt off.
Trusted environments only. Point that at an untrusted repo and you'll learn something expensive.
gortex builds a local code graph and claims up to 50 times fewer tokens for coding agents, across 257 languages, all running locally.
It serves precise graph slices instead of dumping files into context. If your agent bills are dominated by context stuffing, a graph in front of the model is the cheaper architecture. Plane is also worth a look if Jira's per-seat price is grinding you down.
And a correction. The wire called roboflow's RF-DETR a video studio. It's a real-time detection transformer on a DINOv2 backbone, accepted to ICLR 2026, 2.3 milliseconds per frame on a T4.
Apache 2.0, six sizes. Ignore any 500-skills claim glued to that name.
A researcher documented roughly 10,000 GitHub repositories seeded with Trojan payloads. The classic trap for a developer cloning what looks like a useful tool.
And it's worse now because agentic tools auto-pull dependencies and run code. Pin your sources, review before you run, and don't let an auto agent execute untrusted repos. That's the whole defense in one sentence.
Emacs 31 is nearing release, and a long-time user walks through what's already worth using before the stable cut.
Useful if you live in Emacs, skippable if you don't. And SteamOS 3.8 shipped stable. A quiet point release unless you build or test on the Steam Deck.
Quick break — two from the desk.
One we know well: vote dot direct. If you're on an H O A or a board, it runs your elections digitally — secure, verifiable, no paper, no clipboard in the lobby. Point your council to vote dot direct.
And if this is your ten minutes of A I for the day, get the written edition too. The full wire, free, every morning — leave your email at nextbig dot dev.
Microsoft's new Outlook takes 10 seconds to do what Classic does instantly.
For ignoring files, .gitignore isn't your only option; try .git/info/exclude and skip-worktree.
Cornell's CS6120 advanced compilers course is free and self-guided.
Glojure runs Clojure hosted on Go, and Kong's Insomnia now covers GraphQL, REST, WebSockets, SSE, and gRPC in one open-source client.
asynq is a distributed task queue in Go for reliable background jobs, and hospitals and universities are repurposing drugs at 90 percent lower cost.
Our call: by September 19, at least one top-five AI coding tool sets or recommends a Chinese open-weight model, GLM or DeepSeek, as a default for long-context agentic work.
We're wrong if every top-five tool by usage still keeps a US closed model as the recommended default on that date. Settles September 19.
Z.ai launched GLM-5, an MIT-licensed mixture-of-experts model aimed at long-horizon agentic coding, then pushed past it twice in days. GLM-5 scales to 744B parameters with 40B active, up from GLM-4.5's 355B/32B, and trains on 28.5T tokens versus 23T. It claims top spots on ArtificialAnalysis, LMArena Text, and LMArena Code. The flagship for engineering work, GLM-5.1, is a 754B MoE with a 200K window and SWE-Bench Pro leadership. The live SKU is GLM-5.2: a 1 million token context window, up to 131,072 output tokens, and two reasoning levels, available now to every GLM Coding Plan tier.
The lever underneath all of this is DeepSeek Sparse Attention plus an asynchronous RL stack called slime. DSA is what makes a 1M context window deployable instead of theoretical. It cuts the cost of attention over long sequences, which is the line item that normally makes million-token contexts unusable in production. Z.ai is not chasing single-turn benchmark wins. The target is multi-step agentic coding where a model reads a whole repo, holds it, and acts over many turns. That is exactly the workload where context length and serving cost decide whether a feature ships.
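The cost argument is easier to see in code. Below is a minimal top-k sparse attention sketch in pure Python, not DeepSeek's implementation: each query scores its past positions with a cheap indexer, keeps the top_k, and runs full softmax attention over only those. The index_q/index_k vectors standing in for DSA's lightweight indexer, and all function names, are our assumptions. A dense causal reference is included so you can check that top_k equal to the sequence length reproduces full attention.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], rows: List[Vector]) -> Vector:
    return [sum(w * row[c] for w, row in zip(weights, rows)) for c in range(len(rows[0]))]


def dense_causal_attention(q, k, v):
    """Reference: every query attends to every earlier position."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        weights = softmax([dot(qi, k[j]) * scale for j in range(i + 1)])
        out.append(weighted_sum(weights, v[: i + 1]))
    return out


def topk_sparse_attention(q, k, v, index_q, index_k, top_k):
    """Each query runs full attention over only its top_k past positions.

    index_q / index_k play the role of a cheap scorer (in DSA, a lightweight
    indexer); passing them in as plain vectors is a simplifying assumption.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        # Cheap pass: rank causal positions 0..i by the indexer score.
        ranked = sorted(range(i + 1), key=lambda j: dot(index_q[i], index_k[j]), reverse=True)
        chosen = ranked[:top_k]
        # Expensive pass: softmax attention over at most top_k keys instead of i + 1.
        weights = softmax([dot(qi, k[j]) * scale for j in chosen])
        out.append(weighted_sum(weights, [v[j] for j in chosen]))
    return out
```

The expensive softmax-and-sum step drops from i + 1 keys per query to at most top_k, which is where the long-context savings come from. The selection pass here still touches every past position; it pays off only because it is far cheaper than full attention.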
If you are building agentic coding or repo-scale RAG, this is worth testing this week. Weights are MIT-licensed and published on Hugging Face and ModelScope, so you can self-host or rent. A 744B/40B-active MoE serves far cheaper than its parameter count suggests because only 40B fire per token. Against closed frontier models charging premium rates for long context, an open-weight 1M-window model you control changes the math on any product pushing large prompts. The catch is the obvious one: 744B of weights needs real hardware, and the cheap path is a managed endpoint until you can justify the GPUs.
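Back-of-envelope arithmetic for the MoE claim and its catch, as a sketch. It assumes the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass, bf16 weights at 2 bytes per parameter, and 80 GB GPUs. None of those figures come from Z.ai; only the 744B and 40B counts do.

```python
import math

TOTAL_PARAMS = 744e9   # GLM-5 total parameters (from the article)
ACTIVE_PARAMS = 40e9   # parameters that fire per token (from the article)


def forward_flops_per_token(active_params: float) -> float:
    # Rule of thumb: ~2 FLOPs (one multiply, one add) per active parameter per token.
    return 2 * active_params


def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    # Every expert must stay resident, even though only a few fire per token.
    return total_params * bytes_per_param / 1e9


def min_gpus_for_weights(total_params: float, bytes_per_param: float,
                         gpu_mem_gb: float = 80.0) -> int:
    # Lower bound: weights only, ignoring KV cache, activations, and parallelism overhead.
    return math.ceil(weight_memory_gb(total_params, bytes_per_param) / gpu_mem_gb)


if __name__ == "__main__":
    dense = forward_flops_per_token(TOTAL_PARAMS)
    moe = forward_flops_per_token(ACTIVE_PARAMS)
    print(f"compute per token vs dense 744B: {moe / dense:.1%}")
    print(f"bf16 weights: {weight_memory_gb(TOTAL_PARAMS, 2):.0f} GB, "
          f">= {min_gpus_for_weights(TOTAL_PARAMS, 2)} x 80 GB GPUs")
```

Per-token compute tracks the 40B active parameters, about 5.4% of a dense 744B model, but all 744B still have to sit in memory. Under these assumptions bf16 weights alone need at least 19 such GPUs before any KV cache for a 1M-token window, which is why a managed endpoint is the cheap starting point.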
The signal for the next six months is the part the wire missed. The headline filed plain GLM-5 while Z.ai had already iterated to 5.1 and 5.2. Chinese open-weight labs are now setting the long-context price floor, and they are shipping faster than anyone can write the model-selection memo. DSA came from DeepSeek, slime is Z.ai's, and both land under permissive licenses. The squeeze falls on closed labs selling long context at a markup. When a 1M-window MIT model serves agentic coding at a fraction of the cost, the default-model decision inside every coding tool gets reopened.
Anyone choosing a model off the string GLM-5 is two releases behind. The number that matters is 1M, and it is already in production.
Browser-use runs Firecracker microVMs nested inside plain EC2
The trick is a VM inside a VM: AWS already runs your EC2 instance as a VM, and browser-use runs Firecracker browser VMs inside that. They picked regular EC2 over pricey .metal instances because hosts are faster to get and cheaper to keep. Hosts boot from a pre-built image and serve browsers about 30 seconds after launch, and the host reads the guest's ready message over vsock in under a millisecond. The blog's real claim is roughly 3x cheaper and faster, not the sub-1s figure the headline implied. If you run cloud browsers at scale, this is the cost pattern to copy.
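The host side of that handshake is small. A minimal sketch, assuming the guest agent writes a hypothetical READY line once the browser is up; the real message format isn't described in the post. On the host this would be whatever stream socket fronts the guest's vsock device (in Firecracker's design, as we understand it, a Unix domain socket), but any connected stream socket works, which is how the usage below exercises it with socket.socketpair().

```python
import socket
import time

READY_TOKEN = b"READY\n"  # hypothetical message; the real guest agent's format is not published here


def wait_for_ready(sock: socket.socket, token: bytes = READY_TOKEN, timeout_s: float = 5.0) -> float:
    """Block until `token` arrives on a connected stream socket; return seconds waited.

    Raises TimeoutError (socket.timeout on Pythons before 3.10) if nothing arrives
    in time, and ConnectionError if the guest closes the channel before signalling ready.
    """
    sock.settimeout(timeout_s)
    start = time.monotonic()
    buf = b""
    while token not in buf:
        chunk = sock.recv(64)
        if not chunk:
            raise ConnectionError("guest closed the channel before signalling ready")
        buf += chunk  # the token may arrive split across several reads
    return time.monotonic() - start


if __name__ == "__main__":
    host_end, guest_end = socket.socketpair()  # stand-in for the vsock channel
    guest_end.sendall(READY_TOKEN)
    print(f"ready after {wait_for_ready(host_end) * 1000:.3f} ms")
```

Reading an event off a local socket is why readiness costs under a millisecond; the 30 seconds is the host boot, not the signal.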
Tim Cook says Apple prices rise as memory chip costs climb
Memory pricing is the tax nobody escapes. Cook flagged rising memory chip costs as a reason Apple prices go up, which is the consumer-facing echo of the DRAM and HBM crunch driving AI buildouts. If you are speccing GPU servers or edge devices in the next two quarters, budget for memory, not just compute. The squeeze hits anyone whose unit economics assume flat RAM costs.
Swiss parliament lifts its ban on new nuclear plants
Power is the binding constraint on datacenter expansion, and Switzerland just reopened a supply lever it had shut. Lifting the new-build ban won't add a megawatt this year, but it is another data point in the pattern of governments treating baseload as strategic again. Watch whether nuclear timelines start showing up in European datacenter siting decisions.
Ubiquiti ships an enterprise NAS built on ZFS
Ubiquiti is moving up the stack from networking into storage with a ZFS-based enterprise NAS. ZFS means snapshots, checksums, and data integrity as table stakes rather than add-ons. If you run on-prem storage for training data or model artifacts and want to stay off cloud egress bills, this is a cheaper integrated option worth pricing against Synology and TrueNAS.
Midjourney's first hardware is a full-body ultrasound scanner, with no AI in it yet
The wire called this an AI imaging model. It is a 60-second full-body ultrasound device built on Butterfly Network's ultrasound-on-chip silicon, 40 modules per system, and founder David Holz says "We're not even using any AI in this yet." Midjourney pays Butterfly a $15M one-time fee plus $10M annually over five years. It is not FDA-cleared, about a dozen people have been scanned, and the company admits it still hasn't solved turning noisy waves into static images. Treat it as a hardware bet, not a diffusion-model story.
DeepSeek Vision is a seven-week-old beta, not a fresh launch
The HN post resurfaced an event from April 29: a limited vision-mode beta that appears alongside Fast and Expert modes. It is DeepSeek's first multimodal move, built on V4's native multimodal architecture where image and video understanding were baked in during pre-training. There is still no V4 or Vision technical report and no stable API, so you can't integrate against it yet. The decision-relevant event was V4's price cut, not the vision toggle.
Local Qwen is a different tool, not a worse Opus
Alex Ellis argues the right frame for local models is task fit, not a benchmark ranking against frontier models. A local Qwen that runs offline on your own hardware wins on latency, privacy, and cost per token even when it loses on raw reasoning. If you are pushing high-volume, well-scoped tasks, route them to local before paying frontier rates. Save Opus for the calls that actually need it.
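A minimal sketch of that routing rule, assuming you can tag each task up front. The model names, the Task fields, and the 32K local context limit are hypothetical placeholders, not anything from Ellis's post.

```python
from dataclasses import dataclass

# Hypothetical model identifiers: substitute your own endpoints.
LOCAL_MODEL = "local-qwen"
FRONTIER_MODEL = "frontier-opus"
LOCAL_CONTEXT_LIMIT = 32_768  # assumed context window of the local deployment


@dataclass
class Task:
    prompt_tokens: int
    well_scoped: bool           # e.g. extraction, classification, boilerplate edits
    needs_deep_reasoning: bool  # e.g. ambiguous spec, multi-file design change
    sensitive: bool             # data that must not leave your hardware


def route(task: Task) -> str:
    """Send high-volume, well-scoped work local; escalate only what needs frontier reasoning."""
    if task.sensitive:
        # Privacy overrides capability: keep it local even if quality suffers.
        return LOCAL_MODEL
    if task.needs_deep_reasoning:
        return FRONTIER_MODEL
    if task.well_scoped and task.prompt_tokens <= LOCAL_CONTEXT_LIMIT:
        return LOCAL_MODEL
    return FRONTIER_MODEL
```

Defaulting to the frontier model for anything that doesn't clearly fit local keeps quality safe while the well-scoped bulk of traffic moves to the cheaper path.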
Lightricks releases LTX-2 with inference and LoRA training
LTX-2 ships an official Python package for inference and LoRA fine-tuning of an audio-video generative model. The LoRA trainer is the practical part: it lets you adapt the model to a specific style or subject without retraining from scratch. If you are building video generation features, this is a self-hostable base you can specialize cheaply.
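What the LoRA trainer buys you, sketched generically: the pretrained weight W stays frozen and only a low-rank pair A, B is trained. This is the standard LoRA formulation, y = Wx + (alpha / r) * B(Ax), in pure Python; it is not LTX-2's package API, and the toy matrices are illustrative.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float) -> Vector:
    """y = W x + (alpha / r) * B (A x).

    W: frozen pretrained weight, d_out x d_in (never updated)
    A: trainable down-projection, r x d_in
    B: trainable up-projection, d_out x r (conventionally initialised to zeros,
       so training starts exactly at the pretrained model)
    """
    r = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]


def trainable_params(d_out: int, d_in: int, r: int) -> Tuple[int, int]:
    """Parameters updated by full fine-tuning vs. by a rank-r LoRA adapter on one matrix."""
    return d_out * d_in, r * (d_in + d_out)
```

For a single 4096 x 4096 projection at rank 16, trainable_params gives 16,777,216 parameters for full fine-tuning versus 131,072 for the adapter, under 1% of the matrix. That gap is why a style or subject adapter is cheap to train and ship.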
Kilo Code passes model pricing straight through with zero markup
Kilo is an open-source coding agent for VS Code, JetBrains, and CLI that exposes 500+ models, including GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro Preview, at the provider's own rate. The CLI is a fork of OpenCode, and a --auto flag runs fully autonomous in CI with all permission prompts off, intended only for trusted environments. Traction is self-reported and inconsistent across its own pages: 1.5M users and 25T tokens on GitHub versus 3M users and 40T tokens on the marketing site. The pass-through gateway is the real draw if you want to dodge agent-tool markups.
Plane offers an open-source alternative to Jira, Linear, and ClickUp
Plane is a self-hostable project management platform covering tasks, sprints, docs, and triage. The pitch is owning your roadmap data instead of renting it per seat. If your team is feeling Jira's price or wants project tracking inside your own infra, it is worth a trial.
gortex builds a local code graph that cuts agent token usage up to 50x
gortex is a code intelligence engine across 257 languages, exposed via CLI, MCP server, and API, built so AI coding agents pull only the code they need. The claim is up to 50x fewer tokens by serving precise graph slices instead of dumping files into context, and it runs 100% local. If your agent bills are dominated by context stuffing, a code graph in front of the model is the cheaper architecture.
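The graph-slice idea, sketched generically: walk the call graph outward from the symbol the agent asked about and send only those definitions, instead of every file. The toy repository, the word-count token estimate, and the function names are hypothetical; this is not gortex's CLI, MCP, or API surface.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

# Hypothetical toy repository: symbol -> source, and symbol -> symbols it calls.
SOURCES: Dict[str, str] = {
    "handle_request": "def handle_request(req): body = parse_body(req); auth_check(req); return body",
    "parse_body": "def parse_body(req): return json_decode(req.raw)",
    "json_decode": "def json_decode(raw): return decode(raw)",
    "auth_check": "def auth_check(req): assert req.user",
    "render_report": "def render_report(rows): return format_rows(rows)",
    "format_rows": "def format_rows(rows): return join(rows)",
    "send_email": "def send_email(to, body): smtp_send(to, body)",
}
CALLS: Dict[str, List[str]] = {
    "handle_request": ["parse_body", "auth_check"],
    "parse_body": ["json_decode"],
    "render_report": ["format_rows"],
}


def approx_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: one token per whitespace-separated word.
    return len(text.split())


def graph_slice(target: str, calls: Dict[str, List[str]], max_depth: int) -> Set[str]:
    """Breadth-first walk from `target`, following call edges up to max_depth hops."""
    seen = {target}
    frontier = deque([(target, 0)])
    while frontier:
        sym, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for dep in calls.get(sym, []):
            if dep not in seen:
                seen.add(dep)
                frontier.append((dep, depth + 1))
    return seen


def context_for(symbols: Iterable[str], sources: Dict[str, str]) -> str:
    return "\n\n".join(sources[s] for s in sorted(symbols))
```

The savings scale with how much of the repo is unrelated to the question; in a real codebase, the slice around one handler is a tiny fraction of the files an agent would otherwise paste in.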
Correction: roboflow/rf-detr is an object detector, not a video studio
The wire grafted an unrelated description onto this repo. RF-DETR is a real-time transformer for detection, segmentation, and keypoints on a DINOv2 backbone, accepted to ICLR 2026, spanning six sizes from 30.5M to 126.9M parameters. RF-DETR-N runs at 2.3ms per frame on a T4, and the 2XL size hits 60.1 AP on COCO. Core models are Apache 2.0; ignore any "500+ agent skills" claim attached to this name.
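To put 2.3ms per frame in capacity terms, a quick sketch of the sequential-throughput arithmetic. overhead_ms is a hypothetical allowance for decode, resize, and post-processing that a model-only latency figure may not include. Batching is ignored, so treat the results as rough single-GPU estimates, not benchmarks.

```python
import math


def max_fps(model_ms: float, overhead_ms: float = 0.0) -> float:
    """Frames per second when frames are processed one after another on one GPU."""
    return 1000.0 / (model_ms + overhead_ms)


def max_streams(model_ms: float, stream_fps: float, overhead_ms: float = 0.0) -> int:
    """How many cameras at stream_fps fit in that sequential frame budget."""
    return math.floor(max_fps(model_ms, overhead_ms) / stream_fps)


if __name__ == "__main__":
    print(f"model only: {max_fps(2.3):.0f} fps, {max_streams(2.3, 30)} x 30 fps streams")
    print(f"with 2 ms assumed overhead: {max_streams(2.3, 30, overhead_ms=2.0)} streams")
```

The point of the overhead term: at these latencies, everything around the model can cost as much as the model itself, so measure the whole pipeline before sizing hardware.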
10,000 GitHub repositories found distributing Trojan malware
A researcher documented roughly 10k repos seeded with Trojan payloads, the kind of supply-chain trap that snares developers cloning what looks like a useful tool. With agentic coding tools auto-pulling dependencies and running code, the attack surface is wider than ever. Pin sources, review before you run, and don't let an --auto agent execute untrusted repos.
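One concrete form of "pin sources": record a SHA-256 for every archive or script you have reviewed, and refuse to run anything that doesn't match. A minimal Python sketch; the function names are ours, and the same idea applies to pinning a reviewed git commit rather than tracking a branch.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream the file so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(path: Path, pinned_sha256: str) -> None:
    """Raise instead of returning False, so a forgotten check cannot fail open."""
    actual = sha256_of(path)
    if not hmac.compare_digest(actual, pinned_sha256.strip().lower()):
        raise ValueError(f"{path}: sha256 {actual} does not match pinned value; refusing to run it")
```

Call verify_pinned before any install or execute step, including ones an agent triggers, and keep the pinned hashes in a file the agent cannot edit.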
Emacs 31 nears release with daily-driver-ready changes
A long-time user walks through the Emacs 31 features already worth using before the stable cut. Practical if you live in Emacs and want to know what changes your workflow versus what is cosmetic. Skip if you don't.
SteamOS Linux 3.8 ships as stable
Valve pushed SteamOS 3.8 to stable. Relevant if you build or test on the Steam Deck or care about the Linux gaming stack maturing. Otherwise a quiet point release.
Two stories point the same way: GLM-5.2 ships a 1M-context MIT model, and Kilo Code routes 500+ models at zero markup. The cost advantage in agentic coding is moving to open weights plus pass-through pricing. This week, run your repo-scale coding workload against a hosted GLM-5.2 endpoint and against your current closed default, then compare cost per completed task, not benchmark scores. If gortex-style code graphs cut your context tokens 10x, the open-weight path gets cheaper still.
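A minimal way to score that comparison, assuming you log token usage and a pass/fail outcome per task. Prices are per million tokens and are placeholders, not quotes from Z.ai or anyone else; markup models a gateway fee, with 0.0 for pass-through pricing like Kilo's.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    input_tokens: int
    output_tokens: int
    tasks_completed: int  # define "completed" up front, e.g. tests pass or PR merged


def cost_per_completed_task(stats: RunStats, input_usd_per_m: float,
                            output_usd_per_m: float, markup: float = 0.0) -> float:
    """Total spend divided by tasks that actually finished.

    markup is the fraction a gateway adds on top of provider rates
    (0.0 for pass-through pricing).
    """
    spend = (stats.input_tokens * input_usd_per_m
             + stats.output_tokens * output_usd_per_m) / 1_000_000
    spend *= 1 + markup
    if stats.tasks_completed == 0:
        return float("inf")  # money spent, nothing shipped
    return spend / stats.tasks_completed
```

Run both models on the same task set. A cheaper model that completes fewer tasks can still cost more per completed task, and that is the number this function surfaces.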
By September 19, 2026, at least one top-five AI coding tool will set a Chinese open-weight model (GLM or DeepSeek) as a default or explicitly recommended option for long-context agentic tasks.
GLM-5.2 ships a 1M-context MIT model with DSA driving long-context serving cost down, DeepSeek's V4 cut prices hard, and tools like Kilo already expose these models at provider pricing with zero markup. Consensus still assumes US closed frontier models stay the default inside coding tools. The cost and license gap on long-context agentic work is now wide enough to flip that default.
By September 19, 2026, no top-five AI coding tool (by usage) sets or recommends a Chinese open-weight model as a default for long-context or agentic coding; all keep a US closed model as the recommended default.
Each new free or open visual-generation model drains another dollar of Firefly's pricing power, and Adobe's subscription COGS already grows faster than revenue as inference eats the margin. The multiple keeps compressing toward the only moat left, which is indemnity, not image quality.
Today stacks DeepSeek Vision inside a mass free chat app, LTX-2 open audio-video, and Midjourney's medical push, all pressing the same commoditization the desk flagged on June 12. Adobe trades roughly 43% off its 2025 peak with a rare Goldman Sell, and subscription cost grew about 13% against 11% revenue as AI inference compresses gross margin. When generation goes free, commercially safe training is a feature, not a premium moat.
The memory squeeze is now confirmed by the largest device buyer on earth, and Micron has already taken the move. The early phase of the memory long is over heading into the June 24 print, which is what changed since the desk's June 16 WATCH.
Cook calling Apple price hikes unavoidable is the real-economy confirmation of the DRAM and NAND squeeze the desk flagged, but MU is up roughly 820% in a year, sits near its $1,110 high, and roared up on the Apple headline itself. Options price about a 17.6% swing around June 24 earnings where consensus EPS sits near $20. Confirmation this loud and this late marks a crowded trade, not an early one.
Memory inflation is now a confirmed gross-margin headwind Apple will answer by raising prices into a soft consumer cycle, putting unit growth at risk even as the tape cheers the hikes as proof that hardware is underpriced. Timing on the demand hit is unclear, so this reads as a watch, not a short.
Apple guided fiscal Q2 gross margin to 48-49% with memory weighing more from here, and Bernstein pegs the hit near 1.5 points by year-end, while TechInsights estimates about $270 added to the next iPhone Pro. Cook's "unavoidable" language escalates from January's "we'll look at options" framing. The bullish read treats pass-through as free and ignores elasticity if a roughly $1,299 iPhone Pro meets a stretched consumer.
The open-weight fallback the crowd prices into Alibaba is no longer obviously Qwen. GLM-5 shipping MIT and topping the agentic-coding boards moves the default open option toward Z.ai, which erodes the download-share narrative underpinning the Qwen-fallback read the desk has carried.
GLM-5 launched MIT-licensed and claims top spots on ArtificialAnalysis and LMArena Code, the exact long-horizon agentic workload where Qwen's download lead was the moat. DeepSeek's vision beta, live since April 29, widens the Chinese open-weight field further. Alibaba's equity never monetized Qwen directly, so the self-host-Qwen thesis was always thin, and a better free competitor makes it thinner.