# Z.ai shipped GLM-5 under MIT and is already two releases past it

> Z.ai shipped GLM-5 under MIT and already pushed to a 1M-context GLM-5.2, reopening the default-model decision in every coding tool.

- Published: Friday, June 19, 2026 (2026-06-19)
- Publisher: nextbig.dev — daily AI & compute briefing, written by Oday Brahem with nextbig.dev's AI agent
- Sources analyzed: 34 articles from 300+ curated accounts
- Canonical URL: https://www.nextbig.dev/daily/2026-06-19

## The Big Story

### Z.ai shipped GLM-5 under MIT and is already two releases past it

Z.ai launched GLM-5, an MIT-licensed mixture-of-experts model aimed at long-horizon agentic coding, then pushed past it twice in days. GLM-5 scales to 744B parameters with 40B active, up from GLM-4.5's 355B/32B, and trains on 28.5T tokens versus 23T. It claims top spots on ArtificialAnalysis, LMArena Text, and LMArena Code. The flagship for engineering work, GLM-5.1, is a 754B MoE with a 200K window and SWE-Bench Pro leadership. The live SKU is GLM-5.2: a 1 million token context window, up to 131,072 output tokens, and two reasoning levels, available now to every GLM Coding Plan tier.

The lever underneath all of this is DeepSeek Sparse Attention plus an asynchronous RL stack called slime. DSA is what makes a 1M context window deployable instead of theoretical. It cuts the cost of attention over long sequences, which is the line item that normally makes million-token contexts unusable in production. Z.ai is not chasing single-turn benchmark wins. The target is multi-step agentic coding where a model reads a whole repo, holds it, and acts over many turns. That is exactly the workload where context length and serving cost decide whether a feature ships.
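
To make the cost argument concrete, here is a rough back-of-envelope sketch of why sparse attention matters at 1M tokens. It assumes the simplest possible model of sparsity: each query scores at most a fixed budget of earlier keys. The 2,048-key budget is a hypothetical placeholder, not a figure from Z.ai or DeepSeek, and the sketch ignores whatever it costs to choose which keys to keep.

```python
def dense_attention_pairs(seq_len: int) -> int:
    """Query-key pairs scored by causal dense attention: 1 + 2 + ... + n."""
    return seq_len * (seq_len + 1) // 2


def sparse_attention_pairs(seq_len: int, budget: int) -> int:
    """Pairs scored when each query attends to at most `budget` keys."""
    if seq_len <= budget:
        # Every position still sees all of its predecessors.
        return dense_attention_pairs(seq_len)
    # The first `budget` positions behave densely; the rest are capped.
    return dense_attention_pairs(budget) + (seq_len - budget) * budget


SEQ_LEN = 1_000_000   # the 1M context window from the story
KEY_BUDGET = 2_048    # hypothetical per-query budget, for illustration only

dense = dense_attention_pairs(SEQ_LEN)
sparse = sparse_attention_pairs(SEQ_LEN, KEY_BUDGET)
print(f"dense pairs:  {dense:,}")
print(f"sparse pairs: {sparse:,}")
print(f"reduction:    {dense / sparse:.0f}x")
```

Dense cost grows with the square of the context while the capped version grows linearly, which is why the gap widens as prompts get longer.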

If you are building agentic coding or repo-scale RAG, this is worth testing this week. Weights are MIT on Hugging Face and ModelScope, so you can self-host or rent. A 744B/40B-active MoE serves far cheaper than its parameter count suggests because only 40B fire per token. Against closed frontier models charging premium rates for long context, an open-weight 1M-window model you control changes the math on any product pushing large prompts. The catch is the obvious one: 744B of weights needs real hardware, and the cheap path is a managed endpoint until you can justify the GPUs.
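
Both halves of that trade-off fit in a few lines of arithmetic. The sketch below uses the 744B total and 40B active figures from the story; the bytes-per-parameter values are the standard sizes for each precision, and the estimate covers weights only (no KV cache, activations, or runtime overhead), so treat the output as a floor rather than a sizing guide.

```python
TOTAL_PARAMS = 744e9   # total parameters, from the story
ACTIVE_PARAMS = 40e9   # parameters active per token, from the story

# Standard storage cost per parameter at common precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that participate in each token's forward pass."""
    return active / total


def weight_gib(params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return params * bytes_per_param / 2**30


print(f"active per token: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision}: {weight_gib(TOTAL_PARAMS, nbytes):,.0f} GiB of weights")
```

Per-token compute tracks the active share, but every expert still has to sit in memory, which is why the hardware bill follows the 744B figure and not the 40B one.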

The signal for the next six months is the part the wire missed. The headline filed plain GLM-5 while Z.ai had already iterated to 5.1 and 5.2. Chinese open-weight labs are now setting the long-context price floor, and they are shipping faster than anyone can write the model-selection memo. DSA came from DeepSeek, slime is Z.ai's, and both land under permissive licenses. The squeeze falls on closed labs selling long context at a markup. When a 1M-window MIT model serves agentic coding at a fraction of the cost, the default-model decision inside every coding tool gets reopened.

Anyone choosing a model off the string GLM-5 is two releases behind. The number that matters is 1M, and it is already in production.

Source: @github — https://github.com/zai-org/GLM-5

## Compute & Infrastructure

### Browser-use runs Firecracker microVMs nested inside plain EC2

The trick is a VM inside a VM: AWS already runs your EC2 instance as a VM, and browser-use runs Firecracker browser VMs inside that. They picked regular EC2 over pricey .metal instances because the hosts are faster to get and cheaper to keep. Each host boots from a pre-built image and serves browsers about 30 seconds after launch, and it reads the ready message over vsock in under a millisecond. The blog's real claim is roughly 3x cheaper and faster, not the sub-1s figure the headline implied. If you run cloud browsers at scale, this is the cost pattern to copy.

Source: @newsycombinator — https://browser-use.com/posts/firecracker-browser-infra

### Tim Cook says Apple prices rise as memory chip costs climb

Memory pricing is the tax nobody escapes. Cook flagged rising memory chip costs as a reason Apple prices go up, which is the consumer-facing echo of the DRAM and HBM crunch driving AI buildouts. If you are speccing GPU servers or edge devices in the next two quarters, budget for memory, not just compute. The squeeze hits anyone whose unit economics assume flat RAM costs.

Source: @newsycombinator — https://www.bbc.com/news/articles/c3wyxvqdx1zo

### Swiss parliament lifts its ban on new nuclear plants

Power is the binding constraint on datacenter expansion, and Switzerland just reopened a supply lever it had shut. Lifting the new-build ban won't add a megawatt this year, but it is another data point in the pattern of governments treating baseload as strategic again. Watch whether nuclear timelines start showing up in European datacenter siting decisions.

Source: @newsycombinator — https://www.bluewin.ch/en/news/switzerland/parliament-lifts-ban-on-new-nuclear-power-plants-3257535.html

### Ubiquiti ships an enterprise NAS built on ZFS

Ubiquiti is moving up the stack from networking into storage with a ZFS-based enterprise NAS. ZFS means snapshots, checksums, and data integrity as table stakes rather than add-ons. If you run on-prem storage for training data or model artifacts and want to stay off cloud egress bills, this is a cheaper integrated option worth pricing against Synology and TrueNAS.

Source: @newsycombinator — https://blog.ui.com/article/introducing-enterprise-nas

## AI & Models

### Midjourney's first hardware is a full-body ultrasound scanner, with no AI in it yet

The wire called this an AI imaging model. It is a 60-second full-body ultrasound device built on Butterfly Network's ultrasound-on-chip silicon, 40 modules per system, and Midjourney founder David Holz says "We're not even using any AI in this yet." Midjourney pays Butterfly a $15M one-time fee plus $10M annually over five years. The device is not FDA-cleared, about a dozen people have been scanned, and the company admits it still hasn't solved turning noisy waves into static images. Treat it as a hardware bet, not a diffusion-model story.

Source: @newsycombinator — https://www.midjourney.com/medical/blogpost

### DeepSeek Vision is a seven-week-old beta, not a fresh launch

The HN post resurfaced an event from April 29: a limited vision-mode beta that appears alongside Fast and Expert modes. It is DeepSeek's first multimodal move, built on V4's native multimodal architecture where image and video understanding were baked in during pre-training. There is still no V4 or Vision technical report and no stable API, so you can't integrate against it yet. The decision-relevant event was V4's price cut, not the vision toggle.

Source: @newsycombinator — https://chat.deepseek.com/

### Local Qwen is a different tool, not a worse Opus

Alex Ellis argues the right frame for local models is task fit, not a benchmark ranking against frontier models. A local Qwen that runs offline on your own hardware wins on latency, privacy, and cost per token even when it loses on raw reasoning. If you are pushing high-volume, well-scoped tasks, route them to local before paying frontier rates. Save Opus for the calls that actually need it.

Source: @newsycombinator — https://blog.alexellis.io/local-ai-is-not-opus/

### Lightricks releases LTX-2 with inference and LoRA training

LTX-2 ships an official Python package for inference and LoRA fine-tuning of an audio-video generative model. The LoRA trainer is the practical part: it lets you adapt the model to a specific style or subject without retraining from scratch. If you are building video generation features, this is a self-hostable base you can specialize cheaply.

Source: @github — https://github.com/Lightricks/LTX-2

## Developer Tools

### Kilo Code passes model pricing straight through with zero markup

Kilo is an open-source coding agent for VS Code, JetBrains, and CLI that exposes 500+ models, including GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro Preview, at the provider's own rate. The CLI is a fork of OpenCode, and a --auto flag runs it fully autonomously in CI with all permission prompts off, intended only for trusted environments. Traction is self-reported and inconsistent across its own pages: 1.5M users and 25T tokens on GitHub versus 3M users and 40T tokens on the marketing site. The pass-through gateway is the real draw if you want to dodge agent-tool markups.

Source: @github — https://github.com/Kilo-Org/kilocode

### Plane offers an open-source alternative to Jira, Linear, and ClickUp

Plane is a self-hostable project management platform covering tasks, sprints, docs, and triage. The pitch is owning your roadmap data instead of renting it per seat. If your team is feeling Jira's price or wants project tracking inside your own infra, it is worth a trial.

Source: @github — https://github.com/makeplane/plane

### gortex builds a local code graph that cuts agent token usage up to 50x

gortex is a code intelligence engine across 257 languages, exposed via CLI, MCP server, and API, built so AI coding agents pull only the code they need. The claim is up to 50x fewer tokens by serving precise graph slices instead of dumping files into context, and it runs 100% local. If your agent bills are dominated by context stuffing, a code graph in front of the model is the cheaper architecture.
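
To show what "a graph slice instead of a file dump" means in practice, here is a toy sketch built on Python's standard `ast` module. It is not gortex's implementation or API: the `SOURCE` snippet, `build_call_graph`, and `slice_for` are all hypothetical names invented for illustration, and the sketch only follows direct calls between top-level functions in a single file.

```python
import ast

# Hypothetical module an agent might otherwise receive in full.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def count_words(path):
    return len(parse(load(path)))

def unrelated_report():
    return "lots of code the agent does not need"
'''


def build_call_graph(source):
    """Map each top-level function to the top-level functions it calls."""
    tree = ast.parse(source)
    functions = {
        node.name: node for node in tree.body if isinstance(node, ast.FunctionDef)
    }
    graph = {}
    for name, node in functions.items():
        callees = set()
        for child in ast.walk(node):
            # Only direct calls by plain name; methods and imports are ignored.
            if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                if child.func.id in functions:
                    callees.add(child.func.id)
        graph[name] = callees
    return functions, graph


def slice_for(source, entry):
    """Return only the source of `entry` and the functions it reaches."""
    functions, graph = build_call_graph(source)
    needed, stack = set(), [entry]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(graph[name])
    # Emit the kept functions in their original source order.
    return "\n\n".join(
        ast.get_source_segment(source, functions[name])
        for name in functions
        if name in needed
    )


context = slice_for(SOURCE, "count_words")
print(context)
print(f"{len(context)} of {len(SOURCE)} characters sent")
```

The saving here is small because the file is tiny; the claimed multiples would come from real repositories, where most of any given file is irrelevant to the task at hand.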

Source: @github — https://github.com/zzet/gortex

### Correction: roboflow/rf-detr is an object detector, not a video studio

The wire grafted an unrelated description onto this repo. RF-DETR is a real-time transformer for detection, segmentation, and keypoints on a DINOv2 backbone, accepted to ICLR 2026, spanning six sizes from 30.5M to 126.9M parameters. RF-DETR-N runs at 2.3ms per frame on a T4, and the 2XL size hits 60.1 AP on COCO. Core models are Apache 2.0; ignore any "500+ agent skills" claim attached to this name.

Source: @github — https://github.com/roboflow/rf-detr

## Security

### 10,000 GitHub repositories found distributing Trojan malware

A researcher documented roughly 10k repos seeded with Trojan payloads, the kind of supply-chain trap that snares developers cloning what looks like a useful tool. With agentic coding tools auto-pulling dependencies and running code, the attack surface is wider than ever. Pin sources, review before you run, and don't let an --auto agent execute untrusted repos.
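
One reading of "pin sources" is to record a checksum when you first review an artifact and refuse to run anything that no longer matches it. The sketch below does that with the standard library; the artifact bytes and the `verify_pinned` helper are hypothetical, and a real setup would more likely lean on lockfile hashes or pinned commit SHAs than hand-rolled checks.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned(data: bytes, pinned_hex: str) -> bool:
    """True only if the artifact still matches the digest pinned at review time."""
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(sha256_hex(data), pinned_hex.lower())


# Hypothetical artifact: in practice, the bytes of a downloaded release archive.
reviewed_artifact = b"contents of the release you actually read"
PINNED_DIGEST = sha256_hex(reviewed_artifact)  # stored alongside your config

tampered_artifact = reviewed_artifact + b" plus a payload"

print("reviewed copy ok:", verify_pinned(reviewed_artifact, PINNED_DIGEST))
print("tampered copy ok:", verify_pinned(tampered_artifact, PINNED_DIGEST))
```

A checksum only proves the bytes have not changed since you looked; it does nothing if the version you pinned was already malicious, which is why the review step comes first.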

Source: @newsycombinator — https://orchidfiles.com/github-repositories-distributing-malware/

## Launches & Releases

### Emacs 31 nears release with daily-driver-ready changes

A long-time user walks through the Emacs 31 features already worth using before the stable cut. Practical if you live in Emacs and want to know what changes your workflow versus what is cosmetic. Skip if you don't.

Source: @newsycombinator — https://www.rahuljuliato.com/posts/emacs-31-around-the-corner

### SteamOS Linux 3.8 ships as stable

Valve pushed SteamOS 3.8 to stable. Relevant if you build or test on the Steam Deck or care about the Linux gaming stack maturing. Otherwise a quiet point release.

Source: @newsycombinator — https://store.steampowered.com/news/app/1675200/view/697641379212298072

## Quick Hits

- Microsoft's new Outlook takes 10 seconds to do what Classic does instantly (@newsycombinator) — https://www.windowslatest.com/2026/06/15/microsofts-new-outlook-takes-10-seconds-to-do-what-outlook-classic-does-instantly-on-windows/
- .gitignore isn't the only way to ignore files: try .git/info/exclude and skip-worktree (@newsycombinator) — https://nelson.cloud/.gitignore-isnt-the-only-way-to-ignore-files-in-git/
- Cornell's CS6120 advanced compilers course is free and self-guided (@newsycombinator) — https://www.cs.cornell.edu/courses/cs6120/2025fa/self-guided/
- Glojure runs Clojure hosted on Go (@newsycombinator) — https://github.com/glojurelang/glojure
- Kong's Insomnia covers GraphQL, REST, WebSockets, SSE, and gRPC in one open-source client (@github) — https://github.com/Kong/insomnia
- asynq is a distributed task queue in Go for reliable background jobs (@github) — https://github.com/hibiken/asynq
- Hospitals and universities are repurposing drugs at 90% lower cost (@newsycombinator) — https://www.kcl.ac.uk/news/hospitals-and-universities-repurposing-drugs-at-90-lower-cost
- Modos color monitor pushes e-paper displays further (@newsycombinator) — https://spectrum.ieee.org/modos-e-paper-monitor

## The Takeaway

Two stories point the same way: GLM-5.2 ships a 1M-context MIT model, and Kilo Code routes 500+ models at zero markup. The cost advantage in agentic coding is moving to open weights plus pass-through pricing. This week, run your repo-scale coding workload against a hosted GLM-5.2 endpoint and against your current closed default, then compare cost per completed task, not benchmark scores. If gortex-style code graphs cut your context tokens 10x, the open-weight path gets cheaper still.
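
The comparison above reduces to one number per model. Here is a minimal sketch of that metric; the token counts, task counts, and per-million-token prices are all made-up placeholders, so substitute your own logs and your provider's current rates.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Totals from running one model over the same task set."""
    input_tokens: int
    output_tokens: int
    tasks_attempted: int
    tasks_completed: int


def cost_per_completed_task(
    stats: RunStats, usd_per_m_input: float, usd_per_m_output: float
) -> float:
    """Total spend divided by tasks that actually finished."""
    if stats.tasks_completed == 0:
        return float("inf")  # spent money, shipped nothing
    total_usd = (
        stats.input_tokens / 1e6 * usd_per_m_input
        + stats.output_tokens / 1e6 * usd_per_m_output
    )
    return total_usd / stats.tasks_completed


# Hypothetical numbers for illustration only.
open_model = RunStats(40_000_000, 2_000_000, tasks_attempted=50, tasks_completed=38)
closed_model = RunStats(35_000_000, 1_800_000, tasks_attempted=50, tasks_completed=42)

print(f"open:   ${cost_per_completed_task(open_model, 1.00, 3.00):.2f} per task")
print(f"closed: ${cost_per_completed_task(closed_model, 5.00, 25.00):.2f} per task")
```

Dividing by completed tasks rather than attempted ones is the point: a cheaper model that fails more often can still lose once retries and abandoned runs are counted.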

## The Call

By September 19, 2026, at least one top-five AI coding tool will set a Chinese open-weight model (GLM or DeepSeek) as a default or explicitly recommended option for long-context agentic tasks.

The case: GLM-5.2 ships a 1M-context MIT model with DSA driving long-context serving cost down, DeepSeek's V4 cut prices hard, and tools like Kilo already expose these models at provider pricing with zero markup. Consensus still assumes US closed frontier models stay the default inside coding tools. The cost and license gap on long-context agentic work is now wide enough to flip that default.

What proves us wrong: By September 19, 2026, no top-five AI coding tool (by usage) sets or recommends a Chinese open-weight model as a default for long-context or agentic coding; all keep a US closed model as the recommended default.

Settles: by September 19, 2026

## The Tape

The market desk's signals from the day's verified wire. Falsifiable analysis, settled in public — not individualized investment advice.

### SHORT ADBE (Adobe) — medium conviction

Each new free or open visual-generation model drains another dollar of Firefly's pricing power, and Adobe's subscription COGS already grows faster than revenue as inference eats the margin. The multiple keeps compressing toward the only moat left, which is indemnity, not image quality.

The mechanism: Today stacks DeepSeek Vision inside a mass free chat app, LTX-2 open audio-video, and Midjourney's medical push, all pressing the same commoditization the desk flagged on June 12. Adobe trades roughly 43% off its 2025 peak with a rare Goldman Sell, and subscription cost grew about 13% against 11% revenue growth as AI inference compresses gross margin. When generation goes free, commercially-safe training is a feature, not a premium moat.

Wrong if: Adobe's next print, fiscal Q3 FY26 in September, shows Digital Media net-new ARR reaccelerating and subscription gross margin stabilizing year over year.

Settles: September 2026

### WATCH MU (Micron Technology) — medium conviction

The memory squeeze is now confirmed by the largest device buyer on earth, and Micron has already taken the move. The early phase of the memory long is over heading into the June 24 print, which is what changed since the desk's June 16 WATCH.

The mechanism: Cook calling Apple price hikes unavoidable is the real-economy confirmation of the DRAM and NAND squeeze the desk flagged, but MU is up roughly 820% in a year, sits near its $1,110 high, and roared up on the Apple headline itself. Options price about a 17.6% swing around June 24 earnings where consensus EPS sits near $20. Confirmation this loud and this late marks a crowded trade, not an early one.

Wrong if: MU prints a beat-and-raise on June 24, 2026 and closes at a fresh all-time high above $1,110 in the days after, showing the memory long still had room.

Settles: July 2026

### WATCH AAPL (Apple) — low conviction

Memory inflation is now a confirmed gross-margin headwind Apple will answer by raising prices into a soft consumer cycle, putting unit growth at risk even as the tape cheers the hikes as proof that hardware is underpriced. Timing on the demand hit is unclear, so this reads as a watch, not a short.

The mechanism: Apple guided fiscal Q2 gross margin to 48-49% with memory weighing more from here, and Bernstein pegs the hit near 1.5 points by year-end, while TechInsights estimates about $270 added to the next iPhone Pro. Cook's unavoidable language is an escalation from the January we'll-look-at-options framing. The bullish read treats pass-through as free and ignores elasticity if a roughly $1,299 iPhone Pro meets a stretched consumer.

Wrong if: Apple's fiscal Q3 2026 report in late July guides gross margin at or above 48% with no unit-demand warning attached to memory pricing.

Settles: August 2026

### WATCH BABA (Alibaba) — low conviction

The open-weight fallback the crowd prices into Alibaba is no longer obviously Qwen. GLM-5 shipping MIT and topping the agentic-coding boards moves the default open option toward Z.ai, which erodes the download-share narrative underpinning the Qwen-fallback read the desk has carried.

The mechanism: GLM-5 launched MIT-licensed and claims top spots on ArtificialAnalysis and LMArena Code, the exact long-horizon agentic workload where Qwen's download lead was the moat. DeepSeek's vision beta, live since April 29, widens the Chinese open-weight field further. Alibaba's equity never monetized Qwen directly, so the self-host-Qwen thesis was always thin, and a better free competitor makes it thinner.

Wrong if: Qwen holds above 50% of open-weight model downloads on Hugging Face through Q3 2026, or a top-five coding tool sets Qwen rather than GLM or DeepSeek as its recommended open default.

Settles: September 2026

---
Cite as: "nextbig.dev Daily AI Briefing, 2026-06-19" — https://www.nextbig.dev/daily/2026-06-19