WEBVTT
NOTE The Rundown — nextbig.dev daily audio edition, 2026-06-19

1
00:00:04.500 --> 00:00:12.660
<v Oday>Z.ai shipped GLM-5 under an MIT license this week, and by the time the wire wrote it down, the lab was already two releases past it.

2
00:00:12.660 --> 00:00:17.180
<v Shannon>It's Friday, June 19, 2026. Here's the rundown.

3
00:00:17.180 --> 00:00:28.740
<v Shannon>One open-weight model rewriting the long-context math, a compute beat that runs from microVMs to nuclear, and a call about whose model sits inside your coding tool by September.

4
00:00:28.920 --> 00:00:45.000
<v Oday>Start with the numbers. GLM-5 is a mixture-of-experts model, 744 billion parameters, 40 billion active, trained on 28.5 trillion tokens. It claims top spots on Artificial Analysis and both LMArena boards.

5
00:00:45.000 --> 00:01:01.960
<v Oday>Then they pushed past it twice within days. The live SKU is GLM-5.2: a one-million-token context window, up to 131,072 tokens of output, two reasoning levels, and it's on every Coding Plan tier right now.

6
00:01:01.960 --> 00:01:10.840
<v Shannon>So any headline that filed it as plain GLM-5 is already wrong. Anyone picking a model off that string is two versions behind.

7
00:01:10.840 --> 00:01:20.360
<v Oday>The lever underneath it is DeepSeek Sparse Attention plus an async RL stack they call slime. What does DSA actually buy you here?

8
00:01:20.360 --> 00:01:35.160
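<v Shannon>It's what turns a million-token window from a slide into something you can serve. With dense attention, every token attends to every earlier token, so cost grows with the square of the sequence length. That's the line item that normally makes million-token contexts unusable in production. DSA has each query attend to a selected subset of tokens instead, which cuts that cost.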

9
00:01:35.160 --> 00:01:44.600
<v Shannon>That's why this isn't a single-turn benchmark flex. The target is agentic coding: a model reads a whole repo, holds it in context, and acts on it over many turns.

10
00:01:44.600 --> 00:01:48.840
<v Oday>And the 744 billion parameters sound like a hardware wall.

11
00:01:48.840 --> 00:01:59.400
<v Shannon>Only 40 billion fire per token, so it serves far cheaper than the parameter count suggests. That's not marketing spin; it's just how MoE works.

12
00:01:59.400 --> 00:02:08.280
<v Shannon>The catch is the obvious one. All 744 billion weights still have to sit in GPU memory, so you need real GPUs. The cheap path is a managed endpoint until you can justify owning the hardware.

13
00:02:08.280 --> 00:02:15.400
<v Oday>So if you're building agentic coding tools or repo-scale retrieval this week, the move is to test it.

14
00:02:15.400 --> 00:02:28.840
<v Shannon>Test it against your current closed default and compare cost per completed task, not benchmark scores. An open-weight model with a one-million-token window that you control changes the math for any product pushing large prompts.

15
00:02:29.450 --> 00:02:42.410
<v Oday>The piece the wire missed is the cadence. DSA came from DeepSeek, slime is Z.ai's, both under permissive licenses, both shipping faster than anyone can write the model-selection memo.

16
00:02:42.410 --> 00:02:52.810
<v Shannon>That's the real story. Chinese open-weight labs are setting the long-context price floor now. The squeeze lands on closed labs charging a markup for the same window.

17
00:02:52.940 --> 00:03:08.060
<v Oday>Browser-use is running Firecracker microVMs nested inside plain EC2 instances: a VM inside a VM, booting browsers about 30 seconds after launch, with the host reading the ready signal over vsock in under a millisecond.

18
00:03:08.060 --> 00:03:23.260
<v Shannon>And they picked regular EC2 instances over the pricey bare-metal ones on purpose, because regular hosts are faster to get and cheaper to keep. The honest claim is roughly three times cheaper and faster, not the sub-second figure the headline implied.

19
00:03:23.260 --> 00:03:27.660
<v Shannon>If you run cloud browsers at scale, that's the cost pattern to copy.

20
00:03:27.660 --> 00:03:38.780
<v Oday>Tim Cook says Apple prices go up because memory chip costs are climbing. That's the consumer echo of the DRAM and HBM crunch driving the AI buildout.

21
00:03:38.780 --> 00:03:52.060
<v Shannon>Memory is the tax nobody escapes. If you're speccing GPU servers or edge devices in the next two quarters, budget for RAM, not just compute. Flat memory cost is no longer a safe assumption.

22
00:03:52.060 --> 00:03:59.260
<v Oday>Switzerland's parliament lifted its ban on new nuclear plants. It won't add a megawatt this year.

23
00:03:59.260 --> 00:04:10.700
<v Shannon>No, but it's another government treating baseload as strategic again. Power is the binding constraint on datacenter expansion. Watch whether nuclear timelines start showing up in European siting decisions.

24
00:04:10.700 --> 00:04:19.340
<v Oday>And Ubiquiti shipped an enterprise NAS built on ZFS. Snapshots, checksums, data integrity as table stakes.

25
00:04:19.340 --> 00:04:28.860
<v Shannon>If you keep training data or model artifacts on-prem to dodge cloud egress, price it against Synology and TrueNAS. The integration might win.

26
00:04:29.040 --> 00:04:38.320
<v Oday>Midjourney's first hardware is a full-body ultrasound scanner: a 60-second scan, built on Butterfly Network's ultrasound-on-chip silicon, with 40 modules per system.

27
00:04:38.320 --> 00:04:51.200
<v Shannon>And the founder says, quote, we're not even using any AI in this yet. It's not FDA-cleared, about a dozen people have been scanned, and they admit they haven't solved turning noisy waves into images.

28
00:04:51.200 --> 00:05:01.600
<v Shannon>Treat it as a hardware bet, not a diffusion-model story. Midjourney's paying Butterfly 15 million up front plus 10 million a year for five years to find out.

29
00:05:01.600 --> 00:05:07.280
<v Oday>DeepSeek Vision got resurfaced as a new launch. It's actually a beta from April 29.

30
00:05:07.280 --> 00:05:20.120
<v Shannon>Seven weeks old, no V4 technical report, no stable API. You can't integrate against it. The decision-relevant event was V4's price cut, not the vision toggle.

31
00:05:20.120 --> 00:05:25.440
<v Oday>Alex Ellis makes the case that local Qwen isn't a worse Opus; it's a different tool.

32
00:05:25.440 --> 00:05:38.400
<v Shannon>Right frame. Local wins on latency, privacy, and cost per token even when it loses on raw reasoning. Route your high-volume, well-scoped tasks to local, and save Opus for the calls that need it.

33
00:05:38.400 --> 00:05:45.760
<v Oday>And Lightricks released LTX-2, an audio-video model, with an official package for inference and LoRA training.

34
00:05:45.760 --> 00:05:56.000
<v Shannon>The LoRA trainer is the useful part: you can adapt the model to a style or subject without retraining from scratch. It's a self-hostable base if you're building video features.

35
00:05:56.980 --> 00:06:11.220
<v Oday>Kilo Code passes model pricing straight through, with zero markup. It's an open-source agent for VS Code, JetBrains, and the command line, with 500-plus models at the provider's own rate, including GPT-5.5 and Claude Opus 4.7.

36
00:06:11.220 --> 00:06:24.100
<v Shannon>The pass-through gateway is the draw. The traction numbers fight each other, though: 1.5 million users on GitHub, 3 million on the marketing site. Pick whichever the marketing team needed that day.

37
00:06:24.100 --> 00:06:29.620
<v Oday>There's also an auto flag that runs it fully autonomously in CI with every permission prompt off.

38
00:06:29.620 --> 00:06:36.180
<v Shannon>Trusted environments only. Point that at an untrusted repo and you'll learn something expensive.

39
00:06:36.180 --> 00:06:44.740
<v Oday>gortex builds a fully local code graph across 257 languages and claims it cuts token use by up to 50 times.

40
00:06:44.740 --> 00:07:00.260
<v Shannon>It serves precise graph slices instead of dumping whole files into context. If your agent bills are dominated by context stuffing, a graph in front of the model is the cheaper architecture. Separately, Plane is worth a look if Jira's per-seat price is grinding you down.

41
00:07:00.260 --> 00:07:15.540
<v Oday>And a correction. The wire called Roboflow's RF-DETR a video studio. It's a real-time detection transformer on a DINOv2 backbone, accepted to ICLR 2026, running at 2.3 milliseconds per frame on a T4.

42
00:07:15.540 --> 00:07:22.740
<v Shannon>It's Apache 2.0 licensed and comes in six sizes. Ignore any 500-skills claim glued to that name.

43
00:07:24.440 --> 00:07:34.040
<v Oday>A researcher documented roughly 10,000 GitHub repositories seeded with Trojan payloads. It's the classic trap: a developer clones what looks like a useful tool.

44
00:07:34.040 --> 00:07:49.160
<v Shannon>And it's worse now because agentic tools auto-pull dependencies and run code. That's the whole attack in one sentence. So pin your sources, review before you run, and don't let an autonomous agent execute untrusted repos.

45
00:07:49.340 --> 00:07:55.980
<v Oday>Emacs 31 is nearing release, and a long-time user walks through what's already worth using before the stable cut.

46
00:07:55.980 --> 00:08:07.100
<v Shannon>Useful if you live in Emacs, skippable if you don't. And SteamOS 3.8 shipped as stable. It's a quiet point release unless you build or test on the Steam Deck.

47
00:08:10.340 --> 00:08:12.180
<v Oday>Quick break: two from the desk.

48
00:08:12.180 --> 00:08:26.500
<v Shannon>One we know well: vote.direct. If you're on an HOA or a board, it runs your elections digitally: secure, verifiable, no paper, no clipboard in the lobby. Point your council to vote.direct.

49
00:08:26.500 --> 00:08:37.860
<v Oday>And if this is your ten minutes of AI for the day, get the written edition too: the full wire, free, every morning. Leave your email at nextbig.dev.

50
00:08:42.210 --> 00:08:47.090
<v Oday>Microsoft's new Outlook takes 10 seconds to do what Classic does instantly.

51
00:08:47.090 --> 00:08:56.370
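<v Shannon>For ignoring files, .gitignore isn't your only option. Try .git/info/exclude for ignores that stay local to your clone, and skip-worktree for tracked files you've changed locally and don't want to commit.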

52
00:08:56.370 --> 00:09:02.450
<v Oday>Cornell's CS6120 advanced compilers course is free and self-guided.

53
00:09:02.450 --> 00:09:14.050
<v Shannon>Glojure runs Clojure hosted on Go, and Kong's Insomnia now covers GraphQL, REST, WebSockets, SSE, and gRPC in one open-source client.

54
00:09:14.050 --> 00:09:25.410
<v Oday>asynq is a distributed task queue in Go for reliable background jobs. And hospitals and universities are repurposing existing drugs at 90 percent lower cost.

55
00:09:25.590 --> 00:09:39.910
<v Oday>Our call: by September 19, at least one top-five AI coding tool sets or recommends a Chinese open-weight model, GLM or DeepSeek, as a default for long-context agentic work.

56
00:09:39.910 --> 00:09:48.790
<v Shannon>We're wrong if every top-five tool by usage still keeps a closed US model as its recommended default on that date. It settles September 19.