WEBVTT
NOTE The Rundown — nextbig.dev daily audio edition, 2026-03-23

1
00:00:00.000 --> 00:00:04.749
<v Marcus>Hey everyone, welcome to the Builder's Briefing for March twenty-third, twenty twenty-six. I'm Marcus, joined as always by Nadia, and today, honestly, the news has a theme that's hard to miss.

2
00:00:04.749 --> 00:00:08.244
<v Nadia>Yeah, let me guess — AI is leaving the cloud? Because when I saw the top stories this morning, the through-line was basically screaming at me.

3
00:00:08.244 --> 00:00:15.356
<v Marcus>Exactly right. So let's get into the big story. Flash-MoE dropped on GitHub and blew up on Hacker News — over two hundred points of discussion. It's a technique that lets you run a three hundred and ninety-seven billion parameter Mixture-of-Experts model on consumer hardware. On a laptop.

4
00:00:15.356 --> 00:00:22.295
<v Nadia>Okay, and to be clear, this isn't some heavily quantized, stripped-down version of a model. The trick is sparse activation — it only loads the expert slices you actually need for each token, so the memory footprint stays within laptop-class VRAM. That's a genuinely clever approach.

5
00:00:22.295 --> 00:00:30.908
<v Marcus>Right, and what's wild is this isn't happening in isolation. The same week, tinygrad's Tinybox hit over four hundred points on Hacker News — they're shipping actual dedicated hardware for offline inference, handling a hundred and twenty billion parameters. And then there's Project Nomad building offline-first knowledge systems. It's all converging.

6
00:00:30.908 --> 00:00:36.667
<v Nadia>So if you're a builder, the takeaway is pretty immediate. You can now prototype against near-frontier-class models locally before you decide what actually needs cloud scale. Your cost calculus for inference just fundamentally changed.

7
00:00:36.667 --> 00:00:43.040
<v Marcus>And think about the use cases — on-device assistants, offline-capable tools, anything privacy-sensitive. You're no longer choosing between capability and latency. If your architecture assumes every inference call hits an API, start abstracting that layer now.

8
00:00:43.040 --> 00:00:47.864
<v Nadia>Absolutely. Treat local inference as a first-class deployment target, not a nice-to-have. The builders who do that are going to own the next wave in privacy-sensitive and cost-constrained markets.

9
00:00:47.864 --> 00:00:54.631
<v Marcus>Alright, staying in AI and models — LightRAG just got accepted at EMNLP twenty twenty-five, and it's been racking up over two thousand engagement across its repos. It's a graph-enhanced RAG framework that's genuinely simpler to deploy than most of the alternatives out there.

10
00:00:54.631 --> 00:01:01.792
<v Nadia>That's interesting because I've seen so many teams still hand-rolling their retrieval pipelines, and it's painful. There's even a Chinese financial trading agent fork of LightRAG, which tells you it's production-ready for domain-specific stuff. Worth benchmarking if you're doing RAG at all.

11
00:01:01.792 --> 00:01:08.904
<v Marcus>Also worth flagging: there's a new structured course for building production-grade agentic RAG systems across Claude Code, Codex, Opencode, and Cursor. If you're past the demo stage and hitting real issues with agent memory and security boundaries, the link for that one is in the briefing.

12
00:01:08.904 --> 00:01:12.620
<v Nadia>Oh, that's the kind of reference material that's been missing. Everyone knows how to build a demo agent — it's the production hardening that kills you.

13
00:01:12.620 --> 00:01:17.492
<v Marcus>Switching to dev tools — Television, a Rust-based terminal fuzzy finder, pulled over eleven hundred engagement this week. That tells you how much developers care about speed in their daily workflow.

14
00:01:17.492 --> 00:01:22.119
<v Nadia>I mean, if fzf feels slow to you in large repos, that's saying something. Television has this extensible channel-based filtering model that looks really nice. Definitely grabbing that one.

15
00:01:22.119 --> 00:01:27.163
<v Marcus>And here's one I found fascinating — Bram Cohen, the creator of BitTorrent, outlined his vision for next-gen version control called Mañana. The core idea is handling AI-generated code better than git does.

16
00:01:27.163 --> 00:01:34.374
<v Nadia>Okay, that's a conversation I've been waiting for someone credible to start. Because if you've worked with AI coding agents on any real codebase, you know git's merge model is going to break under that pressure. Multiple agents generating code in parallel, massive diffs — it's a real problem.

17
00:01:34.374 --> 00:01:40.403
<v Marcus>Also quick shout-out to OpenWork — it's an open-source, self-hostable alternative to Claude's Cowork collaboration features, built on opencode. If you don't want to lock your team's workflow into Anthropic's platform, that's your starting point.

18
00:01:40.403 --> 00:01:46.752
<v Marcus>Alright, security — and this one's urgent. The Trivy container security scanner had its supply chain briefly compromised. If Trivy is in your CI/CD pipeline, and it's in a lot of them, you need to review the advisory and pin to verified versions immediately.

19
00:01:46.752 --> 00:01:52.362
<v Nadia>This is the recurring nightmare, right? The security tooling itself becomes the high-value target. It's like — who watches the watchers? If you're running Trivy in CI, stop what you're doing and check this. Link in the briefing.

20
00:01:52.362 --> 00:01:58.982
<v Marcus>Also on the infrastructure side — Cloudflare's family-safe DNS is now blocking archive.today, flagging it as botnet command-and-control. If you rely on archive.today for link preservation in your docs or products, check whether your users are on filtered DNS resolvers.

21
00:01:58.982 --> 00:02:02.870
<v Nadia>That's a weird one. Archive.today is a legitimate archival service, so that flag feels aggressive. Definitely something to monitor if you're in that workflow.

22
00:02:02.870 --> 00:02:07.620
<v Marcus>New launches — Tooscut is a browser-based video editor hitting near-native performance using WebGPU and WASM. This is a real proof point that complex creative tools no longer need desktop apps.

23
00:02:07.620 --> 00:02:12.738
<v Nadia>WebGPU plus WASM is just quietly becoming the stack for serious browser applications. If you're building any kind of media processing features, this tells you the platform is mature enough for production now.

24
00:02:12.738 --> 00:02:19.678
<v Marcus>Quick hits to round us out — there's a great practical primer on Bayesian statistics for data scientists, a walkthrough of submitting your first Linux kernel patch, and a spicy sixty-eight-comment debate about why currying might be overrated. Links for all of those in the briefing.

25
00:02:19.678 --> 00:02:25.166
<v Nadia>Oh, the currying debate — I can already feel the functional programming people warming up their keyboards. Also, the one about common system architecture diagram mistakes is genuinely useful if you're doing any design docs.

26
00:02:25.166 --> 00:02:31.588
<v Marcus>So here's the big takeaway for today. Serious AI inference is leaving the cloud. Flash-MoE on a laptop, Tinybox shipping hardware, Project Nomad going offline-first — the stack for AI products that work without an internet connection is materializing right now.

27
00:02:31.588 --> 00:02:37.494
<v Nadia>And the action item is clear — abstract your inference layer so you can swap between cloud and local without rewriting your app. Whether it's privacy, latency, or cost driving the decision, you want that flexibility baked in from the start.

28
00:02:37.494 --> 00:02:43.302
<v Marcus>That's the briefing for March twenty-third. If you're building AI-powered anything, the next six months are going to reward people who treat local inference as a real deployment target. We'll see you tomorrow — go build something great.

29
00:02:43.302 --> 00:02:45.000
<v Nadia>See you tomorrow, folks. And seriously, go check your Trivy versions.