WEBVTT
NOTE The Rundown — nextbig.dev daily audio edition, 2026-05-30

1
00:00:00.000 --> 00:00:01.917
<v Alex>Hey everyone, welcome to the Builder's Briefing for May 30th, 2026. I'm Alex.

2
00:00:01.917 --> 00:00:06.150
<v Sam>And I'm Sam. Good lineup today — self-hosted inference hitting some wild numbers, a mystery model nobody can explain, and GitHub banning a security researcher. Fun times.

3
00:00:06.150 --> 00:00:12.176
<v Alex>Let's jump right into the big story. Kog.ai published benchmarks showing three thousand tokens per second per request — and here's the kicker — on commodity GPUs. Not H100 clusters, not custom silicon. Standard hardware you can actually rent.

4
00:00:12.176 --> 00:00:20.120
<v Sam>Okay, that's a genuinely big deal. Three thousand tokens per second on standard GPUs means streaming responses that feel basically instantaneous. If you're running high-throughput workloads — code generation, document processing, agent loops — the math on self-hosting versus paying per-token just changed dramatically.

5
00:00:20.120 --> 00:00:26.419
<v Alex>Exactly. Their optimization stack combines speculative decoding, quantization, and kernel-level tweaks. They detail the whole approach in their blog — link in the briefing — so you can check whether it works for your model size and latency requirements.

6
00:00:26.419 --> 00:00:32.769
<v Sam>Right, and what's wild is the timing. You've got this dropping alongside a mystery model topping OpenRouter and Mistral announcing new stuff at their summit. The inference layer is suddenly very competitive, and it all favors builders who own their stack.

7
00:00:32.769 --> 00:00:37.077
<v Alex>Which is a perfect segue. Let's talk about that mystery model. Something called Hy3 is dominating the OpenRouter rankings by a wide margin, and nobody knows who's behind it.

8
00:00:37.077 --> 00:00:43.899
<v Sam>That's fascinating and slightly unsettling? Like, if you're routing through OpenRouter you should absolutely test it, but treat it as a total black box. No provenance, no idea about long-term availability. You don't want to build a dependency on something that might vanish.

9
00:00:43.899 --> 00:00:54.930
<v Alex>Agreed. Meanwhile, Mistral held their Now Summit in Paris with new model releases and API updates. If you're evaluating European-hosted alternatives for data residency or regulatory reasons, worth checking what shipped. And there's a new reproducible world model platform from Galilai Group — if you're doing anything with learned simulators, robotics, or video prediction, it gives you standardized baselines instead of reimplementing papers.

10
00:00:54.930 --> 00:01:03.147
<v Sam>Oh, I also want to flag two essays from this section. One on which human skills still matter as models get better, and one from Vicki Boykis arguing we should focus on the unglamorous parts AI can't handle — data quality, evaluation, understanding failure modes. Both are great gut-checks if you're shipping AI features right now.

11
00:01:03.147 --> 00:01:08.103
<v Alex>Alright, dev tools. The one that caught my eye is DBOS making the case that Postgres is all you need for durable workflows. No Temporal, no Inngest, just build durable execution directly on Postgres.

12
00:01:08.103 --> 00:01:15.050
<v Sam>Three hundred plus Hacker News points on that one, so clearly it's resonating. If you're already Postgres-native and you're tired of bolting on a separate orchestration layer just for workflow state, this is genuinely worth evaluating. I love tools that eliminate infrastructure.
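
NOTE
Editor's sketch of the durable-execution idea discussed in cues 11 and 12: record each completed step in a database table so a retried workflow skips work it already finished.
This is not DBOS's API. The table layout and the names open_store and run_step are hypothetical, and sqlite3 stands in for Postgres so the example runs without a server.
```python
import json
import sqlite3
def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the step-checkpoint table (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS steps ("
        "workflow_id TEXT, step TEXT, result TEXT, "
        "PRIMARY KEY (workflow_id, step))"
    )
    return conn
def run_step(conn, workflow_id, step, fn):
    """Run fn once per (workflow_id, step); later calls replay the stored result."""
    row = conn.execute(
        "SELECT result FROM steps WHERE workflow_id = ? AND step = ?",
        (workflow_id, step),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # step already completed, skip the side effect
    result = fn()
    with conn:  # commits the checkpoint, or rolls back if the insert fails
        conn.execute(
            "INSERT INTO steps VALUES (?, ?, ?)",
            (workflow_id, step, json.dumps(result)),
        )
    return result
calls = []
def charge():
    calls.append("charge")
    return {"charged": True}
conn = open_store()
first = run_step(conn, "order-1", "charge", charge)
second = run_step(conn, "order-1", "charge", charge)  # simulates a retry after a crash
print(first == second, len(calls))
```
The second call returns the stored result without invoking charge again. A production system would also need to handle a crash between running the step and writing the checkpoint, which this sketch ignores.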

13
00:01:15.050 --> 00:01:20.229
<v Alex>Also, Ink and Switch, the local-first folks, dropped Bijou64, a new variable-length integer encoding. Niche, but super relevant if you're building CRDTs or sync protocols where compact wire formats matter.
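
NOTE
Editor's sketch of what a variable-length integer encoding does in general. The episode does not describe Bijou64's byte layout, so this shows unsigned LEB128, a different and widely used scheme, purely as an illustration.
The function names are this note's own. Small values take one byte, larger values take more.
```python
def encode_uleb128(n: int) -> bytes:
    """Encode a non-negative integer, 7 payload bits per byte, low bits first."""
    if n < 0:
        raise ValueError("unsigned values only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # high bit clear: last byte
            return bytes(out)
def decode_uleb128(data: bytes) -> int:
    """Decode a single unsigned LEB128 value from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated input")
print(encode_uleb128(300).hex())
print(decode_uleb128(encode_uleb128(300)))
```
Here 300 encodes to the two bytes ac 02 and decodes back to 300, while any value below 128 fits in a single byte.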

14
00:01:20.229 --> 00:01:25.483
<v Sam>And a quick heads-up — Garnix, the Nix CI service, is shutting down. If you're using it, start migrating now. Hercules CI or self-hosted are your main options. The Nix CI ecosystem remains... fragile, let's say.

15
00:01:25.483 --> 00:01:30.538
<v Alex>Okay, security. This one's spicy. GitHub banned a security researcher who posted Windows zero-day exploits. The researcher claims it's vindictive retaliation from Microsoft, which, of course, owns GitHub.

16
00:01:30.538 --> 00:01:37.784
<v Sam>That's interesting because it highlights a real platform risk that a lot of people don't think about. The company that owns the platform where you host your security tooling is also the company whose software you might be finding vulnerabilities in. That's a structural conflict of interest.

17
00:01:37.784 --> 00:01:42.366
<v Alex>If you host proof-of-concept code or security tooling on GitHub, the takeaway is simple — mirror your critical repos elsewhere. Don't let a single platform decision wipe out your work.

18
00:01:42.366 --> 00:01:48.665
<v Sam>Also in health-tech news, a company called Headway is requiring biometric face scans for therapy patients to keep getting care. If you're building in health-tech, that's a cautionary tale about friction and privacy liability with biometric requirements.

19
00:01:48.665 --> 00:01:54.019
<v Alex>Let's hit startups and launches quickly. Raspberry Pi 6 details are coming — Jeff Geerling has the rundown, link in the briefing. If you're doing edge AI or embedded work, start checking toolchain compatibility now.

20
00:01:54.019 --> 00:02:00.916
<v Sam>And there's a great story about a guy named Nick Winans who built a million-dollar hardware product — a wireless keyboard microcontroller called nice!nano — solo from his dorm room. Open-source community, real pain point, direct sales. It's a clean playbook for niche hardware.

21
00:02:00.916 --> 00:02:06.096
<v Alex>Quick hits — Blue Origin's New Glenn rocket exploded during a static fire test. GTA 6 developers at Rockstar unionized. And there's a sobering piece on just how much data modern cars are collecting about you.

22
00:02:06.096 --> 00:02:09.955
<v Sam>Also someone wrote a very detailed nitpick of the shell history scene in Tron: Legacy, which — honestly — is the kind of content the internet was made for.

23
00:02:09.955 --> 00:02:18.297
<v Alex>Ha! Alright, here's the takeaway for today. The self-hosted inference story is the one to act on. Three thousand tokens per second on commodity GPUs, a mystery model topping the charts, Mistral pushing new capabilities — the cost of being locked into a single LLM provider is going up while the cost of running your own is coming down.

24
00:02:18.297 --> 00:02:25.817
<v Sam>So if you're building AI features with any meaningful token volume, invest in an abstraction layer now. Whether that's LiteLLM, a simple router, or your own gateway — the builders who can swap models and providers without rewriting their app are going to have a real structural advantage in six months.
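
NOTE
Editor's sketch of the "simple router" option mentioned in cue 24: the app calls one interface, and the router tries providers in priority order.
Everything here is hypothetical. ModelRouter, the provider names, and the prompt-in, text-out callable are illustrative stand-ins, not LiteLLM's or any vendor's API.
```python
from typing import Callable, Dict, List
Provider = Callable[[str], str]  # takes a prompt, returns a completion
class ModelRouter:
    """Tries providers in priority order and falls back when one fails."""
    def __init__(self, providers: Dict[str, Provider], order: List[str]):
        self.providers = providers
        self.order = order
    def complete(self, prompt: str) -> str:
        errors = []
        for name in self.order:
            try:
                return self.providers[name](prompt)
            except Exception as exc:  # a real gateway would catch narrower errors
                errors.append(f"{name}: {exc}")
        raise RuntimeError("all providers failed: " + "; ".join(errors))
def flaky_hosted(prompt: str) -> str:
    raise TimeoutError("simulated outage")  # stand-in for a hosted API call
def self_hosted(prompt: str) -> str:
    return "echo: " + prompt  # stand-in for a local inference server
router = ModelRouter(
    {"hosted": flaky_hosted, "local": self_hosted},
    order=["hosted", "local"],
)
print(router.complete("hello"))
```
Because application code only sees router.complete, swapping or reordering providers becomes a configuration change rather than a rewrite.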

25
00:02:25.817 --> 00:02:28.904
<v Alex>That's the briefing for May 30th. Links to everything we talked about are in the show notes. Thanks for listening, everyone.

26
00:02:28.904 --> 00:02:30.000
<v Sam>Go build something. We'll see you next time.