3,000 tokens/s on standard GPUs: Self-hosted LLM inference just got real

The Rundown No. 100 · Audio Edition · 3 min

Marcus

Hey everyone, welcome to the Builder's Briefing for May 30th, 2026. I'm Alex.

Nadia

And I'm Sam. Good lineup today — self-hosted inference hitting some wild numbers, a mystery model nobody can explain, and GitHub banning a security researcher. Fun times.

Marcus

Let's jump right into the big story. Kog.ai published benchmarks showing three thousand tokens per second per request — and here's the kicker — on commodity GPUs. Not H100 clusters, not custom silicon. Standard hardware you can actually rent.

Nadia

Okay, that's a genuinely big deal. Three thousand tokens per second on standard GPUs means streaming responses that feel basically instantaneous. If you're running high-throughput workloads — code generation, document processing, agent loops — the math on self-hosting versus paying per-token just changed dramatically.

Marcus

Exactly. Their optimization stack combines speculative decoding, quantization, and kernel-level tweaks. They detail the whole approach in their blog — link in the briefing — so you can check whether it works for your model size and latency requirements.

Nadia

Right, and what's wild is the timing. You've got this dropping alongside a mystery model topping OpenRouter and Mistral announcing new stuff at their summit. The inference layer is suddenly very competitive, and it all favors builders who own their stack.

Marcus

Which is a perfect segue. Let's talk about that mystery model. Something called Hy3 is dominating the OpenRouter rankings by a wide margin, and nobody knows who's behind it.

Nadia

That's fascinating and slightly unsettling. Like, if you're routing through OpenRouter, you should absolutely test it, but treat it as a total black box. No provenance, no idea about long-term availability. You don't want to build a dependency on something that might vanish.

Marcus

Agreed. Meanwhile, Mistral held their Now Summit in Paris with new model releases and API updates. If you're evaluating European-hosted alternatives for data residency or regulatory reasons, worth checking what shipped. And there's a new reproducible world model platform from Galilai Group — if you're doing anything with learned simulators, robotics, or video prediction, it gives you standardized baselines instead of reimplementing papers.

Nadia

Oh, I also want to flag two essays. One on which human skills still matter as models get better, and one from Vicki Boykis arguing we should focus on the unglamorous parts AI can't handle: data quality, evaluation, understanding failure modes. Both are great gut-checks if you're shipping AI features right now.

Marcus

Alright, dev tools. The one that caught my eye is DBOS making the case that Postgres is all you need for durable workflows. No Temporal, no Inngest, just build durable execution directly on Postgres.

Nadia

Three hundred plus Hacker News points on that one, so clearly it's resonating. If you're already Postgres-native and you're tired of bolting on a separate orchestration layer just for workflow state, this is genuinely worth evaluating. I love tools that eliminate infrastructure.

Marcus

Also, Ink and Switch, the local-first folks, dropped Bijou64, a new variable-length integer encoding. Niche, but super relevant if you're building CRDTs or sync protocols where compact wire formats matter.

Nadia

And a quick heads-up — Garnix, the Nix CI service, is shutting down. If you're using it, start migrating now. Hercules CI or self-hosted are your main options. The Nix CI ecosystem remains... fragile, let's say.

Marcus

Okay, security. This one's spicy. GitHub banned a security researcher who posted Windows zero-day exploits. The researcher claims it's vindictive retaliation from Microsoft — who, of course, owns GitHub.

Nadia

That's interesting because it highlights a real platform risk that a lot of people don't think about. The company that owns the platform where you host your security tooling is also the company whose software you might be finding vulnerabilities in. That's a structural conflict of interest.

Marcus

If you host proof-of-concept code or security tooling on GitHub, the takeaway is simple — mirror your critical repos elsewhere. Don't let a single platform decision wipe out your work.

Nadia

Also in health-tech news, a company called Headway is requiring biometric face scans for therapy patients to keep getting care. If you're building in health-tech, that's a cautionary tale about friction and privacy liability with biometric requirements.

Marcus

Let's hit startups and launches quickly. Raspberry Pi 6 details are coming — Jeff Geerling has the rundown, link in the briefing. If you're doing edge AI or embedded work, start checking toolchain compatibility now.

Nadia

And there's a great story about a guy named Nick Winans who built a million-dollar hardware product — a wireless keyboard microcontroller called nice!nano — solo from his dorm room. Open-source community, real pain point, direct sales. It's a clean playbook for niche hardware.

Marcus

Quick hits — Blue Origin's New Glenn rocket exploded during a static fire test. GTA 6 developers at Rockstar unionized. And there's a sobering piece on just how much data modern cars are collecting about you.

Nadia

Also someone wrote a very detailed nitpick of the shell history scene in Tron: Legacy, which — honestly — is the kind of content the internet was made for.

Marcus

Ha! Alright, here's the takeaway for today. The self-hosted inference story is the one to act on. Three thousand tokens per second on commodity GPUs, a mystery model topping the charts, Mistral pushing new capabilities — the cost of being locked into a single LLM provider is going up while the cost of running your own is coming down.

Nadia

So if you're building AI features with any meaningful token volume, invest in an abstraction layer now. Whether that's LiteLLM, a simple router, or your own gateway — the builders who can swap models and providers without rewriting their app are going to have a real structural advantage in six months.

Marcus

That's the briefing for May 30th. Links to everything we talked about are in the show notes. Thanks for listening, everyone.

Nadia

Go build something. We'll see you next time.

The Big Story

Kog.ai published benchmarks showing 3,000 tokens per second per request on commodity GPUs — not H100 clusters, not custom silicon, standard hardware you can actually rent. If these numbers hold under real workloads, this is a step-change for anyone running self-hosted inference. The cost calculus for building on top of API providers versus running your own stack just shifted meaningfully.
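
To make "the cost calculus" concrete, here is a back-of-the-envelope sketch for comparing a rented GPU against per-token API pricing. The function names and every number you feed it are assumptions: it treats `tokens_per_second` as the GPU's sustained aggregate throughput, which is not the same thing as Kog.ai's per-request figure once batching enters the picture, so plug in what you actually measure.

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float,
                            utilization: float = 1.0) -> float:
    """USD per 1M generated tokens on one rented GPU.

    Assumes tokens_per_second is the GPU's sustained aggregate throughput and
    utilization is the fraction of each paid hour spent serving real traffic.
    """
    if gpu_hourly_usd < 0 or tokens_per_second <= 0 or not 0 < utilization <= 1:
        raise ValueError("need price >= 0, throughput > 0, 0 < utilization <= 1")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hourly_usd / tokens_per_hour * 1_000_000


def breakeven_tokens_per_second(gpu_hourly_usd: float, api_usd_per_million: float,
                                utilization: float = 1.0) -> float:
    """Sustained throughput at which self-hosting matches a per-token API price."""
    if gpu_hourly_usd < 0 or api_usd_per_million <= 0 or not 0 < utilization <= 1:
        raise ValueError("need price >= 0, API price > 0, 0 < utilization <= 1")
    tokens_per_hour_needed = gpu_hourly_usd / api_usd_per_million * 1_000_000
    return tokens_per_hour_needed / (3600 * utilization)
```

With a hypothetical $2/hour GPU kept 50% busy, 3,000 tok/s works out to roughly $0.37 per million tokens; utilization tends to matter as much as raw speed, since idle paid hours inflate the effective price.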

For builders, this matters right now if you're paying per token for high-throughput workloads like code generation, document processing, or agent loops. At 3k tokens/s you can serve interactive applications with streaming responses that feel instantaneous, on hardware that costs a fraction of equivalent frontier API spend at scale. Kog.ai's blog post details the optimization stack (speculative decoding, quantization, and kernel-level tweaks), so you can evaluate whether their approach fits your model size and latency requirements.
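
Of those three techniques, speculative decoding is the least obvious, so here is a toy sketch of the textbook greedy variant. It is not Kog.ai's implementation; the "models" are stand-in functions that map a token context to a next-token id. It shows why the trick is lossless for greedy decoding: every emitted token is one the target model would have picked anyway, and the speedup comes from checking several cheap draft tokens per expensive target pass. Sampling-based variants swap the exact-match check for a rejection-sampling acceptance rule.

```python
from typing import Callable, List

# A toy "model": given the token context, return the greedy next-token id.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one (expensive) target call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding; output matches greedy_decode(target, ...)."""
    if k < 1:
        raise ValueError("k must be >= 1")
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks each proposed position. A real engine scores all
        #    k positions in a single batched forward pass; we loop for clarity.
        accepted = 0
        for i, tok in enumerate(proposal):
            if target(out + proposal[:i]) != tok:
                break
            accepted += 1
        out.extend(proposal[:accepted])
        # 3. The target always adds one token of its own: the correction at the
        #    first mismatch, or a free "bonus" token if all k drafts matched.
        out.append(target(out))
    # The last round can overshoot max_new; trim to the requested length.
    return out[: len(prompt) + max_new]
```

The better the draft model agrees with the target, the more tokens each verification pass yields; a draft that is always wrong still produces correct output, just one token per round.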

This also signals where self-hosted inference is heading over the next six months: the performance gap between API providers and DIY deployments is closing fast. Combined with the mysterious Hy3 model dominating OpenRouter rankings and Mistral's summit announcements, the inference layer is getting competitive in ways that favor builders who control their own stack. If you're locking into a single provider right now, build your abstraction layer — you'll want the flexibility to switch.

@newsycombinator · 315 engagement

AI & Models

Stable World Model: Reproducible world model research gets a shared platform

Galilai Group dropped a GitHub repo for standardized world model research and evaluation. If you're building anything involving learned simulators — robotics, game AI, video prediction — this gives you reproducible baselines instead of reimplementing papers from scratch.

@github · 1,730 engagement

Mysterious Hy3 LLM tops OpenRouter rankings by a wide margin

An unknown model called Hy3 is dominating OpenRouter's model rankings with no clear provenance. Builders routing through OpenRouter should test it, but treat it as a black box — no one knows who's behind it or what the long-term availability looks like.

@newsycombinator · 124 engagement

Mistral AI Now Summit: Key announcements from Paris

Notes from Mistral's summit cover new model releases and API updates. If you're evaluating European-hosted alternatives to OpenAI/Anthropic for data residency or regulatory reasons, check what shipped.

@newsycombinator · 143 engagement

Expertise in the age of AI: What still matters when models get good

A thoughtful essay on which human skills retain value as AI capability increases. Worth reading if you're deciding what to automate versus what to keep human in your product's workflow.

@newsycombinator · 169 engagement

"We should be more tired than the model" — on doing the hard work AI can't

Vicki Boykis argues builders should focus effort on the unglamorous parts AI doesn't handle: data quality, evaluation, understanding failure modes. If you're shipping AI features, this is a good gut-check on where you're spending your time.

@newsycombinator · 268 engagement

Developer Tools

Durable workflows on Postgres: DBOS says Postgres is all you need

DBOS makes the case that you can build Temporal/Inngest-style durable execution directly on Postgres without a separate orchestration layer. If you're already Postgres-native and tired of adding infrastructure for workflow state, this is worth evaluating — 306 HN points suggests real interest.
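
DBOS's own API isn't described here, so this is a hypothetical, minimal sketch of the underlying pattern rather than DBOS code: checkpoint each step's output in a table keyed by (workflow_id, step_index), and on restart replay the workflow, returning recorded outputs instead of re-running completed steps. It uses Python's built-in `sqlite3` as a stand-in for Postgres so it runs without a server; the table, class, and step names are made up for illustration.

```python
import json
import sqlite3
from typing import Any, Callable

SCHEMA = """
CREATE TABLE IF NOT EXISTS step_results (
    workflow_id TEXT    NOT NULL,
    step_index  INTEGER NOT NULL,
    output      TEXT    NOT NULL,   -- JSON-encoded step result
    PRIMARY KEY (workflow_id, step_index)
)
"""


class Workflow:
    """Replays a workflow by id, skipping steps whose output is already stored."""

    def __init__(self, conn: sqlite3.Connection, workflow_id: str):
        self.conn = conn
        self.workflow_id = workflow_id
        self.step_index = 0  # steps must run in the same order on every replay
        conn.execute(SCHEMA)

    def step(self, fn: Callable[[], Any]) -> Any:
        idx = self.step_index
        self.step_index += 1
        row = self.conn.execute(
            "SELECT output FROM step_results WHERE workflow_id = ? AND step_index = ?",
            (self.workflow_id, idx),
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # completed before a crash: reuse result
        result = fn()  # first time through: actually perform the step
        with self.conn:  # commit the checkpoint in its own transaction
            self.conn.execute(
                "INSERT INTO step_results (workflow_id, step_index, output) "
                "VALUES (?, ?, ?)",
                (self.workflow_id, idx, json.dumps(result)),
            )
        return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    charges = []  # counts how often the side effect really ran

    def charge_card():  # hypothetical side-effecting step
        charges.append("ch_123")
        return {"charge_id": "ch_123"}

    Workflow(conn, "order-42").step(charge_card)
    # ...the process crashes here; later it restarts the same workflow id:
    print(Workflow(conn, "order-42").step(charge_card), len(charges))
```

Two caveats the sketch glosses over: a step that crashes after its side effect but before the checkpoint commits will run again, so steps should be idempotent; and the workflow code itself must be deterministic so step indexes line up on replay.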

@newsycombinator · 570 engagement

Bijou64: Ink & Switch's new variable-length integer encoding

A compact encoding scheme from the local-first pioneers at Ink & Switch. If you're building CRDTs, sync protocols, or anything where compact wire formats matter, this is a clean alternative to varint.
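
Bijou64's actual byte layout is in the Ink & Switch post and isn't reproduced here. For comparison, this sketch shows the conventional varint it's positioned against: unsigned LEB128, the scheme Protocol Buffers uses for its varints, which stores 7 payload bits per byte and uses the high bit as a "more bytes follow" flag.

```python
from typing import Tuple


def encode_uleb128(n: int) -> bytes:
    """Unsigned LEB128: little-endian 7-bit groups, high bit = continuation."""
    if n < 0:
        raise ValueError("unsigned values only")
    out = bytearray()
    while True:
        byte = n & 0x7F  # low 7 bits of what's left
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # final byte has the high bit clear
            return bytes(out)


def decode_uleb128(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, offset just past the varint)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7
```

For example, 300 encodes to the two bytes 0xAC 0x02. The continuation-bit design is what alternatives like Bijou64 compete with on size and decode speed.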

@newsycombinator · 249 engagement

Coalton: A statically typed Lisp with Haskell and OCaml ideas

Coalton brings ML-family type systems to Common Lisp. Niche but interesting if you want algebraic data types and type inference without leaving the Lisp ecosystem.

@newsycombinator · 192 engagement

Garnix (Nix CI) is shutting down

If you're using Garnix for Nix-based CI, start migrating now. The Nix CI ecosystem remains fragile — consider Hercules CI or self-hosted solutions.

@newsycombinator · 101 engagement

Security

GitHub bans security researcher who posted Windows zero-day exploits

Microsoft's GitHub banned a researcher who published zero-day Windows exploits, with the researcher claiming it's vindictive retaliation. This is a real platform risk: if you host security tooling or PoC code on GitHub, your account can be nuked by the platform owner who also makes the vulnerable software. Consider mirroring critical repos.
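
A minimal way to act on "mirror critical repos" is a scheduled `git clone --mirror` plus `git push --mirror` to a second host you control. This sketch only builds, and optionally runs, those standard git commands; the function names and example URLs are hypothetical, and `dry_run=True` just prints the commands.

```python
import subprocess
from pathlib import Path
from typing import List


def mirror_commands(source_url: str, backup_url: str, local_dir: str) -> List[List[str]]:
    """Standard git commands that copy all refs from source_url to backup_url."""
    cmds: List[List[str]] = []
    if Path(local_dir).exists():
        # Refresh an existing bare mirror, pruning refs deleted upstream.
        cmds.append(["git", "-C", local_dir, "remote", "update", "--prune"])
    else:
        # First run: a bare clone that tracks every branch and tag.
        cmds.append(["git", "clone", "--mirror", source_url, local_dir])
    # Push every ref to the backup (refs deleted locally are deleted there too).
    cmds.append(["git", "-C", local_dir, "push", "--mirror", backup_url])
    return cmds


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing command


if __name__ == "__main__":
    # Hypothetical URLs; substitute your own repo and backup host.
    run(mirror_commands(
        "https://github.com/example-user/poc-tools.git",
        "git@git.example.org:example-user/poc-tools.git",
        "poc-tools.git",
    ))
```

A git mirror copies refs (branches, tags), not GitHub-side data such as issues or release assets. GitHub mirror clones can also include read-only `refs/pull/*` refs that some backup hosts may reject; if that happens, pushing branches and tags explicitly (`git push --all`, `git push --tags`) is the usual fallback.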

@newsycombinator · 610 engagement

Bettercap: Swiss Army knife for network recon and MITM attacks trending on GitHub

Bettercap is seeing a spike in GitHub stars. If you're building IoT products or anything on BLE/CAN-bus/802.11, use this to audit your own attack surface before someone else does.

@github · 255 engagement

Therapy patients forced to scan faces to keep getting care

Headway is requiring biometric face scans for identity verification in healthcare. If you're building in health-tech, this is a cautionary tale — biometric requirements create huge friction and privacy liability.

@newsycombinator · 121 engagement

New Launches & Releases

Raspberry Pi 6 and new microcontroller dev news

Jeff Geerling covers incoming RPi 6 details and microcontroller updates. If you're building edge AI or embedded products, start checking compatibility with your current toolchain now rather than scrambling at launch.

@newsycombinator · 466 engagement

CasaOS trending: Open-source personal cloud system

CasaOS is getting GitHub attention as a self-hosted cloud OS. If you're building local-first or self-hosted apps, it's a clean deployment target for non-technical users.

@github · 125 engagement

Startups & Funding

Million-dollar product from a dorm room: The nice!nano story

Nick Winans built a $1M hardware product (a wireless keyboard microcontroller) solo from his dorm. Good playbook for niche hardware builders: open-source community + solving a real pain point + direct sales.

@newsycombinator · 421 engagement

The Dead Economy Theory sparks debate on HN

An essay arguing economic indicators are masking structural decay drew 212 HN comments. Whether you buy the thesis or not, if you're fundraising or planning product strategy, the macro sentiment among technical builders is worth understanding.

@newsycombinator · 602 engagement

Quick Hits

Blue Origin's New Glenn rocket explodes during static fire test

@newsycombinator

GTA 6 developers at Rockstar Games unionize

@newsycombinator

Cars collect a startling amount of data about you — and it's getting worse

@newsycombinator

Framework 12 laptop: hard to justify the price

@newsycombinator

Nitpicking the shell history scene in Tron: Legacy

@newsycombinator

"I am retiring from tech to live offline"

@newsycombinator

I hated writing until I learned there's a science to it

@newsycombinator

The Takeaway

The self-hosted inference story is the one to act on. With 3k tokens/s on commodity GPUs, a mystery model topping OpenRouter, and Mistral pushing new capabilities, the cost of being locked into a single LLM provider is rising while the cost of running your own is dropping. If you're building AI features with significant token volume, invest in an abstraction layer now — whether that's LiteLLM, a simple router, or your own gateway. The builders who can swap models and providers without rewriting their app will have a structural advantage in six months.
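
What "an abstraction layer" means in practice can be as small as the hypothetical sketch below: your app calls one `complete()` function, and each provider (an API, OpenRouter, your own GPU box) sits behind an adapter with the same signature, tried in preference order. LiteLLM and similar gateways do this for real with their own APIs; none of the names here come from them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical adapter type: wrap each real SDK or HTTP call behind this signature.
Completion = Callable[[str], str]


@dataclass
class Router:
    """Tries providers in preference order and falls back on any failure."""
    providers: Dict[str, Completion] = field(default_factory=dict)
    order: List[str] = field(default_factory=list)

    def register(self, name: str, fn: Completion) -> None:
        self.providers[name] = fn
        if name not in self.order:
            self.order.append(name)

    def complete(self, prompt: str) -> str:
        errors: List[str] = []
        for name in self.order:
            try:
                return self.providers[name](prompt)
            except Exception as exc:  # outage, rate limit, model withdrawn...
                errors.append(f"{name}: {exc!r}")
        raise RuntimeError("all providers failed: " + "; ".join(errors))


if __name__ == "__main__":
    def unavailable(prompt: str) -> str:  # e.g. a black-box model that vanished
        raise ConnectionError("503 from upstream")

    router = Router()
    router.register("experimental-model", unavailable)
    router.register("self-hosted", lambda p: f"[self-hosted] {p}")
    print(router.complete("Summarize this diff"))
```

Keeping the adapter signature this narrow is the point: swapping or reordering providers becomes a config change rather than an app rewrite.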
