WEBVTT
NOTE The Rundown — nextbig.dev daily audio edition, 2026-04-27

1
00:00:00.000 --> 00:00:03.959
<v Alex>Hey everyone, welcome to Builder's Briefing for April twenty-seventh, twenty twenty-six. I'm Alex, joined as always by Sam, and today we've got a packed show.

2
00:00:03.959 --> 00:00:08.971
<v Sam>Yeah, big themes today — AI coding benchmarks hitting their ceiling, post-quantum crypto going mainline, and a non-mathematician cracking a sixty-year-old math problem with ChatGPT. Let's get into it.

3
00:00:08.971 --> 00:00:19.521
<v Alex>Alright, so the big story: OpenAI officially retired SWE-bench Verified. For folks who haven't been tracking this, SWE-bench became the gold standard for measuring how good AI coding agents are — can your model look at a real GitHub issue and produce a working fix? And OpenAI is basically saying, we're done with it. Frontier models have saturated the benchmark to the point where the scores don't mean anything anymore.

4
00:00:19.521 --> 00:00:25.860
<v Sam>Right, and what's wild is how many teams were using those scores to make real procurement decisions. Like, 'oh this model scores four points higher on SWE-bench, let's go with that one.' If that's been your decision framework, you now need a new signal.

5
00:00:25.860 --> 00:00:33.854
<v Alex>Exactly. And OpenAI is hinting they'll propose replacement benchmarks — probably something more agentic, multi-step, multi-file. But their advice in the meantime is pretty practical: the best benchmark is your own codebase. Run evals against your actual repos, your actual bug patterns, your actual PR review standards.
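
NOTE
A minimal sketch of the "best benchmark is your own codebase" advice in cue 5.
It is an illustration under stated assumptions, not OpenAI's method or any
vendor's tooling. EvalCase, pass_rate, and the stub agent are hypothetical
names; a real check would likely apply the patch and run the repo's own tests.
```python
from dataclasses import dataclass
from typing import Callable, List
@dataclass
class EvalCase:
    name: str
    issue: str                    # text of a past bug report from your own tracker
    check: Callable[[str], bool]  # True if the proposed fix is acceptable
def pass_rate(agent: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases whose check accepts the agent's output."""
    if not cases:
        return 0.0
    passed = sum(1 for case in cases if case.check(agent(case.issue)))
    return passed / len(cases)
# Hypothetical stand-in for a coding agent: it just echoes the issue text.
def stub_agent(issue: str) -> str:
    return "fix: " + issue
cases = [
    EvalCase("null-check", "crash on empty input", lambda out: "empty" in out),
    EvalCase("timeout", "request hangs forever", lambda out: "retry" in out),
]
score = pass_rate(stub_agent, cases)
```
Swapping stub_agent for a call into the model under evaluation keeps the
score tied to your own bug patterns rather than to a public leaderboard.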

6
00:00:33.854 --> 00:00:41.823
<v Sam>That's interesting because it tracks with where the whole space is headed. We're moving from 'can the AI fix a single isolated issue' to 'can it handle sustained engineering work across a whole project.' Coding copilots becoming coding coworkers, basically. And the benchmarks just haven't caught up to that shift yet.

7
00:00:41.823 --> 00:00:49.516
<v Alex>And you can see that shift in today's other AI stories too. There's this Awesome Codex Skills list from ComposioHQ that hit over twenty-five hundred engagements on GitHub — it's essentially a cookbook for wiring Codex into real workflows. CI pipelines, refactoring, migration scripts. Not just chat anymore.

8
00:00:49.516 --> 00:00:53.400
<v Sam>I love that resource. If you're using Codex beyond just asking it questions, start there instead of reinventing prompts from scratch. Link in the briefing.

9
00:00:53.400 --> 00:01:00.566
<v Alex>And then there's Beads, which gives coding agents persistent memory across sessions. So your agent remembers your project conventions, your past architecture decisions, your codebase patterns. If your agent keeps forgetting your choices between conversations, this solves that directly.

10
00:01:00.566 --> 00:01:05.904
<v Sam>Oh man, that's a pain point I feel personally. You start a new session and spend the first ten minutes re-explaining your entire project structure. Persistent memory for agents is such a fundamental missing piece.

11
00:01:05.904 --> 00:01:10.966
<v Alex>Now here's my favorite story of the day. An amateur — not a mathematician — used ChatGPT to solve a sixty-year-old open problem from Erdős in combinatorics. And the proof was verified by actual experts.

12
00:01:10.966 --> 00:01:17.832
<v Sam>Okay, that's remarkable. But I think the real takeaway for builders isn't 'AI replaces mathematicians.' It's that using LLMs as reasoning partners for domain exploration is a genuinely underexplored product surface. Like, there's a whole category of tools to be built around that.

13
00:01:17.832 --> 00:01:25.825
<v Alex>Totally agree. Also worth flagging — OpenAI launched a new privacy filter for API and product usage. Enterprises can now control what data OpenAI sees and retains. If your compliance team has been blocking you from deploying OpenAI models, especially in healthcare or finance, check whether this unblocks your use case.

14
00:01:25.825 --> 00:01:28.707
<v Sam>That's a big deal for a lot of teams that have been stuck. Compliance has been the real bottleneck, not capability.

15
00:01:28.707 --> 00:01:34.621
<v Alex>Switching to dev tools: GitHub made a UI change that's causing some real frustration. Issue links now open in a modal popup instead of navigating to the actual issue page. That's a hundred and twenty-six Hacker News points' worth of frustration.

16
00:01:34.621 --> 00:01:40.109
<v Sam>I saw that and immediately felt the pain. If you maintain open-source projects, you might want to start linking to full issue URLs in your docs and READMEs as a workaround, because contributors are going to be confused.

17
00:01:40.109 --> 00:01:45.396
<v Alex>Also trending again is statecharts dot dev — a deep resource on hierarchical state machines. And this is having a moment because state machines turn out to be the sane way to manage multi-step AI agent behavior.

18
00:01:45.396 --> 00:01:50.007
<v Sam>Yeah, if you're doing any agent orchestration, bookmark that. State machines give you the kind of predictability you desperately need when you've got an LLM making decisions in a loop.
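
NOTE
A minimal sketch of the predictability point in cues 17 and 18: the model may
propose the next step, but a fixed transition table decides whether it is
allowed. This is a flat state machine, not a hierarchical statechart, and it
is not taken from statecharts.dev. State, TRANSITIONS, run_agent, and the
scripted policy are all hypothetical names chosen for illustration.
```python
from enum import Enum
from typing import Callable, Dict, List, Set
class State(Enum):
    PLAN = "plan"
    EDIT = "edit"
    TEST = "test"
    DONE = "done"
    FAILED = "failed"
# Allowed transitions; anything proposed outside this table is rejected.
TRANSITIONS: Dict[State, Set[State]] = {
    State.PLAN: {State.EDIT, State.FAILED},
    State.EDIT: {State.TEST, State.FAILED},
    State.TEST: {State.EDIT, State.DONE, State.FAILED},
}
TERMINAL = {State.DONE, State.FAILED}
def run_agent(propose_next: Callable[[State], State], max_steps: int = 10) -> List[State]:
    """Drive the loop; propose_next stands in for an LLM call."""
    state = State.PLAN
    history = [state]
    for _ in range(max_steps):
        if state in TERMINAL:
            return history
        candidate = propose_next(state)
        # Guard: an illegal jump (e.g. PLAN straight to DONE) becomes FAILED.
        state = candidate if candidate in TRANSITIONS[state] else State.FAILED
        history.append(state)
    if state not in TERMINAL:
        history.append(State.FAILED)  # step budget exhausted
    return history
# Hypothetical policy that replays a fixed script instead of calling a model.
def scripted(steps: List[State]) -> Callable[[State], State]:
    it = iter(steps)
    return lambda _state: next(it)
happy_path = run_agent(scripted([State.EDIT, State.TEST, State.DONE]))
illegal_jump = run_agent(scripted([State.DONE]))
```
Because every run ends in DONE or FAILED within max_steps, the loop cannot
wander indefinitely no matter what the model proposes.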

19
00:01:50.007 --> 00:01:56.121
<v Alex>Alright, security — two big ones. First, GnuPG has landed post-quantum cryptography support in mainline. Not a fork, mainline. If you sign releases, manage package repos, or handle encrypted communications, start testing PQC key generation now.

20
00:01:56.121 --> 00:01:59.003
<v Sam>Migration timelines are getting real on this. It's not theoretical anymore — you should be playing with this today.

21
00:01:59.003 --> 00:02:06.420
<v Alex>And then there's an analysis of EU age verification proposals that would effectively mandate digital identity for all web usage. Builders serving EU users should start thinking about age-gating and identity verification architecture now, because some form of this regulation is coming regardless.

22
00:02:06.420 --> 00:02:11.206
<v Sam>The compliance surface just keeps expanding. Between PQC and the EU digital ID push, if you're shipping anything in those markets, you want to bake this in now rather than retrofitting later.

23
00:02:11.206 --> 00:02:18.574
<v Alex>Couple of cool releases to highlight. Asahi Linux hit version seven — Apple Silicon Linux is getting serious. GPU acceleration, audio, suspend and resume are all substantially more mature now. If you've been waiting to run Linux on M-series Macs for dev or CI, this might be your tipping point.

24
00:02:18.574 --> 00:02:26.617
<v Sam>I've been watching Asahi for a while and this really does feel like a milestone. Also, there's a PlayCanvas demo turning Gaussian splats into playable video games, which is just cool. If you're building anything with splat-based 3D — real estate, training sims, spatial computing — the interaction layer is now buildable.

25
00:02:26.617 --> 00:02:31.855
<v Alex>And Brave's Rust-based ad-block engine is getting renewed GitHub attention. If you're building a browser, a web scraper, or any privacy-focused product, it's battle-tested, fast, and you can embed it directly.

26
00:02:31.855 --> 00:02:40.750
<v Alex>Quick hits before we go: there's a think piece with over two thousand engagements arguing the West forgot how to make things and now it's forgetting how to code. America's geothermal breakthrough could unlock a hundred and fifty gigawatts — energy costs matter for compute. And someone made a tutorial on making circuit boards from clay. Yes, actual clay.

27
00:02:40.750 --> 00:02:43.031
<v Sam>The clay PCB thing is delightful. I have no practical use for it but I love that it exists.

28
00:02:43.031 --> 00:02:53.029
<v Alex>So the through-line today is clear: AI coding tools are outgrowing their benchmarks and their training wheels at the same time. SWE-bench is saturated, Codex has a community cookbook, agents are getting persistent memory. If you're building with coding agents, stop optimizing for benchmark scores and start building eval harnesses against your own codebase. That's the only metric that matters now.

29
00:02:53.029 --> 00:02:57.489
<v Sam>And if you're shipping anything that touches EU users or encrypted data, PQC and the digital ID push both say the compliance surface is expanding fast. Better to build it in now.

30
00:02:57.489 --> 00:03:01.399
<v Alex>That's the briefing for April twenty-seventh. Links to everything we talked about are in the show notes. We'll be back tomorrow — until then, keep building.

31
00:03:01.399 --> 00:03:02.000
<v Sam>See you tomorrow, folks.
