WEBVTT
NOTE The Rundown — nextbig.dev daily audio edition, 2026-03-13

1
00:00:00.000 --> 00:00:07.546
<v Marcus>Good morning and welcome to Builder's Briefing for March thirteenth, twenty twenty-six. I'm Alex, joined as always by Sam, and we've got a packed show today: agent memory is becoming its own product category, Google drops a production-ready agentic framework, and Hacker News officially bans AI-generated comments.

2
00:00:07.546 --> 00:00:10.762
<v Nadia>Yeah, it's one of those days where you can feel the ground shifting under a couple of different things at once. Let's get into it.

3
00:00:10.762 --> 00:00:22.440
<v Marcus>So the big story — three open-source projects all trending at the same time, and they're all attacking the same problem from different angles: agent memory. We've got OpenRAG, which bundles Langflow, Docling, and OpenSearch into a single deployable RAG stack. Hindsight from Vectorize, which gives agents memory that actually learns from interactions over time. And then Memvid, which is the wild one — it replaces your entire RAG pipeline with a single-file memory layer.

4
00:00:22.440 --> 00:00:31.743
<v Nadia>Okay, that's interesting because each of these represents a totally different philosophy. OpenRAG is basically saying, look, RAG works, we just need to stop gluing five services together. Hindsight is saying static retrieval isn't enough, your agent needs to get smarter the more it's used. And Memvid is saying — throw out the vector database entirely for a lot of use cases.

5
00:00:31.743 --> 00:00:37.384
<v Marcus>Right, and what's wild is the timing. The fact that all three are gaining traction simultaneously tells you the market is screaming for better memory architectures. Builders are done with brittle RAG pipelines that kind of work.

6
00:00:37.384 --> 00:00:46.513
<v Nadia>As a developer, I'm most drawn to Hindsight. Think about a customer support agent that remembers which resolution patterns actually worked. That's not just retrieval, that's learning, and it's a fundamentally different product experience. But Memvid is the one I'd prototype this weekend, just to understand the trade-offs of going serverless and single-file.

7
00:00:46.513 --> 00:00:52.971
<v Marcus>The signal here is clear: agent memory is becoming a first-class design decision, not something you bolt on after the fact. The teams that get this right are going to ship agents that feel completely different from the current crop of search-then-generate bots.

8
00:00:52.971 --> 00:00:57.275
<v Nadia>Agreed. Pick one, build something small, and see which memory model actually fits your agent's real usage patterns. Don't just go with the one that has the most GitHub stars.

9
00:00:57.275 --> 00:01:04.846
<v Marcus>Alright, moving to AI and models — Google just open-sourced A2UI, and they're explicitly calling it production-ready. This is a framework for building agentic workflows, and it's backed by Google's infrastructure team. If you've been evaluating LangGraph or CrewAI, this just entered the chat in a big way.

10
00:01:04.846 --> 00:01:11.056
<v Nadia>The fact that they're labeling it production-ready and not just another research prototype is a deliberate signal. Google is saying we want you to build real things on this, not just experiment. That changes the evaluation calculus for a lot of teams.

11
00:01:11.056 --> 00:01:18.429
<v Marcus>And here's one that made me do a double-take — the creator of Kotlin, Andrey Breslav, launched a project called Codespeak. It's essentially a formal language for talking to LLMs. Instead of English prompts, you write structured, deterministic instructions. Think of it as a type system for prompts.

12
00:01:18.429 --> 00:01:25.431
<v Nadia>Oh, I love this conceptually. Anyone who's dealt with prompt fragility in production knows the pain. You change one word and suddenly your output format breaks. If this can bring even a fraction of the reliability that type systems brought to programming languages, it's a huge deal.

13
00:01:25.431 --> 00:01:32.333
<v Marcus>Now, this next one is important for anyone benchmarking coding agents. METR did an analysis showing that AI-generated pull requests that pass SWE-bench often wouldn't actually get merged in a real code review. Wrong abstractions, poor test coverage, style violations — the works.

14
00:01:32.333 --> 00:01:39.484
<v Nadia>That's a gut check for the whole industry. We've been treating SWE-bench scores like they're the SAT for coding agents, and it turns out passing the test doesn't mean you can do the job. You need to test against your actual merge criteria — your team's standards, your codebase's patterns.

15
00:01:39.484 --> 00:01:44.902
<v Marcus>Also worth a quick mention — Claude now has native interactive charts and visualizations. Go from data to a chart in a single prompt, no charting library needed. Great for internal dashboards and quick data exploration.

16
00:01:44.902 --> 00:01:51.582
<v Marcus>On the dev tools front, a couple of things caught my eye. First, 9router, a unified proxy that connects all your AI coding tools to over a hundred models. So if your team is juggling Claude Code, Cursor, Copilot, and Gemini, this routes everything through one place.

17
00:01:51.582 --> 00:01:57.322
<v Nadia>That's a real pain point for engineering leads. Right now if you want to switch models or providers, you're reconfiguring every developer's setup individually. Having one proxy handle all of that is just good infrastructure hygiene.

18
00:01:57.322 --> 00:02:05.140
<v Marcus>And then there's a Show HN project called 'nah' — and yes, that's the actual name — which adds granular permission controls to Claude Code. You can whitelist or blacklist file access and operations based on context. If you're letting Claude Code touch production repos, this is the guardrail you should already have.

19
00:02:05.140 --> 00:02:10.929
<v Nadia>The name alone deserves a star on GitHub. But seriously, giving an AI agent unrestricted access to your codebase has always felt like handing someone your house keys on the first date. Context-aware permissions should be table stakes.

20
00:02:10.929 --> 00:02:17.189
<v Marcus>Shifting gears to security — this one's serious. Iran-backed hackers hit Stryker, the medical device giant, with a wiper attack. And I want to emphasize — this is a wiper, not ransomware. There's no negotiation, no decryption key. It's pure destruction.

21
00:02:17.189 --> 00:02:24.240
<v Nadia>That distinction really matters. Ransomware at least has a business logic to it — pay and maybe get your data back. Wipers are just scorched earth. If you're in healthcare tech or any critical infrastructure, revisit your incident response plan this week. Not next quarter — this week.

22
00:02:24.240 --> 00:02:32.380
<v Marcus>Also on the infrastructure side, a project called Malus hit over five hundred points on Hacker News — it's clean room as a service. Ephemeral, isolated compute environments for running untrusted code. If you're building agents that execute arbitrary code, this solves the sandbox problem without you having to manage it yourself.

23
00:02:32.380 --> 00:02:37.897
<v Nadia>That ties right back to our lead story. As agents get more capable and start running code, you absolutely need disposable sandboxes. This is the kind of boring infrastructure that makes exciting agent features safe to ship.

24
00:02:37.897 --> 00:02:42.920
<v Marcus>Quick hits: Hacker News officially banned AI-generated comments. They updated the guidelines and drew a hard line. If you're building any community product, you need a policy on AI participation now, not later.

25
00:02:42.920 --> 00:02:47.695
<v Nadia>That's a huge signal. One of the most influential tech communities on the internet just said AI-as-participant is off limits. Every community platform is going to have to take a stance on this.

26
00:02:47.695 --> 00:02:57.096
<v Marcus>Also, Gruber wrote a deep dive on what looks like Apple's next-gen MacBook line — it got over eight hundred comments on HN. For builders, if a new chip architecture or form factor is coming, start thinking about how your CI pipelines handle different ARM performance tiers. And the Met just released high-def 3D scans of a hundred and forty famous art objects, which is just cool.

27
00:02:57.096 --> 00:02:59.644
<v Nadia>The Met one is the kind of thing that makes the internet great. Links in the briefing for all of these.

28
00:02:59.644 --> 00:03:07.017
<v Marcus>So, to wrap it up, there are two takeaways today. First, agent memory is unbundling from RAG. If you're building any agent system, you need to consciously choose between static retrieval, learning memory, and lightweight single-file approaches. They solve fundamentally different problems.

29
00:03:07.017 --> 00:03:12.535
<v Nadia>And second, the METR findings should change how you evaluate coding agents starting today. Stop trusting benchmark scores in isolation. Test against your actual merge criteria — your standards, your patterns, your codebase.

30
00:03:12.535 --> 00:03:17.582
<v Marcus>The builders who ship reliable agents this quarter will be the ones who picked the right memory architecture and the right eval framework — not the ones chasing the highest-scoring model on a leaderboard.

31
00:03:17.582 --> 00:03:20.204
<v Nadia>Well said. Prototype something this weekend, folks. Pick one of those memory projects and see what clicks.

32
00:03:20.204 --> 00:03:23.000
<v Marcus>That's the show for March thirteenth. All links are in the briefing. We'll see you tomorrow — go build something.