Agent Memory Gets Real: Three New OSS Projects Attack RAG From Different Angles

The Rundown No. 28 · Audio Edition · 3 min

Marcus

Good morning and welcome to Builder's Briefing for March thirteenth, twenty twenty-six. I'm Marcus, joined as always by Nadia, and we've got a packed show today: agent memory is becoming its own product category, Google drops a production-ready agentic framework, and Hacker News officially bans AI comments.

Nadia

Yeah, it's one of those days where you can feel the ground shifting under a couple of different things at once. Let's get into it.

Marcus

So the big story — three open-source projects all trending at the same time, and they're all attacking the same problem from different angles: agent memory. We've got OpenRAG, which bundles Langflow, Docling, and OpenSearch into a single deployable RAG stack. Hindsight from Vectorize, which gives agents memory that actually learns from interactions over time. And then Memvid, which is the wild one — it replaces your entire RAG pipeline with a single-file memory layer.

Nadia

Okay, that's interesting because each of these represents a totally different philosophy. OpenRAG is basically saying, look, RAG works, we just need to stop gluing five services together. Hindsight is saying static retrieval isn't enough, your agent needs to get smarter the more it's used. And Memvid is saying — throw out the vector database entirely for a lot of use cases.

Marcus

Right, and what's wild is the timing. The fact that all three are gaining traction simultaneously tells you the market is screaming for better memory architectures. Builders are done with brittle RAG pipelines that kind of work.

Nadia

As a developer, the Hindsight one really catches my eye. Think about a customer support agent that remembers which resolution patterns actually worked — that's not just retrieval, that's learning. That's a fundamentally different product experience. But Memvid is the one I'd prototype this weekend just to understand the trade-offs of going serverless and single-file.

Marcus

The signal here is clear: agent memory is becoming a first-class design decision, not something you bolt on after the fact. The teams that get this right are going to ship agents that feel completely different from the current crop of search-then-generate bots.

Nadia

Agreed. Pick one, build something small, and see which memory model actually fits your agent's real usage patterns. Don't just go with the one that has the most GitHub stars.

Marcus

Alright, moving to AI and models — Google just open-sourced A2UI, and they're explicitly calling it production-ready. This is a framework for building agentic workflows, and it's backed by Google's infrastructure team. If you've been evaluating LangGraph or CrewAI, this just entered the chat in a big way.

Nadia

The fact that they're labeling it production-ready and not just another research prototype is a deliberate signal. Google is saying we want you to build real things on this, not just experiment. That changes the evaluation calculus for a lot of teams.

Marcus

And here's one that made me do a double-take — the creator of Kotlin, Andrey Breslav, launched a project called Codespeak. It's essentially a formal language for talking to LLMs. Instead of English prompts, you write structured, deterministic instructions. Think of it as a type system for prompts.

Nadia

Oh, I love this conceptually. Anyone who's dealt with prompt fragility in production knows the pain. You change one word and suddenly your output format breaks. If this can bring even a fraction of the reliability that type systems brought to programming languages, it's a huge deal.

Marcus

Now, this next one is important for anyone benchmarking coding agents. METR did an analysis showing that AI-generated pull requests that pass SWE-bench often wouldn't actually get merged in a real code review. Wrong abstractions, poor test coverage, style violations — the works.

Nadia

That's a gut check for the whole industry. We've been treating SWE-bench scores like they're the SAT for coding agents, and it turns out passing the test doesn't mean you can do the job. You need to test against your actual merge criteria — your team's standards, your codebase's patterns.

Marcus

Also worth a quick mention — Claude now has native interactive charts and visualizations. Go from data to a chart in a single prompt, no charting library needed. Great for internal dashboards and quick data exploration.

On the dev tools front, a couple of things caught my eye. First, 9router — it's a unified proxy that connects all your AI code tools to over a hundred models. So if your team is juggling Claude Code, Cursor, Copilot, and Gemini, this routes everything through one place.

Nadia

That's a real pain point for engineering leads. Right now if you want to switch models or providers, you're reconfiguring every developer's setup individually. Having one proxy handle all of that is just good infrastructure hygiene.

Marcus

And then there's a Show HN project called 'nah' — and yes, that's the actual name — which adds granular permission controls to Claude Code. You can whitelist or blacklist file access and operations based on context. If you're letting Claude Code touch production repos, this is the guardrail you should already have.

Nadia

The name alone deserves a star on GitHub. But seriously, giving an AI agent unrestricted access to your codebase has always felt like handing someone your house keys on the first date. Context-aware permissions should be table stakes.

Marcus

Shifting gears to security — this one's serious. Iran-backed hackers hit Stryker, the medical device giant, with a wiper attack. And I want to emphasize — this is a wiper, not ransomware. There's no negotiation, no decryption key. It's pure destruction.

Nadia

That distinction really matters. Ransomware at least has a business logic to it — pay and maybe get your data back. Wipers are just scorched earth. If you're in healthcare tech or any critical infrastructure, revisit your incident response plan this week. Not next quarter — this week.

Marcus

Also on the infrastructure side, a project called Malus hit over five hundred points on Hacker News — it's clean room as a service. Ephemeral, isolated compute environments for running untrusted code. If you're building agents that execute arbitrary code, this solves the sandbox problem without you having to manage it yourself.

Nadia

That ties right back to our hero story. As agents get more capable and start running code, you absolutely need disposable sandboxes. This is the kind of boring infrastructure that makes exciting agent features safe to ship.

Marcus

Quick hits — Hacker News officially banned AI-generated comments. Updated the guidelines, drew a hard line. If you're building any community product, you need a policy on AI participation now, not later.

Nadia

That's a huge signal. One of the most influential tech communities on the internet just said AI-as-participant is off limits. Every community platform is going to have to take a stance on this.

Marcus

Also, Gruber wrote a deep dive on what looks like Apple's next-gen MacBook line — it got over eight hundred comments on HN. For builders, if a new chip architecture or form factor is coming, start thinking about how your CI pipelines handle different ARM performance tiers. And the Met just released high-def 3D scans of a hundred and forty famous art objects, which is just cool.

Nadia

The Met one is the kind of thing that makes the internet great. Links in the briefing for all of these.

Marcus

So to wrap it up — the takeaway today is about two things. First, agent memory is unbundling from RAG. If you're building any agent system, you need to consciously choose between static retrieval, learning memory, and lightweight single-file approaches. They solve fundamentally different problems.

Nadia

And second, the METR findings should change how you evaluate coding agents starting today. Stop trusting benchmark scores in isolation. Test against your actual merge criteria — your standards, your patterns, your codebase.

Marcus

The builders who ship reliable agents this quarter will be the ones who picked the right memory architecture and the right eval framework — not the ones chasing the highest-scoring model on a leaderboard.

Nadia

Well said. Prototype something this weekend, folks. Pick one of those memory projects and see what clicks.

Marcus

That's the show for March thirteenth. All links are in the briefing. We'll see you tomorrow — go build something.

The Big Story

Three open-source projects trending simultaneously tell the same story: builders are done with brittle RAG pipelines and want memory layers that actually work. OpenRAG (Langflow + Docling + OpenSearch) packages the full retrieval stack into a single deployable unit. Hindsight from Vectorize offers agent memory that learns from interactions over time rather than just retrieving static chunks. And Memvid takes the most radical approach — replacing complex RAG pipelines entirely with a serverless, single-file memory layer for agents.
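To make the single-file idea concrete, here is a minimal Python sketch of agent memory that lives entirely in one SQLite file and ranks entries by plain keyword overlap instead of embeddings. It is not Memvid's API or storage format; the class and method names (`SingleFileMemory`, `remember`, `recall`) are hypothetical, and a real memory layer would rank far more cleverly.

```python
from __future__ import annotations

import re
import sqlite3
from pathlib import Path


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens used for naive keyword matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class SingleFileMemory:
    """Hypothetical agent memory stored in a single SQLite file (not Memvid's API)."""

    def __init__(self, path: str | Path) -> None:
        # The whole memory lives in this one file: no server, no vector database.
        self.conn = sqlite3.connect(str(path))
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS memories ("
            "id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
        )
        self.conn.commit()

    def remember(self, text: str) -> None:
        """Append one memory and persist it immediately."""
        self.conn.execute("INSERT INTO memories (text) VALUES (?)", (text,))
        self.conn.commit()

    def recall(self, query: str, k: int = 3) -> list[str]:
        """Return up to k memories sharing the most keywords with the query."""
        query_tokens = _tokens(query)
        rows = self.conn.execute("SELECT text FROM memories ORDER BY id").fetchall()
        scored = [(len(query_tokens & _tokens(text)), text) for (text,) in rows]
        # Drop memories with no keyword overlap; the stable sort keeps older ties first.
        hits = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
        return [text for _, text in hits[:k]]

    def close(self) -> None:
        self.conn.close()
```

Because the state is one file, "deploying" this memory means copying a file, and reopening it later restores everything the agent stored.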

If you're building agents today, each of these represents a different bet. OpenRAG is the safe choice if you want a conventional RAG stack without gluing five services together — spin it up, point it at your docs, ship. Hindsight is the one to watch if your agents need to get smarter over time (think: customer support bots that remember resolution patterns). Memvid is the most opinionated — betting that you don't need a vector database at all for many agent memory use cases, just a clever single-file abstraction.
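The difference between static retrieval and learning memory is easiest to see side by side. This hypothetical Python sketch (not Hindsight's API) stores support resolution patterns, records whether each one worked, and compares a static ranking by keyword overlap with a learned ranking that also weighs each pattern's smoothed success rate. The names and the scoring formula are assumptions for illustration.

```python
import re
from dataclasses import dataclass


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens used for naive keyword matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class Pattern:
    text: str
    successes: int = 0
    failures: int = 0

    def success_rate(self) -> float:
        # Laplace smoothing: an untried pattern starts at 0.5 rather than 0 or 1.
        return (self.successes + 1) / (self.successes + self.failures + 2)


class LearningMemory:
    """Hypothetical outcome-aware memory (not Hindsight's API)."""

    def __init__(self) -> None:
        self.patterns: list[Pattern] = []

    def add(self, text: str) -> int:
        """Store a pattern and return its id."""
        self.patterns.append(Pattern(text))
        return len(self.patterns) - 1

    def feedback(self, pattern_id: int, worked: bool) -> None:
        """Record whether applying this pattern resolved the issue."""
        pattern = self.patterns[pattern_id]
        if worked:
            pattern.successes += 1
        else:
            pattern.failures += 1

    def _relevance(self, query: str, pattern: Pattern) -> int:
        return len(_tokens(query) & _tokens(pattern.text))

    def static_recall(self, query: str, k: int = 1) -> list[str]:
        """Rank by relevance only, ignoring outcomes (classic retrieval)."""
        ranked = sorted(self.patterns, key=lambda p: self._relevance(query, p), reverse=True)
        return [p.text for p in ranked[:k]]

    def learned_recall(self, query: str, k: int = 1) -> list[str]:
        """Rank by relevance weighted by each pattern's smoothed success rate."""
        ranked = sorted(
            self.patterns,
            key=lambda p: self._relevance(query, p) * p.success_rate(),
            reverse=True,
        )
        return [p.text for p in ranked[:k]]
```

With two equally relevant patterns, static recall keeps returning whichever was stored first; after a few rounds of feedback, learned recall switches to the one that actually resolves tickets.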

The signal for the next six months: agent memory is becoming a product category, not a feature you bolt on. The teams that treat memory architecture as a first-class design decision — choosing between learning memory, static retrieval, and hybrid approaches — will ship agents that feel fundamentally different from the current crop of 'search then generate' bots. Pick one of these, prototype this weekend, and see which memory model fits your agent's actual usage patterns.

@github · 4,040 engagement

AI & Models

Google Ships A2UI: A Production-Ready Platform for Agentic Workflows

Google open-sourced A2UI, a framework for building agentic workflows that's explicitly labeled 'production-ready' — not a research prototype. If you're evaluating LangGraph, CrewAI, or rolling your own orchestration, this just became a serious contender backed by Google's infra team.

@github · 1,100 engagement

Kotlin Creator Launches Codespeak: A Formal Language for Talking to LLMs

Andrey Breslav's new project replaces English prompts with a structured language designed for deterministic LLM communication. If prompt fragility is costing you reliability in production, this is worth evaluating — it's essentially a type system for prompts.
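Codespeak's actual syntax isn't reproduced here. As a rough Python illustration of the "type system for prompts" idea, the sketch below declares the expected output as a typed dataclass, renders the prompt deterministically from that declaration, and rejects model replies whose fields or types don't match. `TicketSummary`, `render_prompt`, and `parse_response` are hypothetical names.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TicketSummary:
    """Hypothetical output contract the model's reply must satisfy."""
    title: str
    priority: int
    needs_human: bool


def render_prompt(schema: type, task: str) -> str:
    """Build the prompt from the schema so the same inputs always give the same text."""
    spec = ", ".join(f'"{f.name}": {f.type.__name__}' for f in fields(schema))
    return f"{task}\nRespond with JSON only, shaped as {{{spec}}}."


def parse_response(schema: type, raw: str):
    """Parse a JSON reply and check it field by field against the schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name: f.type for f in fields(schema)}
    if set(data) != set(expected):
        raise ValueError(f"fields {sorted(data)} do not match {sorted(expected)}")
    for name, typ in expected.items():
        value = data[name]
        # bool is a subclass of int in Python, so reject True/False where an int is declared.
        if not isinstance(value, typ) or (typ is int and isinstance(value, bool)):
            raise TypeError(f"{name}: expected {typ.__name__}, got {type(value).__name__}")
    return schema(**data)
```

The point of the design is that a drifting reply fails loudly at the boundary instead of silently breaking whatever consumes it downstream.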

@newsycombinator · 468 engagement

METR: Most SWE-bench-Passing PRs Wouldn't Actually Get Merged

METR's analysis shows that AI-generated PRs passing SWE-bench often fail real-world code review standards — wrong abstractions, poor test coverage, style violations. If you're benchmarking coding agents, SWE-bench scores alone are misleading; test against your actual merge criteria.
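One way to act on that advice is to treat "tests pass" as only one gate among several that mirror your review bar. The Python sketch below is hypothetical: the `PullRequest` fields, thresholds, and blocking reasons are assumptions you'd replace with your own team's criteria, and it does not reproduce METR's methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical per-PR signals you might already collect in CI."""
    tests_pass: bool
    coverage_delta: float  # change in line coverage, percentage points
    lint_violations: int
    src_files_changed: int
    test_files_changed: int


@dataclass(frozen=True)
class MergeCriteria:
    """Example thresholds; replace them with your team's actual review bar."""
    min_coverage_delta: float = 0.0
    max_lint_violations: int = 0
    require_tests_with_src_changes: bool = True


def blocking_reasons(pr: PullRequest, c: MergeCriteria) -> list[str]:
    """Return every reason a reviewer would block this PR; empty means mergeable."""
    reasons = []
    if not pr.tests_pass:
        reasons.append("test suite fails")
    if pr.coverage_delta < c.min_coverage_delta:
        reasons.append(f"coverage changed by {pr.coverage_delta:+.1f} pts")
    if pr.lint_violations > c.max_lint_violations:
        reasons.append(f"{pr.lint_violations} lint violations")
    if c.require_tests_with_src_changes and pr.src_files_changed and not pr.test_files_changed:
        reasons.append("source changed without test changes")
    return reasons


def benchmark_vs_merge(prs: list[PullRequest], c: MergeCriteria) -> tuple[float, float]:
    """Fraction passing tests (benchmark-style) vs fraction clearing every gate."""
    passing = sum(pr.tests_pass for pr in prs) / len(prs)
    mergeable = sum(not blocking_reasons(pr, c) for pr in prs) / len(prs)
    return passing, mergeable
```

Reporting both numbers together makes the gap visible: a batch where most PRs are green on tests can still contain very few that a reviewer would accept.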

@newsycombinator · 376 engagement

Claude Gets Interactive Charts and Visualizations

Anthropic added native chart/diagram generation to Claude — meaning you can now go from data to interactive visualization in a single prompt without a separate charting library. Useful for internal dashboards and quick data exploration, less so for production-facing UIs.

@newsycombinator · 151 engagement

Developer Tools

9router: One Proxy to Connect All Your AI Code Tools to 100+ Models

If you're juggling Claude Code, Cursor, Copilot, and Gemini across your team, 9router acts as a unified proxy routing any AI code tool to 40+ providers. Practical for teams wanting model flexibility without reconfiguring every developer's setup.
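At the core of any unified proxy is a routing table: tools send every request to one endpoint, and the proxy picks an upstream provider based on the requested model name. This Python sketch uses longest-prefix matching over a hypothetical table; 9router's real configuration format and routing rules may differ, and the `.example` URLs are placeholders.

```python
from typing import Dict, Optional

# Hypothetical routing table: model-name prefix -> upstream base URL.
ROUTES: Dict[str, str] = {
    "claude-": "https://provider-a.example/v1",
    "claude-opus-": "https://provider-b.example/v1",  # more specific prefix wins
    "gemini-": "https://provider-c.example/v1",
    "gpt-": "https://provider-d.example/v1",
}


def resolve_upstream(model: str, routes: Dict[str, str], default: Optional[str] = None) -> str:
    """Pick the upstream for a model using longest-prefix match."""
    matches = [prefix for prefix in routes if model.startswith(prefix)]
    if matches:
        return routes[max(matches, key=len)]
    if default is not None:
        return default
    raise KeyError(f"no route for model {model!r}")
```

Because every tool points at the proxy, swapping a provider becomes a one-line change to the table rather than a change on every developer's machine.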

@github · 145 engagement

'nah' — A Context-Aware Permission Guard for Claude Code

This Show HN adds granular permission controls to Claude Code, letting you whitelist/blacklist file access and operations based on context. If you're giving Claude Code access to production repos, this is the kind of guardrail you should have been building anyway.
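In essence, a context-aware guard matches each requested operation against rules that consider the path, the operation, and some piece of context (here, the current git branch). This is a hypothetical Python sketch, not nah's configuration format; it assumes deny-overrides-allow semantics and denies anything no rule covers.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    action: str        # "allow" or "deny"
    operation: str     # "read", "write", "exec", or a glob such as "*"
    pattern: str       # glob over repo-relative paths; fnmatch's "*" also matches "/"
    branch: str = "*"  # context: only apply on git branches matching this glob


def decide(rules: list, operation: str, path: str, branch: str, default: str = "deny") -> str:
    """Deny beats allow; if no rule matches, fall back to the default."""
    matched = [
        r for r in rules
        if fnmatchcase(operation, r.operation)
        and fnmatchcase(path, r.pattern)
        and fnmatchcase(branch, r.branch)
    ]
    if any(r.action == "deny" for r in matched):
        return "deny"
    if any(r.action == "allow" for r in matched):
        return "allow"
    return default


RULES = [
    Rule("allow", "read", "*"),
    Rule("allow", "write", "src/*"),
    Rule("deny", "*", ".env"),
    Rule("deny", "*", "secrets/*"),
    Rule("deny", "write", "*", branch="main"),  # no agent writes on main
]
```

Defaulting to deny means a forgotten rule fails closed: the agent has to be granted an operation explicitly rather than blocked after the fact.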

@newsycombinator · 159 engagement

SiteSpy: Watch Any Webpage and Get Changes as RSS

Simple but useful — monitor competitor pages, API docs, or changelog pages and pipe changes into your existing RSS/automation workflow. Good for tracking upstream dependencies that don't publish proper changelogs.
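The mechanism behind a page watcher is simple: fingerprint the fetched content, compare it with the previous fingerprint, and emit an RSS item only when it changes. This Python sketch leaves out fetching and uses standard-library hashing and XML; it illustrates the general approach under that assumption and is not SiteSpy's implementation. `check_page` and `build_feed` are hypothetical helpers.

```python
import hashlib
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import format_datetime


def fingerprint(content: str) -> str:
    """Stable hash of the page content we care about."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def check_page(url: str, content: str, last_hash, now: datetime):
    """Return (rss_item or None, hash to remember for the next check)."""
    new_hash = fingerprint(content)
    if new_hash == last_hash:
        return None, last_hash
    item = ET.Element("item")
    ET.SubElement(item, "title").text = f"Change detected: {url}"
    ET.SubElement(item, "link").text = url
    # The content hash doubles as a unique, non-URL guid for feed readers.
    ET.SubElement(item, "guid", isPermaLink="false").text = new_hash
    # RFC 822-style date; pass a timezone-aware datetime to get a real UTC offset.
    ET.SubElement(item, "pubDate").text = format_datetime(now)
    return item, new_hash


def build_feed(items: list, title: str, link: str) -> str:
    """Wrap items in a minimal RSS 2.0 document (title, link, description are required)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Changes detected on pages watched from {link}"
    channel.extend(items)
    return ET.tostring(rss, encoding="unicode")
```

In practice you'd strip volatile parts of a page (timestamps, ads, session tokens) before hashing, or every fetch will look like a change.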

@newsycombinator · 318 engagement

s@ Protocol: Decentralized Social Networking Over Static Sites

A protocol for social networking that runs entirely on static sites — no servers, no databases. Interesting primitive if you're exploring decentralized identity or building community features without centralized infrastructure costs.

@newsycombinator · 344 engagement

Infrastructure & Cloud

Malus: Clean Room as a Service Hits 542 Points on HN

Malus offers isolated, ephemeral compute environments — think disposable VMs for running untrusted code, CI jobs, or agent sandboxes. If you're building AI agents that execute arbitrary code, this solves the sandbox problem without managing your own isolation layer.
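The lifecycle a clean-room service manages (create a fresh environment, run with a time limit, throw it away) can be shown in miniature with a subprocess and a temporary directory. This Python sketch provides no security isolation at all: the child process can still reach the network and the rest of the filesystem, which is exactly the part services like Malus are meant to handle with properly isolated environments. `run_disposable` is a hypothetical helper.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
from dataclasses import dataclass


@dataclass
class RunResult:
    returncode: int | None  # None when the run was killed for exceeding the time limit
    stdout: str
    stderr: str
    timed_out: bool


def run_disposable(code: str, timeout_s: float = 5.0) -> RunResult:
    """Run Python code in a fresh scratch directory that is deleted afterwards.

    NOT a sandbox: the child can still read other files, use the network, and
    see inherited environment variables. It only demonstrates the
    create, run-with-a-limit, discard lifecycle.
    """
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                # -I: isolated mode, ignoring PYTHON* env vars and user site-packages
                [sys.executable, "-I", "-c", code],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before re-raising; the scratch dir is still removed.
            return RunResult(None, "", "", True)
        return RunResult(proc.returncode, proc.stdout, proc.stderr, False)
```

Every call starts from an empty directory, so nothing one run writes can leak into the next; a managed service extends that same guarantee to the network, the kernel, and the rest of the machine.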

@newsycombinator · 936 engagement

Security

Iran-Backed Hackers Hit Medtech Giant Stryker With Wiper Attack

A wiper attack (not ransomware — pure destruction) on a major medical device manufacturer. If you're in healthcare tech or any critical infrastructure vertical, revisit your incident response plan this week. Wipers don't negotiate.

@newsycombinator · 201 engagement

New Launches & Releases

HN Officially Bans AI-Generated Comments

Hacker News updated its guidelines to explicitly prohibit AI-generated or AI-edited comments. The signal: platforms are drawing hard lines between AI-assisted creation and AI-as-participant. If you're building community products, you need a policy on this now, not later.

@newsycombinator · 5,759 engagement

The MacBook Neo — Gruber's Take on Apple's Next Hardware Play

Daring Fireball's deep dive on what appears to be Apple's next-gen MacBook line got 800+ HN comments. For builders: if a new form factor or chip architecture is coming, start thinking about how your dev toolchain and CI pipelines handle ARM performance tiers.

@newsycombinator · 2,096 engagement

Quick Hits

AI job interviews are here — and they're as awkward as you'd expect

@newsycombinator

Asia adopts 4-day work weeks and WFH amid Iran war fuel crisis

@newsycombinator

The Met releases high-def 3D scans of 140 famous art objects

@newsycombinator

Dolphin emulator ships Release 2603 with major compatibility fixes

@newsycombinator

Italian prosecutors seek trial for Amazon over $1.4B in alleged tax evasion

@newsycombinator

Avoiding Trigonometry — elegant math tricks for graphics programming (2013)

@newsycombinator

US banks' exposure to private credit hits $300B — systemic risk watch

@newsycombinator

The Takeaway

Agent memory is rapidly unbundling from RAG. If you're building any agent system today, evaluate whether you need static retrieval (OpenRAG), learning memory (Hindsight), or lightweight single-file memory (Memvid) — they solve fundamentally different problems. Meanwhile, the METR SWE-bench findings should change how you evaluate coding agents: stop trusting benchmark scores and start testing against your actual merge criteria. The builders who ship reliable agents this quarter will be the ones who picked the right memory architecture and the right eval framework, not the ones chasing the highest-scoring model.
