Simon Willison: Vibe Coding and Agentic Engineering Are Converging Fast
Vibe coding meets agentic engineering, OpenReel Video ships a browser-native editor, and Val.town's auth migration lessons for builders.
Hey everyone, welcome to Builder's Briefing for May 8th, 2026. I'm Alex, joined as always by Sam. We've got a packed one today — Simon Willison stirring up a massive debate about vibe coding and agentic engineering converging, some cool new local inference tooling, SQLite getting a wild endorsement, and a few quick hits you won't want to miss.
Yeah, the Simon Willison piece alone has like five hundred plus Hacker News comments, so clearly it hit a nerve. Let's get into it.
So the big story — Simon Willison published a piece arguing that vibe coding and agentic engineering are basically converging. His point is that the tools serious engineering teams use — Cursor, Claude Code, Copilot agents — they run the same autonomous loop that a casual vibe coder uses. The only real difference now is how much you bother to review the output.
Right, and what's wild is that the default is drifting toward just... trusting the agent. Like, I've caught myself doing it. You get a clean diff, the tests pass, and you kind of wave it through. Multiply that across a team and it's a real risk.
Exactly. And his argument is that this is a forcing function for teams. You need explicit policies now — what gets deep review, what's okay to fast-track. Because without that, you're essentially vibe coding in production whether you admit it or not.
That's interesting because it also reframes where the product opportunity is. The generation layer is basically commoditized at this point. The real differentiator is the review and audit layer — better diff UIs, semantic code auditing, test generation as a guardrail.
He predicts we'll see the first serious production incidents from under-reviewed agentic code in the next six months, and honestly, that feels generous. If you're touching compliance or user data with AI-assisted code, invest in your review pipeline now, not after the incident.
So speaking of AI tooling — a couple of really interesting drops this week on the models side.
Yeah, the big one for me is Antirez — the Redis creator — shipping a local inference engine called ds4, optimized for Apple Metal to run DeepSeek 4 Flash. If you're building Mac-native AI tools or just want to dodge API costs for smaller models, this is a clean single-purpose runtime worth benchmarking against llama.cpp.
Love that it's Antirez. The man just ships clean, focused tools. Also worth mentioning — DeepMind published AlphaEvolve results, using Gemini to generate and evolve code solutions across math, science, and engineering. It validates that evolve-and-test loop architecture for agent-driven optimization.
And one quick flag — Chrome quietly removed the claim that its on-device AI features don't send data to Google. So if you're building Chrome extensions or PWAs leaning on those built-in AI APIs, assume your data is hitting Google servers and update your privacy disclosures.
Quietly removed. That's doing a lot of work in that sentence. Classic.
Alright, dev tools. This one's great — Val.town documented their entire auth migration journey. They went from Supabase Auth to Clerk, and finally landed on Better Auth. It's a really honest comparison from a real production app.
I love these kinds of write-ups because you almost never get them. Everyone picks an auth provider and then just lives with the pain silently. The fact that Better Auth's self-hosted model won on flexibility and cost at their scale — that's a strong signal for anyone evaluating auth right now.
And the other big one — SQLite was officially recognized by the Library of Congress as a recommended storage format for archival purposes. If you need a data format that'll be readable in fifty years, SQLite just got the strongest endorsement possible.
That's kind of poetic, right? This little embedded database that started as a Tcl extension is now an archival standard. For builders choosing data formats for long-lived applications, this should settle the debate.
On the new launches front — OpenReel Video dropped, and it's an open-source CapCut alternative that runs entirely in the browser. No uploads, no watermarks, no install. One hundred percent client-side.
That's huge if you're building content creation tools or need to embed video editing in a SaaS product. The browser-only architecture means you can fork it or white-label it without worrying about server costs for media processing. That's a real starting point, not just a demo.
Also, Google is expanding reCAPTCHA into a full fraud defense platform on Google Cloud. If you're currently using reCAPTCHA, expect the integration surface to widen and the pricing to shift. Worth evaluating now whether you want that deeper Google lock-in or look at alternatives like Cloudflare Turnstile.
Alright, hit me with the quick hits.
"Programming Still Sucks" got a 2026 update and it's resonating hard — almost three hundred Hacker News points. RSS feeds are reportedly driving more traffic than Google for some publishers, which is kind of a full-circle moment. And my personal favorite — someone surfaced a TI-83 Plus BASIC programming tutorial from 2004. Pure nostalgia.
Oh man, the TI-83. That was genuinely my first programming environment. Writing little games during math class. Also the RSS thing is fascinating — the open web is quietly making a comeback while everyone's distracted by AI.
So tying it all together — the big takeaway today is that the convergence of vibe coding and agentic engineering isn't just philosophical, it's a tooling gap. If you're building developer tools, the highest-leverage bet right now is the audit and review layer between AI-generated code and production.
And the Val.town auth story reinforces a pattern we keep seeing — own your critical infrastructure early. Third-party auth, third-party AI services, they're all moving targets that can change terms under you.
That's the briefing for May 8th. All the links are in the show notes. If you're building review tooling for agentic code, honestly, reach out — we'd love to hear what you're working on.
And go dig up that TI-83 tutorial. You deserve a little nostalgia break. See you all tomorrow.
Simon Willison's latest piece is generating massive discussion (546 HN points, 581 comments) because it names something many of us feel: the gap between 'vibe coding' (letting an LLM generate code you don't fully review) and 'agentic engineering' (structured, tool-augmented AI workflows with human oversight) is collapsing. The tooling that serious engineering teams use (Cursor, Claude Code, Copilot agents) increasingly defaults to the same autonomous loop that casual vibe coders rely on. What separates the two is shrinking to how much of the output you review, not the underlying mechanism.
For builders, this is a forcing function. If you're leading a team, you need explicit policies on which agentic code gets reviewed and at what depth, because the default mode is drifting toward trust-the-agent. If you're building developer tools, the implication is clear: the review/audit layer is now the product differentiator, not the generation layer. Expect demand for better diff-review UIs, semantic code auditing, and test-generation-as-guardrail tooling to spike.
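If you want to make a review policy explicit rather than tribal, one option is to encode it as data. Here's a minimal sketch, assuming a policy keyed on file paths; the glob patterns, depth names, and function names are all hypothetical and not from Willison's piece.

```python
from fnmatch import fnmatch

# Hypothetical policy: ordered (glob pattern, review depth) rules.
# Patterns and depth names are illustrative assumptions only.
# Note: fnmatch's "*" also matches "/", so nested paths match too.
REVIEW_RULES = [
    ("src/auth/*", "deep"),
    ("src/billing/*", "deep"),
    ("migrations/*", "deep"),
    ("docs/*", "fast-track"),
    ("tests/*", "standard"),
]
DEPTH_ORDER = ["fast-track", "standard", "deep"]  # least to most strict
DEFAULT_DEPTH = "standard"


def depth_for_path(path: str) -> str:
    """Return the review depth of the first rule that matches this path."""
    for pattern, depth in REVIEW_RULES:
        if fnmatch(path, pattern):
            return depth
    return DEFAULT_DEPTH


def depth_for_change(paths: list[str]) -> str:
    """Review a change set at the strictest depth any of its files requires."""
    if not paths:
        return DEFAULT_DEPTH
    return max((depth_for_path(p) for p in paths), key=DEPTH_ORDER.index)
```

The design choice worth copying is the "strictest file wins" rule: an agent-generated diff that touches docs and auth in one go gets the auth treatment, so nothing sensitive rides through on a fast-track label.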
What this signals for the next six months: we'll see the first serious production incidents attributed to under-reviewed agentic code, and the tooling ecosystem will bifurcate into 'fast and loose' (solo builders, prototypes) and 'auditable agentic' (teams, regulated industries). If you're building anything that touches compliance or handles user data with AI-assisted code, invest in your review pipeline now — before the incident that forces you to.
AlphaEvolve: DeepMind's Gemini-Powered Coding Agent Shows Cross-Domain Impact
DeepMind published results on AlphaEvolve, a coding agent that uses Gemini to generate and evolve code solutions across math, science, and engineering. If you're building agent-driven optimization pipelines, this validates the evolve-and-test loop architecture, and hints that Gemini's code capabilities are being positioned for more than chat.
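For a feel of what an evolve-and-test loop looks like, here's a toy sketch. It is not DeepMind's implementation: the candidates are number pairs rather than programs, and a random perturbation stands in for the model-driven proposal step. All function names are made up for illustration.

```python
import random


def evolve(score, mutate, seed_candidate, generations=50, population=8, rng=None):
    """Keep the best candidate seen so far; propose variants, test, repeat."""
    rng = rng or random.Random(0)
    best = seed_candidate
    best_score = score(best)
    for _ in range(generations):
        # Propose: in a real system an LLM would generate these variants.
        children = [mutate(best, rng) for _ in range(population)]
        # Test: score every variant and keep it only if it beats the best.
        for child in children:
            child_score = score(child)
            if child_score > best_score:
                best, best_score = child, child_score
    return best, best_score


def toy_score(candidate):
    """Toy objective with its maximum (0) at x=3, y=-1."""
    x, y = candidate
    return -((x - 3) ** 2 + (y + 1) ** 2)


def toy_mutate(candidate, rng):
    """Stand-in for the proposal step: nudge each value by Gaussian noise."""
    return tuple(v + rng.gauss(0, 0.5) for v in candidate)


best, best_score = evolve(toy_score, toy_mutate, (0.0, 0.0))
```

Because a variant only replaces the incumbent when it scores higher, the result can never be worse than the seed. That property is what makes an automated evaluator the load-bearing piece of this kind of pipeline.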
DeepSeek 4 Flash: Antirez Ships Local Inference Engine for Apple Metal
Antirez (yes, the Redis creator) released ds4, a local inference engine optimized for Apple Metal to run DeepSeek 4 Flash. If you're building Mac-native AI tools or want to avoid API costs for smaller models, this is a clean, single-purpose runtime worth benchmarking against llama.cpp.
Proxima: Multi-AI MCP Server Connects LLMs to Your Dev Tools Without API Keys
Proxima lets you route ChatGPT, Claude, Gemini, and Perplexity into your coding environment via MCP without needing individual API keys. Useful if you're building internal tooling that needs to be model-agnostic, though verify the auth model before putting this anywhere near production.
Hallucinopedia: A Catalog of LLM Hallucination Patterns
Show HN project cataloging known hallucination types with examples. If you're building eval suites or user-facing AI features, this is a practical reference for the failure modes you should be testing against — bookmark it for your QA team.
Chrome Quietly Removes 'On-Device AI Doesn't Send Data to Google' Claim
Chrome's on-device AI features no longer carry the claim that data stays local. If you're building Chrome extensions or PWAs that lean on built-in AI APIs, assume data hits Google servers and update your privacy disclosures accordingly.
Val.town Migrates Auth: Supabase → Clerk → Better Auth
Val.town documented their full auth migration journey, landing on Better Auth after trying Supabase Auth and Clerk. If you're evaluating auth for a new project, this is an honest comparison from a real production app — Better Auth's self-hosted model won on flexibility and cost at their scale.
RaTeX: KaTeX-Compatible LaTeX Rendering in Pure Rust
Drop-in KaTeX replacement written in Rust, targeting WASM and native. If you're rendering math in docs, notebooks, or educational tools, this could cut bundle size and improve render speed — especially in WASM-heavy environments where KaTeX's JS overhead adds up.
SQLite Recognized as Library of Congress Recommended Storage Format
Official institutional validation of SQLite as an archival format. This matters for builders choosing data formats for long-lived applications: if you need a storage format that will be readable in 50 years, SQLite just got the strongest endorsement possible.
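If you haven't used SQLite as a file format before, here's a minimal sketch with Python's standard-library sqlite3 module. The table layout and function names are assumptions for illustration. The point is that the schema travels inside the same single file as the data.

```python
import os
import sqlite3
import tempfile


def write_archive(path, notes):
    """Create the table if needed and append (created, body) rows."""
    con = sqlite3.connect(path)
    try:
        with con:  # commits on success, rolls back on error
            con.execute(
                "CREATE TABLE IF NOT EXISTS notes ("
                "id INTEGER PRIMARY KEY, created TEXT NOT NULL, body TEXT NOT NULL)"
            )
            con.executemany("INSERT INTO notes (created, body) VALUES (?, ?)", notes)
    finally:
        con.close()


def read_schema_and_rows(path):
    """Recover both the table definitions and the data from the file alone."""
    con = sqlite3.connect(path)
    try:
        schema = [row[0] for row in con.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table'")]
        rows = con.execute("SELECT created, body FROM notes ORDER BY id").fetchall()
    finally:
        con.close()
    return schema, rows


# Usage: write to a throwaway file, then read everything back.
path = os.path.join(tempfile.mkdtemp(), "archive.sqlite")
write_archive(path, [("2026-05-08", "first note"), ("2026-05-09", "second note")])
schema, rows = read_schema_and_rows(path)
```

A future reader needs nothing but the file and any SQLite-capable tool to see what the columns are and what they contain.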
The Self-Cancelling Subscription: A Clever Rust Pattern
A Rust pattern for subscriptions that automatically clean up when dropped. Useful if you're building event-driven systems in Rust and have been rolling your own cleanup logic — this is a clean, composable approach worth stealing.
OpenReel Video: Open-Source CapCut Alternative Running Entirely in the Browser
Full video editor that runs 100% client-side — no uploads, no watermarks, no install. If you're building content creation tools or need to embed video editing in a SaaS product, this is a real starting point. The browser-only architecture means you can white-label or fork without worrying about server costs for media processing.
Google Cloud Fraud Defense: reCAPTCHA Evolves Into a Full Fraud Platform
Google is expanding reCAPTCHA from bot detection into a broader fraud defense product on Google Cloud. If you're currently using reCAPTCHA, expect the integration surface to widen and the pricing model to shift. Evaluate now whether you want deeper Google lock-in on fraud or should look at alternatives like Cloudflare Turnstile.
Cat-Catch: Browser Extension for Sniffing and Downloading Media Resources
Trending on GitHub — a browser extension that detects and lets you download media resources from web pages. Useful as a debugging tool if you're building media-heavy sites and want to verify what's actually being served to clients.
Diskless Linux Boot with ZFS, iSCSI, and PXE — A Complete Walkthrough
Detailed guide on network-booting diskless Linux machines using ZFS over iSCSI. If you're managing a homelab, CI fleet, or edge deployment, this is a well-documented path to stateless nodes with snapshot-based rollbacks.
The convergence of vibe coding and agentic engineering isn't just a philosophical debate; it's a tooling gap. If you're building developer tools, the highest-leverage bet right now is the audit and review layer between AI-generated code and production. If you're building products with AI-assisted code, invest more in testing infrastructure (evals, snapshot tests, semantic diffing) than you normally would. The auth migration story from Val.town reinforces a recurring pattern: own your critical infrastructure early, because third-party auth and AI services are moving targets that can change terms under you.
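As one small example of what semantic diffing can mean in practice, here's a sketch that checks whether a change to a Python file is formatting-only by comparing syntax trees instead of text. The function name is hypothetical, and real audit tooling would go well beyond this.

```python
import ast


def is_formatting_only_change(old_source: str, new_source: str) -> bool:
    """True when both versions parse to the same syntax tree.

    ast.dump omits line and column positions by default, so whitespace,
    comments, and quote style do not affect the comparison.
    """
    return ast.dump(ast.parse(old_source)) == ast.dump(ast.parse(new_source))


old = "def add(a, b):\n    return a + b\n"
reformatted = "def add(a,b):  # tidy up\n    return a+b\n"
changed = "def add(a, b):\n    return a - b\n"

formatting_only = is_formatting_only_change(old, reformatted)
behavior_changed = not is_formatting_only_change(old, changed)
```

A check like this lets a review pipeline wave through pure reformatting from an agent while routing anything that alters the tree to a human.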