WEBVTT
NOTE The Rundown — nextbig.dev daily audio edition, 2026-07-05

1
00:00:04.500 --> 00:00:15.780
<v Oday>It's the Sunday after the Fourth, the wire is thin, and the story that matters is a builder complaining that the best models just got worse at using his tools.

2
00:00:15.780 --> 00:00:26.740
<v Shannon>Independence week is over. Here's the rundown: how the newest models started fumbling other people's tools, why that flips the whole week, and where Jim Keller wants to take chipmaking next.

3
00:00:28.280 --> 00:00:49.880
<v Oday>Armin Ronacher was hacking on his own coding agent, Pi, and watched Anthropic's newest models, Opus 4.8 and Sonnet 5, call his edit tool with fields he never put in the schema. The edits were usually right. The made-up fields got the whole call rejected and retried.

4
00:00:49.880 --> 00:00:59.640
<v Shannon>And his older models didn't do that. Simon Willison passed it along with a blunt line: the newest models in the family are worse at this one tool schema than their older siblings.

5
00:00:59.640 --> 00:01:15.800
<v Oday>Be fair about what this is. The models didn't get worse at coding. They got better. That same weekend, Willison shipped a release of his sqlite-utils library mostly written by Claude Fable, for about a hundred and fifty dollars in tokens.

6
00:01:15.800 --> 00:01:30.360
<v Shannon>So the regression is narrow, and that's what makes it interesting. Ronacher's best guess is that the labs tuned these models hard on their own harnesses, Anthropic on Claude Code, and that quietly made them worse at everyone else's tools.

7
00:01:30.360 --> 00:01:36.520
<v Oday>Capability went up. The ability to plug into a tool someone else built went down.

8
00:01:36.520 --> 00:01:51.400
<v Shannon>And it isn't only Anthropic. The same weekend, a busy issue on OpenAI's Codex repo said its newest model's reasoning had started degrading inside the harness. Two labs, two flagships, the same odd result.

9
00:01:53.660 --> 00:02:08.620
<v Oday>For a week that was all about capable AI getting cheap and portable, this is the counter-current. The model stopped being the scarce, expensive part this month. The scaffolding around it did not.

10
00:02:08.620 --> 00:02:24.940
<v Shannon>So the practical read for anyone building is a budget shift. A month ago you watched the model cost. Now the model is nearly free, and the cost moved into the harness: the retries, the schema mismatches, the fallback model you keep wired in for the calls that can't miss.

11
00:02:24.940 --> 00:02:36.380
<v Oday>Pin the model versions your tools were tested against, because newer is no longer strictly better for tool use. That's the strange part. You might hold an older model on purpose.

12
00:02:36.380 --> 00:02:52.460
<v Shannon>And one floor down, the independence story kept going. Jim Keller's chip startup, Atomic Semi, rebranded to Fab2 and moved to Texas. The pitch is a fab that builds small fabs, and the tools inside them, and then mass-produces the whole factory.

13
00:02:52.460 --> 00:03:09.340
<v Oday>It's seed-stage and years from proof. But it's the deepest version of the week's theme. Independence from one model, then one chip vendor, ends at the fab. Fab2 wants to make the fab a product instead of a chokepoint.

14
00:03:09.520 --> 00:03:28.960
<v Oday>To the tape. We moved Micron to a long. The memory call we made on June 28th keeps confirming: TrendForce sees DRAM and NAND climbing through the third quarter on AI demand, and Micron is selling into the build-out, not the PC cycle.

15
00:03:28.960 --> 00:03:44.000
<v Shannon>We're watching Fab2, low conviction, private, a name not a position. And we're holding Nvidia on watch. Nothing today moves it, but value is migrating up to the harness and down to memory and fabs, and the GPU trade sits in the middle of that squeeze.

16
00:03:44.000 --> 00:03:49.360
<v Oday>The tape is the desk's scorecard, not advice.

17
00:03:53.240 --> 00:03:54.840
<v Oday>Quick break — two from the desk.

18
00:03:54.840 --> 00:04:08.440
<v Shannon>One we know well: vote dot direct. If you're on an H O A or a board, it runs your elections digitally — secure, verifiable, no paper, no clipboard in the lobby. Point your council to vote dot direct.

19
00:04:08.440 --> 00:04:18.840
<v Oday>And if this is your ten minutes of A I for the day, get the written edition too. The full wire, free, every morning — leave your email at nextbig dot dev.

20
00:04:24.790 --> 00:04:39.750
<v Oday>Our call: within six months, at least one major lab publicly admits that tuning its models on its own coding agent made them worse at other people's tools, and ships a named fix for it.

21
00:04:39.750 --> 00:04:51.350
<v Shannon>What proves us wrong: if by January fifth no lab has owned that regression and shipped a fix, and builders are still working around invented tool fields with retries.

22
00:04:51.350 --> 00:05:05.750
<v Oday>The models are cheap and brilliant now. The bill this weekend was reliability, the boring kind, at the machine interface. That's the rundown, and that's the week.
