nextbig.dev
Vancouver, B.C. · Intelligence on AI and the machines that run it
The Briefing · Wednesday, June 24, 2026

Claude burned through 2h37m of degraded models in a single day, and it is the third straight week

Claude logged 2h37m of degraded models in one day, the third straight week of capacity-bound outages. What it means for any team running Claude Code in CI.

12 min read
The Rundown No. 124 · Audio Edition · 9 min
The Big Story

If a build pipeline stalled twice on Tuesday, Claude was probably why. Anthropic logged elevated error rates across multiple models starting 14:19 UTC on June 23, and by the close of the day Claude had been degraded for 2 hours and 37 minutes in total, including a 1h 5m hit in the afternoon and a separate 1h 15m Opus 4.8 failure at 6:33 AM. Claude.ai, the Console, the API, Claude Code, and Cowork all took the spike together.

One bad day is noise. This is the third week of it. On June 16, every Sonnet and Opus model ran near a 10% error rate between 17:23 and 18:00 UTC, with Opus 4.8 stuck around 10% until 19:20. On June 13, Anthropic suspended Claude Mythos 5 and Fable 5 outright. No root-cause writeup has been published for any of it.

The failure mode matters more than the minutes lost. These are 529 overloaded errors clustered at peak US hours, which is capacity rationing, not a bug. Anthropic told Fortune that demand has outrun what its infrastructure can serve and that the fix is more capacity through Amazon and Google that is not online yet. For a chat user that is an annoying retry. For the teams who pushed Claude Code into CI/CD, it is a pipeline that fails on a schedule.

Run the exposure. A February SemiAnalysis count put Claude Code at roughly 4% of all public GitHub commits, more than 135,000 a day. A real share of that volume now leans on one provider that throttles at the exact hours US teams ship. Put Claude in your merge gate and your release cadence inherits Anthropic's overload curve.

Do this before the next spike: stop treating Claude as a hard dependency in any automated path, and put a fallback model behind a router so a 529 reroutes instead of blocking the merge. The cheapest hedge landed this week. Unsloth's day-zero GGUFs for GLM-5.2 put a 744B-parameter, 1M-context open model on a 256GB Mac at 2-bit, MIT-licensed, with no rate limit and no status page to refresh. Throughput is single-digit tokens per second, so it is a backstop, not a swap. A slow local model that answers still beats a fast remote one returning 529.
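The routing rule above can be sketched in a few lines. This is a minimal illustration, not any vendor's client: `ProviderError`, `Provider`, and `route` are hypothetical names, and the two `complete` functions are stand-ins for whatever real clients a team wires in. The one design choice worth copying is that only a 529 falls through; any other error still fails the job loudly.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

OVERLOADED = 529  # the overloaded status code discussed above


class ProviderError(Exception):
    """Hypothetical error type carrying an HTTP-style status code."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"{status} {message}".strip())
        self.status = status


@dataclass
class Provider:
    name: str
    complete: Callable[[str], str]  # stand-in for a real client call


def route(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order; fall through only on overload (529)."""
    last_error: Optional[ProviderError] = None
    for provider in providers:
        try:
            return provider.name, provider.complete(prompt)
        except ProviderError as err:
            if err.status != OVERLOADED:
                raise  # a real bug should not be masked by the fallback
            last_error = err
    raise RuntimeError("all providers overloaded") from last_error


# Illustrative stand-ins: a primary that is overloaded, a slow local backstop.
def overloaded_primary(prompt: str) -> str:
    raise ProviderError(OVERLOADED, "overloaded")


def local_fallback(prompt: str) -> str:
    return f"[local] {prompt}"


chain = [Provider("primary", overloaded_primary), Provider("local", local_fallback)]
print(route("review this diff", chain))
```

In a merge gate, the returned provider name is worth logging, since it shows which runs were served by the backstop rather than the primary.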

The next six months bend toward multi-provider by default. Anthropic's capacity gap does not close until the Amazon and Google buildout lands, and every outage teaches another team to wire an escape hatch it will not rip out afterward.

Compute & Infrastructure

Canada lines up 10 reactors by 2040 with no money attached

Ottawa's first national nuclear strategy enables up to ten new large reactors, two under construction by 2035, at a cost officials peg above $100 billion. There is no new funding in the document; it points vaguely at the Canada Infrastructure Bank and Growth Fund. Power is provincial jurisdiction, so this is a baseload-for-datacenters posture play, not a build order, with nuclear still just 13% of Canadian electricity.

China's CXMT readies an IPO to break the DRAM and HBM cartel

CXMT is adding wafer capacity and lining up a public listing aimed straight at Samsung, SK Hynix, and Micron pricing power as the memory shortage bites. Memory is already the named constraint on inference economics; a fourth serious DRAM and HBM supplier changes the cost floor under every GPU buildout. Watch whether incumbents cut to defend share before CXMT's volume actually lands.

China retakes the Top500 at 2.2 exaflops on CPUs alone

LineShine dethrones El Capitan's 1.8 exaflops as the first machine to sustain over 2 exaflops of double precision without a single GPU, using a custom 304-core chip. The takeaway for anyone tracking export controls: GPU embargoes have not capped peak compute, they have pushed China toward CPU-dense architectures that sidestep the banned parts entirely.

Microsoft locks 20 years of gas to power datacenters

A two-decade gas commitment is a hyperscaler conceding that AI load outpaces the grid and renewables, so baseload gets bought on long contracts. Pair this with Canada's reactor posture and the pattern is clear: the constraint on inference scale is moving from chips to firm electrons, and the buildout is being financed years ahead of the load.

Nvidia runs rack coolant hotter than a hot tub to cut the power bill

Raising coolant temperature cuts the energy spent chilling it, and Nvidia claims it can cut water use by up to 100%, which lowers the operating cost of every rack. For operators, warmer-loop liquid cooling is the lever on PUE now that density is pinned by the GPUs themselves. Sustainability caveats remain, but the math favors anyone running dense Blackwell racks at scale.

Azure adds Fireworks open-weight serving to Foundry

Developers get more open-weight model options inside Azure's agent platform without leaving the stack, which matters precisely on days like today. When a primary provider rations capacity, having open-weight serving one config away inside your existing cloud is the fastest fallback path that does not touch your billing relationship.

AI & Models

A 744B open model now runs local on a 256GB Mac

Unsloth shipped day-zero GGUFs for Z.ai's GLM-5.2, a 744B-parameter MoE with 40B active and a 1M-token context under MIT. The full model needs 1.51TB on disk, but a Dynamic 2-bit build drops to 239GB at roughly 82% accuracy, putting frontier-class open inference on unified-memory hardware. Throughput sits at single-digit tokens per second on consumer gear, so treat it as a no-rate-limit backstop, not a production swap. Vendor benchmarks claim parity with Opus and GPT-5.5; none are independent yet.
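A quick back-of-envelope check on those sizes, as a rough sketch. It assumes decimal units (1 TB = 10^12 bytes) and that the on-disk size is essentially all weights; both are assumptions, not figures from the release. The function name is illustrative.

```python
PARAMS = 744e9          # total parameters, from the item above
FULL_DISK_TB = 1.51     # full model on disk
DYNAMIC_2BIT_GB = 239   # Dynamic 2-bit build
MAC_MEMORY_GB = 256     # unified memory on the target machine


def bits_per_weight(size_bytes: float, params: float = PARAMS) -> float:
    """Average bits stored per parameter for a given file size."""
    return size_bytes * 8 / params


full_bpw = bits_per_weight(FULL_DISK_TB * 1e12)
quant_bpw = bits_per_weight(DYNAMIC_2BIT_GB * 1e9)
headroom_gb = MAC_MEMORY_GB - DYNAMIC_2BIT_GB

print(f"full model:    {full_bpw:.1f} bits per weight")
print(f"dynamic 2-bit: {quant_bpw:.2f} bits per weight")
print(f"headroom on a {MAC_MEMORY_GB}GB machine: {headroom_gb}GB")
```

Under those assumptions the full model works out to about 16 bits per weight and the 2-bit build averages a little over 2.5, which would fit a build that keeps some layers at higher precision. The roughly 17GB of headroom has to cover the OS and the KV cache, one reason to treat the 1M-token context as a ceiling rather than a working setting on this hardware.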

Artificial Analysis ships a speech-to-speech quality index

A combined metric across Big Bench Audio, Full Duplex Bench, and Tau-Voice ranks 27 voice models on reasoning, latency, and price, giving voice-agent builders a number to argue with instead of vendor slides. The split matters: GPT-Realtime-2 leads conversational dynamics while Grok Voice Think Fast tops agentic tasks, so the right pick depends on whether you are running chat or tool use.

A 35B model tops a forecasting leaderboard against far larger rivals

Apodex-1.0-mini holds its own on FutureX, a clean signal that small models can win narrow reasoning tasks at a fraction of the serving cost. For builders, the read is to stop reaching for frontier models on bounded tasks where a 35B model serves cheaper and faster.

PlanBench-XL stress-tests agents in large tool ecosystems

The new benchmark targets long-horizon planning where agents juggle many tools, the regime where current evals miss the real failure modes. If your agent passes toy benchmarks but stalls in a 50-tool environment, this is the eval that surfaces it before your users do.

Developer Tools

Baidu open-sources an OCR model that reads a whole document in one pass

Unlimited-OCR replaces decoder attention with Reference Sliding Window Attention that holds a constant KV cache, so it transcribes dozens of pages in a single forward pass at 32K max length. It scores 93% on OmniDocBench v1.5, six points over the DeepSeek OCR baseline, at a reported 3B total and 500M active. Ignore the social claims of beating 235B models; the primary paper does not say that.

Oak pitches version control built for agents, not humans

Oak mounts large repos without a full clone, hydrates files on first access, branches one task per mount to avoid shared-.git corruption across parallel agents, and claims snapshots up to 95% faster than git plus 50% fewer VCS tokens. The benchmarks are self-reported and it sits at v0.96, but the Windows build already landed despite the wire saying otherwise. A real attempt at the multi-agent merge problem.

Armin Ronacher: the harness is the new unit of work

Ronacher argues the durable value is moving from prompting a model to writing the outer loop that supervises and re-queues agent work, keeping a task alive past where the model says it is done. It is an opinion piece with no numbers, but it names what every team building agents is converging on: the loop, not the prompt, is the product.

Google says AI Studio users built 1M native Android apps in a month

The figure signals real adoption of AI Studio's app-generation flow for mobile builders, though Google measures generation, not quality or retention. Take it as a count of attempts, not shipped products, and watch whether any of that million reaches a store.

Nvidia ships an Agent Toolkit with open Nemotron models and runtime

Enterprises get an open stack of models, tools, and a secure runtime to assemble workflow-specific agents inside Nvidia's ecosystem. It is a play to make Nvidia the default agent substrate, not just the GPU vendor, and another sign open weights plus a runtime is the shape enterprises will buy.

VibeThinker claims a 3B model beats Opus 4.5 on reasoning

The paper pairs novel SFT with GRPO to push a 3B model past a far larger model on reasoning benchmarks, the same small-model-wins-narrow-tasks pattern showing up everywhere this week. Self-reported and arxiv-fresh, so verify before you migrate, but the training recipe is the interesting part.

Latitude V2 ships open-source agent monitoring under MIT

Latitude reads 100% of traces to flag bad agent behavior, giving teams observability past raw API cost tracking, under a permissive license. On a day when a provider's overload is invisible until your jobs fail, trace-level monitoring is how you catch silent agent degradation early.

Launches & Releases

Anthropic's Claude Tag learns your company from Slack history

Claude Tag is an always-on teammate that ingests company Slack to build org context, deepening Anthropic's hold on enterprise knowledge. The product bet is sticky: once Claude holds your institutional memory, switching providers means rebuilding context from scratch. Note the timing against today's reliability run, since an always-on tool inherits the same capacity ceiling.

Startups & Capital

a16z leads a $34M Series A in home-services dispatch

Probook takes capital for vertical AI in technician routing, a market where generic agent vendors have stalled because the workflow is messy and physical. The thesis: vertical AI with domain-specific data beats horizontal agents in operations-heavy trades. Another data point that the next wave of funded AI is narrow and embedded, not chat.

HaloBraid raises $7M for a salon braiding hardware assistant

Seven Seven Six backs a device launching this year to cut six-hour braiding appointments, a niche bet on physical-task automation. Small round, specific market, but it sits in the same vertical-hardware lane as Probook: capital chasing AI that does one physical job well rather than a general assistant.

AI super PACs drop $27M on one New York local race

Industry money is flooding a single district contest to set precedent ahead of the midterms, a signal of where AI policy fights are heading. For founders, the read is that regulation is now a funded political project, and the rules your inference stack will run under are being shaped in races like this one.

Security

LastPass says hackers stole support case data in a partner breach

A breach at vendor Klue exposed LastPass customer support case data, the second LastPass-linked incident in recent years. The lesson repeats: your security posture is only as strong as your least-careful third-party vendor, and support systems hold more sensitive context than teams assume.

Disguised Russian banking apps top the US App Store, third this month

Apple's review missed three stealth financial apps in a month before pulling them, a recurring gap in store vetting. If you ship through app stores, the takeaway is that platform review is not a security layer you can rely on, for your app or your users.

Meta exposed employee keystroke logs after a permissions error

A misconfiguration left mandatory surveillance logs visible companywide, a cautionary case for any firm collecting internal data to train on. The more telemetry you hoard for AI, the larger the blast radius when a permission flips the wrong way.

The Takeaway

Two stories rhyme today. Claude burned 2h37m of degraded uptime while Unsloth put a 744B open model on a desktop and Azure added Fireworks open-weight serving inside Foundry. The cheap insurance against a provider rationing capacity is a second path that already runs. Stand up GLM-5.2 behind a router this week, even at single-digit tokens per second, point a 529 at it instead of your merge gate, and measure how often your automated jobs would have rerouted over the last month. The number will be higher than you expect, and that is the case for getting off a single provider before the next peak-hour spike.
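One way to run that measurement, sketched under assumptions: it presumes CI logs can be exported as `(job_id, ISO-8601 timestamp, status code)` tuples. That record shape, the function name, and the sample rows are all hypothetical, made up for illustration.

```python
from collections import Counter
from datetime import datetime


def reroute_report(records):
    """Return (share of jobs that hit a 529, count of 529s per hour of day).

    records: iterable of (job_id, iso_timestamp, status_code) tuples.
    The hour is taken from the timestamp as written, so log in UTC.
    """
    jobs = set()
    hit_jobs = set()
    by_hour = Counter()
    for job_id, ts, status in records:
        jobs.add(job_id)
        if status == 529:
            hit_jobs.add(job_id)
            by_hour[datetime.fromisoformat(ts).hour] += 1
    share = len(hit_jobs) / len(jobs) if jobs else 0.0
    return share, by_hour


# Made-up sample rows, not real incident data.
sample = [
    ("build-1", "2026-06-23T14:25:00+00:00", 529),
    ("build-1", "2026-06-23T14:26:00+00:00", 200),
    ("build-2", "2026-06-23T09:00:00+00:00", 200),
    ("build-3", "2026-06-23T14:40:00+00:00", 529),
    ("build-4", "2026-06-23T20:10:00+00:00", 200),
]

share, by_hour = reroute_report(sample)
print(f"{share:.0%} of jobs would have rerouted; 529s by hour: {dict(by_hour)}")
```

Counting per job rather than per request keeps retries from inflating the number, and the hourly histogram shows whether the failures cluster at peak US hours.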

The Call C-20260624

Within 90 days, at least one top-five AI coding tool will ship automatic provider failover that reroutes off Claude on overload errors, marketed on reliability rather than price.

The case

Claude Code sits at roughly 4% of public GitHub commits while Anthropic rations capacity at peak US hours with no published root cause, and the fix through Amazon and Google is not yet online. GLM-5.2's MIT local weights and Azure's open-weight serving give tools a drop-in failover target that did not exist a month ago. A tool that loses merges to 529s has a support cost it can now engineer away, and the incentive lands before Anthropic's new capacity does.

What proves us wrong

If by September 24, 2026 no top-five AI coding tool (Cursor, Claude Code, GitHub Copilot, Windsurf, or a Cline-tier tool) has shipped documented automatic failover triggered by Anthropic overload errors, the call is wrong.

Settles by September 24, 2026
The Tape T-20260624
◆ Watch · MU · Micron Technology · medium conviction

Micron prints fiscal Q3 tonight after a 12% single-session drop, and the variable the supercycle longs left out of the model is now named: CXMT, which SemiAnalysis pegs at roughly $50B of 2026 DRAM revenue and wafer capacity closing on Micron's own. The trade assumed no supply response. There is one.

Today's SemiAnalysis read frames CXMT's looming IPO and wafer adds as a direct threat to incumbent pricing power, with CXMT reaching roughly 350 kwspm by end of 2026, only modestly below Micron's estimated ~385 kwspm. The detail the squeeze longs skip is where CXMT can actually compete: its FY25 gross margin reached 37.8%, while SK Hynix sat at 60.4% on a much higher HBM mix. CXMT pressures commodity DRAM and NAND, not the HBM4 book feeding Nvidia.

Wrong if TrendForce DRAM contract ASPs keep climbing through 2H 2026 while CXMT's global bit share stalls near 9-10% into 2027, leaving incumbent commodity pricing intact.
Settles by December 2026
▼ Short · ADBE · Adobe · medium conviction

Mistral OCR 4 and Baidu's open long-document parser both shipped today, pushing document extraction toward zero marginal cost two weeks after Adobe's CEO conceded the point by moving Acrobat onto a freemium track. The stock sits near a seven-year low around $200, and the cash engine just took fresh fire.

Both releases put high-accuracy parsing in reach for free, and the target is Document Cloud, not only Firefly, which is the angle the prior creative-disruption note missed. On the June 11 print, management postponed the planned Creative Cloud price hike and expanded freemium for both Firefly and Acrobat, an open admission that open models cap pricing power. Acrobat was supposed to be the steady book funding the creative AI transition; free OCR thins that cushion.

Wrong if Adobe's fiscal Q3 report (around mid-September 2026) shows Document Cloud ARR growth reaccelerating and management reinstating the deferred Creative Cloud price increase.
Settles by September 2026
◆ Watch · 000660.KS · SK Hynix · low conviction

Today's global chip rout was led by Korea and dragged SK Hynix down with the tape, but it is the one memory name CXMT structurally cannot reach. Its ~60% gross margin runs on an HBM mix that both a technology gap and an export-control wall should protect for years.

The same CXMT model that threatens commodity DRAM shows SK Hynix at 60.4% FY25 gross margin against CXMT's 37.8%, the gap driven by a much higher HBM mix. SK Hynix holds roughly half the HBM market and the bulk of Nvidia's contracts, a segment where Samsung, SK Hynix and Micron collectively hold 100% and Chinese makers like CXMT still face large technology gaps and export controls. A selloff that re-rates SK Hynix as a commodity supplier misreads where its earnings come from.

Wrong if SK Hynix HBM share drops below ~45%, or any Chinese supplier qualifies HBM into a Western AI accelerator inside the window.
Settles by October 2026 (Q3 print)
Desk signals from the day's verified wire — falsifiable, dated, settled in public. Analysis, not individualized investment advice.
