Fable 5 tops every coding benchmark

The Rundown No. 113 · Audio Edition · 9 min

Alex

Anthropic's Fable 5 just posted the best coding numbers ever recorded — and got caught silently swapping in a weaker model when it didn't like your question.

Sam

It's Friday, June 12, 2026. Here's the rundown: Anthropic moves to own its own servers, agent-skill scanners turn out to barely agree with each other, an Oracle flaw breaches a hundred-plus companies, and Ideogram ships a top-ten image model as open weights.

Alex

Fable 5 first. Eighty point three percent on SWE-Bench Pro against Opus 4.8's sixty-nine point two and GPT-5.5's fifty-eight point six. On Cognition's FrontierCode it more than doubles Opus — twenty-nine point three versus thirteen point four.

Sam

And none of that is the story. The story is what happens when a classifier flags your prompt: Fable doesn't refuse, it silently routes the question to Opus 4.8 and hands you a weaker answer with no disclosure.

Alex

The triggers were aggressive. IBM X-Force's Valentina Palmiotti says it rejects anything tangentially cyber related, including reading a blog post. An immunologist at Jackson Laboratory found the word cancer tripped the biosecurity classifier.

Sam

The mechanism explains the mess. Fable 5 is the same underlying weights as Mythos 5 — the difference is classifier routing bolted on top, with the unrestricted Mythos tier reserved for Glasswing partners.

Alex

Anthropic says safeguards fire in under five percent of sessions, tuned conservatively. And the under-covered admission in our briefing: they deliberately degrade answers on questions that might relate to AI development, so competitors can't use Fable for their own research.

Sam

That's a competitive moat wearing a safety vest. To their credit, the retraction was fast — under two days from launch to making the safeguards visible instead of silent, after Wired broke it.

Alex

Pricing is ten dollars per million input tokens, fifty per million output — double Opus. But it's included free on Pro, Max, Team, and Enterprise plans until June 22.

Sam

So the move this week is obvious: run your hardest agentic workloads against it while the meter's off. But read the data terms first — Fable requires thirty-day retention on all traffic, even for enterprises that previously negotiated zero retention. For some compliance regimes that disqualifies it outright before you ever see a benchmark.

Alex

On benchmarks — Endor Labs ran an independent harness and scored Fable mid-tier on coding, flatly contradicting Anthropic's first-party numbers.

Sam

When the vendor's evals and a third party's disagree by that much, neither is your answer. Run your own. And the precedent here outlasts the model: a frontier lab returned a different model's answers without disclosure, and stopped only because researchers caught it.

Alex

Meaning silent model substitution on hosted APIs is now documented practice, not a paranoid hypothetical.

Sam

So expect routing-detection probes in eval suites, and no-silent-fallback as a procurement line item by Q4. Add a model-fingerprint check to your pipeline now — it's an afternoon of work and it just earned its keep.

Alex

The story everyone will misread today sits right next to that one: The Information reports Anthropic is moving to control its own AI servers, attacking compute — its single largest expense.

Sam

Read it against Fable's fifty-dollar output pricing. A lab that owns its serving stack can cut API prices without torching margin — and its cloud patrons, who are also its investors, lose a captive customer. Every frontier lab is converging on the same answer because rent on rented compute is the biggest number on the P&L.

Alex

Downstream of that: Amkor breaks ground on a six-hundred-fifty-million-dollar phase one in Gwangju — the first of six packaging plants in Korea running through 2035, driven by TSMC order overflow.

Sam

Advanced packaging, not wafer starts, has been the binding constraint on accelerator supply since CoWoS sold out. Real OSAT capacity outside Taiwan eases the bottleneck and chips at the single-point-of-failure problem — and a 2035 horizon means the packaging industry is underwriting a full decade of accelerator demand. That's the strongest demand signal on today's tape.

Alex

Developer tools. NVIDIA shipped SkillSpector, an Apache-2 scanner that checks agent skills against sixty-four vulnerability patterns across sixteen categories — prompt injection, exfiltration, MCP tool poisoning. It gates NVIDIA's own Verified Skills catalog.

Sam

The category is justified — cited research puts twenty-six point one percent of skills vulnerable and five point two percent likely malicious. The tool, less so.

Alex

The OpenClaw dataset — sixty-seven thousand rows — has SkillSpector flagging forty-eight point seven percent positive while catching only six point eight percent of confirmed-malicious rows. VirusTotal caught seventy-two point eight.

Sam

And no scanner pair agrees on more than ten point four percent of flags. So run it in CI, never as your only gate — two independent scanners, minimum. Same beat from a different angle: LWN documents an AI agent filing low-quality contributions across Fedora faster than humans can review them.

Alex

The pattern holding across both: the agent supply chain generates work faster than tools or maintainers can vet it.

Sam

If you maintain an open source project, write contribution rate limits and provenance requirements this month — not after your incident.

Alex

Security, and this one's a clock, not a story. Google says a cybercrime gang exploited an Oracle flaw at scale and breached more than a hundred companies — victims have been notified.

Sam

MOVEit and Citrix Bleed kept claiming victims for months after disclosure, and almost all of them knew and deferred. If you run the affected Oracle software, patch today — not this sprint, today.

Alex

Launches. Ideogram 4.0 ships as open weights and debuts at number eight on the text-to-image leaderboard — a top-ten image model you can self-host with zero per-call fees.

Sam

For anyone generating images at volume, the arithmetic just flipped toward a GPU bill over an API bill. Closed image APIs are about to get squeezed from below exactly the way open LLMs squeezed text pricing through 2025 — benchmark it against your current API this week.

Alex

Quick hits. Solar generated more US electricity than coal for the first time on record.

Sam

Homebrew 6.0.0 lands — first major release of the package manager since 2023.

Alex

Xiaomi releases MiMo Code as open source, joining the open coding-model field.

Sam

The macOS 27 beta breaks booting Asahi Linux on Apple silicon — hold the update if that's your daily driver.

Alex

And a wargame study finds LLMs reach for tactical nukes in ninety-five percent of simulated conflicts.

Sam

Keep them off the launch keys and out of your incident response runbooks, in that order.

Alex

Our call: Anthropic cuts Fable 5's list price at least forty percent, taking output to thirty dollars per million tokens or less, by September 30, 2026, because the server buildout and the shared Mythos weights say ten-and-fifty is a placeholder, not a position.

Sam

What proves us wrong: Anthropic's public pricing page still showing ten dollars in, fifty out on September 30 — it's in The Book, and it settles that day.

The Big Story

Fable 5 tops every coding benchmark — and silently swapped in a weaker model when it didn't like your question

Anthropic's Claude Fable 5 posts the best coding numbers on the board: 80.3% on SWE-Bench Pro against Opus 4.8's 69.2% and GPT-5.5's 58.6%, and 29.3% on Cognition's FrontierCode versus Opus's 13.4%. Pricing is $10 per million input tokens and $50 per million output, double Opus 4.8. The launch is being eaten by its guardrails. In cybersecurity, biology, chemistry, and anything that smells like distillation, a classifier blocked Fable and silently fell back to Opus 4.8, returning a weaker answer without telling you. IBM X-Force's Valentina Palmiotti says it 'rejects any request that could be tangentially cyber related. Even innocuous tasks like reading a blog post.' An immunologist at Jackson Laboratory found the word 'cancer' tripped the biosecurity classifier.

The mechanism explains the mess. Fable 5 is the same underlying model as Mythos 5; the difference is classifier-based routing bolted on top, with the unrestricted Mythos tier reserved for Glasswing partners. Anthropic says safeguards trigger in under 5% of sessions, tuned conservatively enough to catch harmless requests. The under-covered admission: Anthropic deliberately degrades answers on questions that might relate to AI development, so competitors can't use Fable for their own research. That is a competitive moat wearing a safety vest. The retraction came fast: less than two days after release, first reported by Wired, Anthropic reversed its most conservative rules and made the safeguards visible instead of silent.

What to do this week. Fable is included free on Pro, Max, Team, and Enterprise plans until June 22; run your hardest agentic coding workloads against it now, while the meter is off. If you do security work, route around it entirely: the classifiers got looser, not gone. Read the data terms before anything ships, because Fable requires 30-day retention on all traffic, including for enterprises that previously negotiated zero retention, which disqualifies it outright under some compliance regimes. And note that Endor Labs' independent harness scored Fable mid-tier on coding, contradicting Anthropic's first-party numbers. When the vendor's evals and a third party's disagree this much, run your own.

The precedent matters more than the model. A frontier lab shipped a product that returned a different model's answers without disclosure, and stopped only when researchers caught it. If your product sits on a hosted API, silent model substitution is now a documented practice, not a paranoid hypothetical. Expect eval suites to grow routing-detection probes, enterprise contracts to specify exactly which weights answer the call, and 'no silent fallback' to become a procurement line item by Q4.
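A routing-detection probe can start as a simple metadata check. The sketch below is a minimal illustration, not Anthropic's API: it assumes each response is available as a plain dict with a "model" key, and the identifiers "fable-5" and "opus-4.8" are hypothetical placeholders. It only catches substitution the API reports; a truly silent swap would need behavioral probes on top.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    requested: str
    served: str

    @property
    def substituted(self) -> bool:
        # Flag any response whose reported model differs from the one requested.
        return self.served != self.requested


def check_served_model(requested: str, response: dict) -> ProbeResult:
    """Compare the requested model with the model name a response reports.

    Assumption: the response is a dict with a "model" key (hypothetical shape).
    A missing key is treated as a mismatch so the probe fails loudly.
    """
    served = response.get("model", "<missing>")
    return ProbeResult(requested=requested, served=served)


def substitution_rate(results: list[ProbeResult]) -> float:
    """Fraction of probes answered by a model other than the one requested."""
    if not results:
        return 0.0
    return sum(r.substituted for r in results) / len(results)


# Hypothetical responses, for illustration only.
probes = [
    check_served_model("fable-5", {"model": "fable-5"}),
    check_served_model("fable-5", {"model": "opus-4.8"}),
]
print(f"substitution rate: {substitution_rate(probes):.0%}")
```

Tracking the rate over time, rather than alerting on a single mismatch, makes a routing change visible even when individual fallbacks are expected.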

Two days from launch to partial retraction is fast work. The classifier that flagged 'cancer' was not consulted.

@newsycombinator · 1,047 engagement

Compute & Infrastructure

Anthropic moves to own its servers, attacking its biggest cost line

The Information reports Anthropic is moving to control its own AI servers, going after compute, its single largest expense. Read it next to Fable's $50/M output pricing: a lab that owns its serving stack can cut API prices without torching margin, and its cloud patrons, who are also its investors, lose a captive customer. The rent on rented compute is now the biggest number on every frontier lab's P&L, and they are all converging on the same answer.

@theinformation · 37 engagement

Amkor starts a $650M Gwangju phase one — six packaging plants through 2035

TSMC order overflow is driving Amkor's six-plant OSAT buildout in Korea, opening with $650M at Gwangju. Advanced packaging, not wafer starts, has been the binding constraint on AI accelerator supply since CoWoS sold out, so real OSAT capacity outside Taiwan both eases the bottleneck and chips away at the single-point-of-failure problem. A 2035 horizon means the packaging industry is underwriting a full decade of accelerator demand.

@dnystedt · 70 engagement

Developer Tools

NVIDIA's SkillSpector scans agent skills — but new data shows scanners barely agree

The Apache-2.0 scanner checks agent skills against 64 vulnerability patterns in 16 categories (prompt injection, exfiltration, MCP tool poisoning) using static analysis plus an optional LLM pass for intent mismatch, and it gates NVIDIA's own Verified Skills catalog. The category is justified: cited research finds 26.1% of skills vulnerable and 5.2% likely malicious. But a fresh OpenClaw dataset across 67,453 rows shows SkillSpector flagging 48.71% positive while catching only 6.8% of confirmed-malicious rows versus VirusTotal's 72.8%, and no scanner pair agrees on more than 10.4% of flags — run it in CI, never as your only gate.
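One way to wire "never as your only gate" into CI is to block a skill when either of two independent scanners flags it, and to track how much the two overlap. This is a sketch under assumptions: each scanner's output is reduced to a set of flagged skill IDs (hypothetical names below), and overlap is measured as Jaccard similarity, which may not be the metric the OpenClaw dataset uses.

```python
def flag_overlap(flags_a: set[str], flags_b: set[str]) -> float:
    """Jaccard overlap: shared flags divided by all flags raised by either scanner."""
    union = flags_a | flags_b
    if not union:
        return 1.0  # neither scanner flagged anything, so they trivially agree
    return len(flags_a & flags_b) / len(union)


def may_pass(skill_id: str, flags_a: set[str], flags_b: set[str]) -> bool:
    """A skill passes the gate only if neither scanner flagged it."""
    return skill_id not in (flags_a | flags_b)


# Hypothetical scanner outputs, for illustration only.
scanner_a = {"skill-1", "skill-2"}
scanner_b = {"skill-2", "skill-3"}

print(f"overlap: {flag_overlap(scanner_a, scanner_b):.2f}")
print("skill-4 passes:", may_pass("skill-4", scanner_a, scanner_b))
```

Blocking on the union trades more false positives for fewer misses, which is the safer direction when the scanners disagree this much.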

@github · 1,540 engagement

Willison after two days with Fable 5: 'relentlessly proactive,' invents its own tooling mid-task

Debugging a CSS scrollbar, Fable wrote its own repro HTML pages, enumerated Safari windows via Python, and drove the macOS screencapture CLI by window number — a verification loop nobody asked for. That autonomy is the substance behind Anthropic's long-horizon agentic claims and exactly the trait that makes supervision harder. His verdict: 'big model smell: slow, expensive and capable of crunching through pretty much everything I threw at it,' on a $100/month Max plan whose Fable allowance expires June 22.

@newsycombinator · 645 engagement

Pokémon Go scans trained a navigation model now headed for military drones — with one question unanswered

Roughly 30 billion opted-in environmental scans helped train an early version of Niantic Spatial's visual positioning model; defense contractor Vantor (ex-Maxar, holder of a $70M NGA award serving 400,000+ government users) is integrating it for GPS-denied drone navigation, with field testing planned from early 2026. Niantic Spatial told Kotaku the Vantor agreement doesn't include sharing that data, but Vantor won't say whether the model it's fielding was already trained on it, and deleting your account doesn't untrain a model. Consumer scan data became dual-use infrastructure the moment it entered weights.

@newsycombinator · 1,158 engagement

apple/container is trending again — it's a year old; 'container machine' is the actual news

Today's top repo shipped at WWDC 2025; the recent change is 'container machine,' a persistent Linux environment that runs an image's init system and maps your username and home directory in, documented two days ago. The one-VM-per-container architecture gives sub-second starts and stronger isolation than Docker Desktop's shared VM, but it requires macOS 26 on Apple silicon and still has no Compose support at v0.11.0. Worth a look for isolation-sensitive workflows; not a drop-in Docker replacement.

@github · 12,095 engagement

xAI ships MongoDB, Vercel, and Sentry plugins for Grok Build in a single day

Grok agents can now tune MongoDB and stand up vector search, deploy to Vercel with sandboxes and shadcn builds, and triage Sentry stack traces without custom integration. xAI is buying distribution in the agent-tooling layer by absorbing the glue work that used to differentiate platform startups. If your product is a thin integration between an agent and a SaaS API, this is your notice period.

@xai · 419 engagement

An AI agent runs amok in Fedora, burning maintainer time at scale

LWN documents an agent filing low-quality automated contributions across Fedora and other projects faster than humans can review them. Set this beside today's scanner-disagreement data and the pattern holds: the agent supply chain generates work faster than tools or maintainers can vet it. Open source projects without contribution rate limits and provenance requirements should write them this month, not after the next incident.
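A contribution rate limit can be as small as a per-contributor sliding window. The sketch below is one possible shape, not anything Fedora or LWN describes: the class name, the limit of two submissions, and the 60-second window are all hypothetical, and a real project would enforce this in its forge or CI rather than in a standalone script.

```python
from collections import defaultdict, deque


class ContributionLimiter:
    """Sliding-window limit on submissions per contributor (illustrative only)."""

    def __init__(self, max_per_window: int, window_seconds: float) -> None:
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        # Maps each contributor to the timestamps of their recent submissions.
        self._history = defaultdict(deque)

    def allow(self, contributor: str, now: float) -> bool:
        """Record and accept a submission unless the contributor is over the limit."""
        history = self._history[contributor]
        # Drop timestamps that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_per_window:
            return False
        history.append(now)
        return True


# Hypothetical policy: at most 2 submissions per 60 seconds.
limiter = ContributionLimiter(max_per_window=2, window_seconds=60.0)
for t in (0.0, 1.0, 2.0, 61.0):
    print(t, limiter.allow("agent-bot", now=t))
```

Passing the timestamp in explicitly keeps the limiter deterministic and easy to test; production code would read a clock instead.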

@newsycombinator · 543 engagement

AI & Models

Google teases Gemini Omni Flash video generation, publishes first-party benchmarks, withholds everything else

Logan Kilpatrick previewed image-to-video, text-to-video, and editing in one API model with SOTA claims, and Google posted first-party evals the same day — but no pricing, no date, no third-party numbers. The benchmark page exists to get builders comparing against their current defaults before rivals respond. Treat it as a roadmap signal; don't touch your video stack until per-second pricing lands, because that figure decides everything.

@OfficialLoganK · 566 engagement

Launches & Releases

Devin CLI's /handoff is GA — but contra the wire, it is not open source

What actually shipped: /handoff hands a local task to a remote Devin session with live status updates and now runs without arguments by summarizing the conversation first. It requires Devin account sign-in, not bring-your-own-key, so the wire's 'no lock-in' framing is backwards. The continuity pattern (close the laptop, agent keeps working server-side) is worth copying; Cognition's genuinely open move was adopting Zed's Agent Client Protocol in Devin Desktop on June 2, which is editor interop, not cloud continuity.

@cognition · 154 engagement

Ideogram 4.0 ships as open weights, debuts #8 on the text-to-image leaderboard

A top-ten image model you can self-host with zero per-call fees. For products generating images at volume, the arithmetic now favors a GPU bill over an API bill, and closed image APIs get squeezed from below the same way open LLMs squeezed text pricing through 2025. If image generation is a real cost line for you, benchmark it against your current API this week.

@ArtificialAnlys · 125 engagement

Perceptron ships an Agentic Detection API for open-vocabulary localization

Describe an object in text or hand it an image crop, get bounding boxes back — no labeled dataset, no per-class fine-tune, available via API today. Immediately useful for robotics, retail, and document pipelines that previously needed a custom detector for every new object class.

@DataChaz · 25 engagement

Security

Oracle flaw exploited in mass campaign that breached 100-plus companies

Google says it notified victims of an active cybercrime-gang campaign exploiting the bug at scale. If you run the affected Oracle software, patch today, not this sprint: mass-exploitation campaigns against enterprise middleware (MOVEit, Citrix Bleed) kept claiming victims for months after disclosure, almost all of them organizations that knew and deferred.

@TechCrunch · 44 engagement

Quick Hits

Homebrew 6.0.0 ships, the package manager's first major release since 2023

@newsycombinator

Solar generated more US electricity than coal for the first time on record

@newsycombinator

Xiaomi releases MiMo Code as open source, joining the open coding-model field

@newsycombinator

Zed introduces DeltaDB, arguing version control should capture work between commits

@newsycombinator

macOS 27 beta breaks the ability to boot Asahi Linux on Apple silicon

@newsycombinator

Wargame study: LLMs reach for tactical nukes in 95% of simulated conflicts

@newsycombinator

abtop: htop for AI coding agents — live tokens, context window, and rate limits for Claude Code and Codex sessions

@github

Bytecode Alliance lays out the road to WASM Component Model 1.0

@newsycombinator

Replit Agent adds persistent memory so teams stop re-prompting project conventions every session

@Replit

The Takeaway

Don't trust layers you can't observe. Anthropic silently substituted Opus 4.8 answers for flagged Fable queries, and the OpenClaw dataset shows no pair of agent-skill scanners agrees on more than 10.4% of flags, with SkillSpector catching just 6.8% of confirmed-malicious rows. If you run production traffic through hosted models or third-party agent skills, add model-fingerprint probes to your eval suite and a second independent scanner to your skill pipeline before June 22, when Fable's free window closes and your cost baseline moves anyway.

The Call C-20260612

Anthropic cuts Fable 5's list price at least 40% (output at or under $30 per million tokens) by September 30, 2026.

The case

Today's two Anthropic stories point the same direction: The Information reports the company moving to own its servers to attack its largest expense, and Fable 5 is the same weights as Mythos 5, so its serving cost is shared with the flagship. The consensus reads $10/$50 as durable frontier-tier positioning; we read it as a placeholder set before the compute buildout lands. When the June 22 free window closes, usage at 2x Opus pricing, on a model fresh off a guardrails embarrassment, will crater unless the price follows the cost curve down.
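The arithmetic behind the threshold is short: a 40% cut takes $50 per million output tokens to $30. The sketch below works that through for a workload. The list prices come from this briefing; the token volumes are hypothetical, and applying the same 40% cut to the input price is an assumption, since the call only names the output figure.

```python
FABLE_INPUT_PER_M = 10.0   # USD per million input tokens (from the briefing)
FABLE_OUTPUT_PER_M = 50.0  # USD per million output tokens (from the briefing)


def price_after_cut(list_price: float, cut_fraction: float) -> float:
    """List price after a fractional cut, e.g. 0.40 for 40%."""
    return list_price * (1.0 - cut_fraction)


def workload_cost(input_tokens: int, output_tokens: int,
                  input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of a workload at per-million-token prices."""
    return ((input_tokens / 1_000_000) * input_per_m
            + (output_tokens / 1_000_000) * output_per_m)


# Hypothetical monthly workload, chosen only for illustration.
INPUT_TOKENS = 200_000_000
OUTPUT_TOKENS = 40_000_000

today = workload_cost(INPUT_TOKENS, OUTPUT_TOKENS,
                      FABLE_INPUT_PER_M, FABLE_OUTPUT_PER_M)
after = workload_cost(INPUT_TOKENS, OUTPUT_TOKENS,
                      price_after_cut(FABLE_INPUT_PER_M, 0.40),   # assumed cut
                      price_after_cut(FABLE_OUTPUT_PER_M, 0.40))
print(f"today: ${today:,.0f}  after a 40% cut: ${after:,.0f}")
```

For this illustrative workload the bill falls from $4,000 to $2,400 a month; swap in your own token counts to see what the call would mean for your baseline.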

What proves us wrong

Anthropic's public pricing page still lists Claude Fable 5 at $10/M input and $50/M output on September 30, 2026.

Settles by September 30, 2026

The Tape T-20260612

▲ Long GOOGL Alphabet medium conviction

Anthropic's Fable 5 launch turning into a trust incident is a procurement gift to Gemini Enterprise, which is already compounding at 40% QoQ paid MAUs on a Cloud base growing 63% with a $460B backlog. Waymo Premier adds an ARPU lever to a unit that just crossed 500,000 weekly rides — the day's wire stacks two incremental positives on the one mega-cap with a full model-to-robotaxi stack.

Stories [3], [9], [17] show Anthropic shipped its flagship with silent model substitution at $10/$50 per M tokens, double Opus pricing — exactly the opacity that pushes enterprise API budgets toward the vendor with observable routing and TPU cost structure. [6] Waymo Premier and [10] Gemini Omni Flash both extend monetization surface the Street models at zero. Consensus risk is that GOOGL is up 50-60% over twelve months and crowded; the offset is that the Q1 print (Cloud +63%, backlog ~$460B) showed the estimates revision cycle is not done.

Wrong if Q2 2026 earnings (late July 2026) show Google Cloud revenue growth below 50% YoY or backlog declining sequentially; or Gemini Omni Flash fails to reach paid API availability by August 31, 2026.

Settles by August 31, 2026 (through the Q2 2026 print)

▲ Long MDB MongoDB medium conviction

The xAI Grok plugin makes Atlas vector search a one-prompt default for RAG backends — distribution MongoDB does not pay for, against a fiscal-2027 Atlas guide of 21-23% that management built with AI explicitly 'not yet material.' The agent-marketplace channel is free optionality on a stock guiding conservatively.

Story [25] puts MongoDB setup inside Grok's build flow; stories [13] and [15] show the coding-agent session layer maturing into real infrastructure, which is where database provisioning decisions now get made. Consensus reads agent tooling as a developer-experience story; the desk read is channel economics — Atlas is at a $2B run rate growing 29% with vector search adoption nearly doubling YoY, and none of the agent-native attach is in the guide. Q1 FY27 already beat with Atlas at 29% versus a 26% guide.

Wrong if Q2 FY27 earnings (expected early September 2026) show Atlas revenue growth below 26% YoY, or management cuts the fiscal 2027 Atlas growth guide below 21%.

Settles by September 30, 2026 (through the Q2 FY27 print)

◆ Watch Private Anthropic medium conviction

Fable 5 holds the best coding numbers on the board and the worst trust position: silent fallback to a weaker model at 2x Opus pricing converts a benchmark lead into an enterprise audit problem. The variable to track is not the apology — it is whether routing transparency ships before the June 22 free-window close, because every week of opacity is share donated to Gemini and open-weights coding mod

Stories [3], [9], [17], and [18] point one direction: usage enthusiasm (FablePool, Willison) colliding with a guardrail design that degrades answers without disclosure, confirmed by Anthropic's own apology. Consensus treats this as a PR cycle; the desk read is pricing power — you cannot hold $50 per M output tokens while customers run fingerprint probes to verify which model answered, and [14] MiMo Code going open-source compresses the floor under the exact coding workload Fable leads on. Read-across for secondary-market marks on the private name, not just the product.

Wrong if Anthropic ships visible routing disclosure (API-level flag when a fallback model answers) and holds Fable 5 list pricing at $10/$50 per M tokens through September 30, 2026, with no announced enterprise defection.

Settles by September 30, 2026

▼ Short ADBE Adobe low conviction

Ideogram 4.0 shipping as open weights at #8 on the text-to-image board, the same day Google teases unified image-to-video in Gemini Omni Flash, resets the free floor under Firefly pricing again — and Adobe's answer is a freemium pivot that management concedes weighs on second-half ARR and margin. The multiple stays capped while FY27 estimates absorb the volume-over-price trade.

Stories [11] and [10] are the mechanism: leaderboard-grade image generation with no per-call fee, and frontier video bundled into a hyperscaler API. Adobe's Q2 beat ($6.62B revenue, $5.96 EPS, guide raised) still sold off 5.7%, because the call confirmed the pivot — foregoing price increases to chase 'creative-curious' freemium users with Q3 operating margin guided near 44.5% versus 47.4% in Q1. The crowded part is the -46% twelve-month chart, which is why this prints at low conviction; the edge is that today's open-weights release shows the cost curve against Adobe is still steepening, not st

Wrong if Q3 FY26 earnings (expected mid-September 2026) show Firefly ending ARR materially above the ~$300M exit rate with Digital Media ARR growth re-accelerating, or ADBE closes above the $285 median analyst target before then.

Settles by September 30, 2026 (through the Q3 FY26 print)

Desk signals from the day's verified wire — falsifiable, dated, settled in public. Analysis, not individualized investment advice.
