
The AI Capability Control Plane

Why AI infrastructure is moving from token delivery to governed execution — and which layer decides what intelligence runs, for whom, with what data, under which policy.

A market map plotting AI infrastructure players across model-routing ownership and capability governance, with an opportunity zone in the top right where Lumos sits

The first AI-infrastructure wave was about training scale. The next wave is about production control — routing, runtime, identity, safety, policy, observability, audit, retention, tool permissions, and the cost of a whole completed workflow.

The core call

The model is no longer the whole product. The runtime is.

The next durable AI-infra control plane will not only answer “which model should run?” It will answer: “should this user or agent, in this workflow, with this data, be allowed to access this level of intelligence under this policy?”

Old question

How do we serve tokens cheaply?

The inference-optimization layer: KV cache, batching, latency, throughput, GPU scheduling, quantization, and model hosting.

New question

How do we govern capability?

The governed-execution layer: identity, semantic risk classification, capability-tier authorization, fallback, audit, retention, and tool/action policy.

Why now

Agents turn text into action.

Once models use tools, memory, code execution, internal data, and long-running state, safety and access become runtime-engineering problems — not just model or app-layer ones.

From tokens to workflows

AI infra is splitting into two layers

The first layer is inference optimization: serving tokens faster and cheaper — KV cache, batching, utilization, quantization, latency, throughput, memory bandwidth, GPU scheduling, prefill/decode optimization, and model hosting.

The second layer is governed execution: deciding whether a capability should run at all, under what permissions, with what safeguards, and with which fallback. This layer matters once models become agents — long-running, tool-using, stateful systems that act on data and workflows.

Diagram 1 — The two-layer split

1 · Inference optimization

Goal: serve tokens faster and cheaper.

  • KV cache
  • Batching / utilization
  • Latency / throughput
  • GPU scheduling
  • Quantization / hosting
“How do we serve tokens efficiently?”
Players: silicon vendors, hyperscalers, inference platforms, model gateways, serving software.

2 · Governed execution

Goal: decide whether and how capability should run.

  • Identity + workflow context
  • Semantic risk classification
  • Capability-tier authorization
  • Fallback / approval / audit
  • Tool permissions + retention
“Should this user or agent access this level of intelligence — for this workflow, with this data, under this policy?”
Players: model labs, AI gateways, agent-security platforms, enterprise governance tools, emerging control-plane products.

This is the shift from cost per token to cost and control per completed workflow. A coding agent doesn’t make one model call. It searches a codebase, plans changes, edits files, runs tests, handles failures, retries, and produces a reviewable output. A clinical or legal agent doesn’t simply produce text — it touches regulated data, invokes tools, follows approval paths, and leaves an audit trail.
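
The change of unit can be made concrete with a little arithmetic. The Python sketch below totals every model call a workflow made, including retries and calls inside runs that were eventually abandoned, and divides by the number of workflows that actually finished. The `ModelCall` / `Workflow` shapes, the token counts, and the $4 / $20 per-million-token prices are made-up illustrations, not figures from any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCall:
    """One model call inside a workflow; prices are per million tokens (assumed)."""
    input_tokens: int
    output_tokens: int
    usd_per_m_input: float
    usd_per_m_output: float

    def cost(self) -> float:
        return (self.input_tokens * self.usd_per_m_input
                + self.output_tokens * self.usd_per_m_output) / 1_000_000


@dataclass
class Workflow:
    calls: list[ModelCall]  # every call, including retries and failed attempts
    completed: bool         # did the workflow end in a usable, reviewable result?


def total_spend(workflows: list[Workflow]) -> float:
    return sum(call.cost() for wf in workflows for call in wf.calls)


def cost_per_million_tokens(workflows: list[Workflow]) -> float:
    """Old unit: blended spend per million tokens processed."""
    tokens = sum(c.input_tokens + c.output_tokens for wf in workflows for c in wf.calls)
    return total_spend(workflows) / tokens * 1_000_000


def cost_per_completed_workflow(workflows: list[Workflow]) -> float:
    """New unit: all spend, including retries and abandoned runs, per finished workflow."""
    done = sum(1 for wf in workflows if wf.completed)
    if done == 0:
        raise ValueError("no completed workflows")
    return total_spend(workflows) / done


# Hypothetical coding agent: every call is 100k tokens in, 5k out, at $4 / $20 per million.
step = ModelCall(100_000, 5_000, 4.0, 20.0)        # $0.50 per call
ok = Workflow([step] * 6, completed=True)          # search, plan, edit, test, fix, retest
abandoned = Workflow([step] * 4, completed=False)  # gave up after repeated test failures
print(cost_per_completed_workflow([ok, abandoned]))
```

The per-million-token figure is about $4.76 whether or not the abandoned run had succeeded; the cost per completed workflow would halve, from $5.00 to $2.50, if it had. That gap is exactly what a per-token view hides.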

The trigger event

Anthropic’s Fable/Mythos launch made the pattern visible

Anthropic’s Claude Fable 5 / Claude Mythos 5 launch is the clearest public example of capability governance. Anthropic describes Fable 5 as a Mythos-class model made safe for general use, while Mythos 5 is the same underlying model with safeguards lifted in some areas for a small group of trusted users. For some cybersecurity, biology/chemistry, and distillation-related requests, Fable 5 falls back to Claude Opus 4.8 instead of the higher-capability Mythos-class model.1

Same underlying model. Different access tier, safeguards, fallback behavior, retention policy, and trusted-access path. That isn’t just model-release strategy — that’s infrastructure.

What most people will see

  • Better coding and reasoning benchmarks
  • Longer autonomous work
  • Stronger vision and scientific capabilities
  • Higher-value use across cyber, software, and biology

What the infra lens sees

  • Capability tiers, not one universal model surface
  • Risk classifiers deciding access at runtime
  • Fallback routing to a safer / lower-capability model
  • Trusted-access programs for high-risk domains
  • Retention and audit as part of product design

Why the details matter

Anthropic says Fable 5 and Mythos 5 are priced at $10 per million input tokens and $50 per million output tokens, and that the fallback behavior triggers in under 5% of sessions on average. It also says it will require 30-day retention for traffic on Mythos-class models to detect complex attacks, novel jailbreaks, and patterns that operate across many requests.1

Those are product details, but they imply a deeper architecture: request classification, capability gating, model fallback, retention, auditability, and trusted access. The release isn’t just about serving a smarter model — it’s about deciding when a higher-capability model is allowed to run.
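
A minimal Python sketch of that gating decision, assuming the request has already been classified into a risk domain. It loosely mirrors the launch description: gated domains (cybersecurity, biology/chemistry, distillation) fall back to a safer model unless the user is in a trusted-access program, and high-tier traffic carries the stated 30-day retention. The route labels, the `gate` function, and the `trusted_users` set are hypothetical, and nothing here describes how Anthropic actually implements it; `None` retention on the fallback route only means this sketch does not model it.

```python
from dataclasses import dataclass
from typing import Optional

# Request domains for which the launch describes fallback behavior.
GATED_DOMAINS = {"cybersecurity", "bio_chem", "distillation"}


@dataclass(frozen=True)
class Route:
    model: str                     # hypothetical route label, not a real model ID
    retention_days: Optional[int]  # None = default retention, not modeled here
    reason: str


def gate(user_id: str, domain: str, trusted_users: set[str]) -> Route:
    """Decide which capability tier an already-classified request may reach."""
    if domain in GATED_DOMAINS and user_id not in trusted_users:
        # Capability gating: the high-tier model does not run; a safer model answers.
        return Route("fallback-model", None, f"{domain}: gated, user not in trusted access")
    if domain in GATED_DOMAINS:
        # Trusted-access path: same underlying model, some safeguards lifted.
        return Route("high-tier-trusted", 30, f"{domain}: trusted access")
    # Ungated request on the high tier; retention still applies to high-tier traffic.
    return Route("high-tier", 30, "ungated domain")
```

The point is the shape of the decision: the same request lands on different models, with different retention, depending on who asks and what it is about. None of that is a property of the model itself.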

Diagram 2 — Old world vs. new world

Old world: prompt → model

  1. User sends a prompt.
  2. App chooses a model.
  3. Model responds.

Optimizes for: quality, latency, cost.

Control surface: basic moderation / prompt filtering.

Mental model: “Which model should I call?”

New world: capability-governed execution

  1. User or agent sends a request.
  2. Identity + workflow context attached.
  3. Risk / domain classified.
  4. Policy decides the capability tier.
  5. Routed to frontier model, safer fallback, cheaper model — or blocked.
  6. Tools, actions, logging, and retention governed.

Mental model: “Should this capability run for this workflow, under this policy?”
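
The six steps can be read as a pipeline in which each stage narrows what the next may do. The Python sketch below wires them together with stub stages. The keyword-based `classify` is a deliberately naive stand-in for a semantic classifier, and every role, tool name, tier, and rule is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    FRONTIER = "frontier model"
    FALLBACK = "safer fallback"
    CHEAP = "cheaper model"
    BLOCK = "blocked"


@dataclass
class Request:                                      # step 1: the request itself
    text: str
    user: str
    role: str                                       # step 2: identity context
    workflow: str                                   # step 2: workflow context
    tools: list[str] = field(default_factory=list)  # tools the request wants to use


def classify(req: Request) -> str:
    """Step 3, naive stand-in: keyword matching instead of a semantic classifier."""
    text = req.text.lower()
    if "exploit" in text:
        return "cyber"
    if "summarize" in text:
        return "low_risk"
    return "general"


def decide(req: Request, domain: str) -> Decision:
    """Step 4: policy picks a capability tier from identity, workflow, and risk."""
    if domain == "cyber":
        # Hypothetical rule: security analysts get a safer model; everyone else is blocked.
        return Decision.FALLBACK if req.role == "security_analyst" else Decision.BLOCK
    if domain == "low_risk":
        return Decision.CHEAP
    return Decision.FRONTIER


def governed_tools(req: Request, decision: Decision) -> list[str]:
    """Step 6: tools are governed too; below the frontier tier only read-only tools run."""
    if decision is Decision.BLOCK:
        return []
    if decision is Decision.FRONTIER:
        return list(req.tools)
    return [t for t in req.tools if t.startswith("read_")]


def handle(req: Request) -> dict:
    domain = classify(req)
    decision = decide(req, domain)  # step 5: the route follows from this decision
    return {
        "route": decision.value,
        "tools": governed_tools(req, decision),
        # Step 6: a log line recording who asked, in which workflow, and what was decided.
        "log": f"{req.user}/{req.role}/{req.workflow}: {domain} -> {decision.name}",
    }
```

A production system would swap in a model-based classifier and load rules from policy configuration. What the structure shows is that the route, the tool grant, and the log line all derive from one decision rather than being set independently.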

The category boundary

OpenRouter is close. Traditional IAM is adjacent. The missing layer is capability access.

The natural question is whether this is simply model routing or traditional access control. It’s related to both — but identical to neither.

Traditional IAM / IGA

Governs resource access.

Helps organizations manage who can reach which applications, systems, and data. An important input to the AI control plane — but it usually doesn’t understand model capability, semantic risk, tool actions, or multi-turn agent behavior.

Asks: “Who has access to what resource?”

Model routing / gateways

Routes model & provider access.

Platforms like OpenRouter abstract many models and providers behind one interface and support routing, fallbacks, provider policies, privacy settings, budgets, and guardrails.2

Asks: “Which model or provider should handle this request?”

Capability control plane

Governs capability access.

Combines identity, workflow context, semantic risk, model routing, retention, tool permissions, approvals, and multi-turn adversarial detection.

Asks: “Should this user/agent access this level of intelligence for this task?”

The crisp distinction

Traditional access control governs resources. Model routers govern model selection. The missing layer governs capability access.

It would integrate with identity tools, model gateways, and AI-security products — but its core value is AI-native: semantic risk classification, capability-tier authorization, model fallback, tool/action permissions, retention/audit, and multi-turn adversarial detection.

System design

Reference architecture: the AI capability control plane

Architecturally, the missing layer is not a prompt filter. It is an execution control plane that sits between users/agents, identity systems, AI-security tools, model routers, tool runtimes, and audit systems.

Diagram 3 — Request flow through a capability control plane

  1. User / agent request: prompt, files, memory, tool intent, objective.
  2. Identity + context: who is asking? Role, org, customer, workflow, data sensitivity.
  3. Semantic risk: cyber? Bio? Finance? Code execution? Distillation? Adversarial?
  4. Policy engine: allowed? Needs fallback? Approval? Retention?
  5. Routing engine: frontier, safer fallback, cheaper, self-hosted, or block.
  6. Execution + output: result, tool calls, actions, audit trail, session replay.

Adjacent systems the control plane connects to:

  • Identity / access: users, roles, entitlements, app/resource permissions.
  • AI security + guardrails: prompt defense, data-leak prevention, policy enforcement.
  • Model routing / gateway: provider abstraction, fallback, latency/cost routing.

The four product layers

1. Identity + context

The system needs the user, organization, role, customer, workflow, data classification, tool permissions, and autonomy level.

2. Semantic risk classification

It must classify intent and risk across prompts, retrieved data, tool calls, outputs, and multi-turn behavior.
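
Multi-turn behavior is the part single-request filters miss: an attack can be split into steps that each look harmless. A minimal Python sketch, assuming some upstream classifier already assigns each turn a risk score between 0 and 1. The class name, window size, and both thresholds are hypothetical. The tracker flags a session when risk accumulated over a sliding window crosses a session threshold, even though no single turn crosses the per-turn one.

```python
from collections import deque


class SessionRiskTracker:
    """Flags sessions whose recent turns add up to risky intent (hypothetical design)."""

    def __init__(self, turn_threshold: float = 0.8, session_threshold: float = 1.5,
                 window: int = 5) -> None:
        self.turn_threshold = turn_threshold        # block a single turn at or above this
        self.session_threshold = session_threshold  # flag the session at or above this
        self.recent: deque[float] = deque(maxlen=window)  # scores of the last `window` turns

    def observe(self, turn_score: float) -> str:
        """Record one turn's score and return 'block_turn', 'flag_session', or 'ok'."""
        self.recent.append(turn_score)
        if turn_score >= self.turn_threshold:
            return "block_turn"    # what a single-turn filter would also catch
        if sum(self.recent) >= self.session_threshold:
            return "flag_session"  # only visible when turns are considered together
        return "ok"
```

Four consecutive turns scored 0.4 each would all pass a single-turn filter with a 0.8 threshold; the tracker flags the fourth, because the windowed sum reaches 1.6.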

3. Routing + action policy

It decides whether to allow the frontier model, route to fallback, require approval, redact data, block, or restrict tool use.
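
The action half of this policy is what separates governed execution from model routing: the same agent may read freely but need approval to write. A minimal Python sketch of a tool-call check, where the `ToolCall` shape, the tool names, the autonomy labels (`"supervised"`, `"autonomous"`), and the e-mail redaction rule are all hypothetical stand-ins for real policy and PII detection.

```python
import re
from dataclasses import dataclass, replace

# Deliberately simplistic e-mail pattern, standing in for a real PII detector.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


@dataclass(frozen=True)
class ToolCall:
    tool: str
    writes: bool          # does the call change state (send, edit, pay, delete)?
    args: dict[str, str]


def check_tool_call(call: ToolCall, granted: set[str], autonomy: str,
                    leaves_org: bool) -> tuple[str, ToolCall]:
    """Return (verdict, call to execute); verdict is 'block', 'require_approval', or 'allow'."""
    if call.tool not in granted:
        return "block", call  # restrict tool use to what this agent was granted
    if leaves_org:
        # Redact before any data crosses the environment boundary.
        call = replace(call, args={k: EMAIL.sub("[REDACTED]", v) for k, v in call.args.items()})
    if call.writes and autonomy != "autonomous":
        return "require_approval", call  # write actions wait for a human
    return "allow", call
```

For example, a supervised agent calling a granted `send_email` tool with an external recipient gets `require_approval`, and the call it would execute already has the address redacted; the original call object is left unchanged.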

4. Audit + observability

It reconstructs why a model was chosen, what data was sent, what tools were called, what policy fired, and what the agent did.
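
Reconstruction only works if each decision is written down when it is made. A minimal Python sketch of an append-only audit log that captures the questions in the paragraph above: which model ran and why, what data was sent, which tools were called, and which policy fired. The field names, the `record` / `explain` helpers, and the policy label are hypothetical. Storing a SHA-256 fingerprint of the payload instead of the payload is one way to keep the trail verifiable without retaining raw data; whether that is enough depends on the retention policy.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEvent:
    seq: int                          # position in the append-only log
    session_id: str
    model: str                        # which model actually ran
    route_reason: str                 # why it was chosen, recorded at decision time
    policy_fired: str                 # which policy rule produced the decision
    data_sha256: str                  # fingerprint of the data sent, not the data itself
    tool_calls: tuple[str, ...] = ()  # which tools were called at this step


def record(log: list[AuditEvent], session_id: str, model: str, reason: str,
           policy: str, payload: str, tools: tuple[str, ...] = ()) -> AuditEvent:
    """Append one decision to the log; events are never edited in place."""
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    event = AuditEvent(len(log), session_id, model, reason, policy, digest, tools)
    log.append(event)
    return event


def explain(log: list[AuditEvent], session_id: str) -> list[str]:
    """Replay one session as readable lines, in the order decisions were made."""
    return [
        f"#{e.seq} {e.model} ({e.route_reason}; policy={e.policy_fired}) tools={list(e.tool_calls)}"
        for e in log
        if e.session_id == session_id
    ]


log: list[AuditEvent] = []
record(log, "s1", "fallback-model", "cyber request, user not in trusted access",
       "gate_cyber_v1", "prompt text", ("read_logs",))
print(explain(log, "s1"))
```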

Landscape

Everyone is doing a piece. No one cleanly owns the layer.

The category is fragmented across model routers, AI gateways, enterprise governance tools, LLM firewalls, runtime-protection vendors, DLP, observability, and evals. The market isn’t empty — but the full, independent capability-governance layer is still not cleanly owned.

Diagram 4 — Market map · routing ownership × capability governance

Axes: model / routing ownership (horizontal) × governance / capability control (vertical). The opportunity zone is the top-right corner, where Lumos is placed.

  1. Traditional IAM / IGA: governs which resources a user can reach
  2. Lakera: LLM firewall for prompt-injection & jailbreak defense
  3. Prompt Security: runtime GenAI security & data-leak prevention
  4. Noma: runtime agent security across tools & MCP
  5. Credal: enterprise-agent governance, permission-aware data
  6. Portkey / Palo Alto: AI gateway becoming an agent control plane
  7. Cloudflare AI Gateway: network gateway (caching, routing, DLP)
  8. OpenRouter: model/provider routing, budgets, ZDR
  9. Anthropic Fable/Mythos: model lab shipping capability tiers
  ◆ Lumos: the independent capability control plane (this thesis)

Where Lumos fits: the top right. High capability governance plus native model routing, the one corner of the map no incumbent owns outright.

Routers climb in from the right; guardrail and governance vendors climb in from the left; model labs hold the high-governance ceiling, but only for their own models. Lumos is positioned as the independent control plane that spans both axes — identity- and workflow-aware governance with first-class routing and fallback — rather than a guardrail bolted onto a single provider.

Lumos ◆

The independent capability control plane this memo describes: identity + workflow context, semantic risk classification, capability-tier authorization, model routing & fallback, tool/action permissions, and retention/audit — across providers, not locked to one.

OpenRouter

Closest public comp for model/provider routing. Its docs show controls for budgets, model/provider restrictions, privacy policies, zero-data-retention, prompt-injection defense, and sensitive-information handling.2

Cloudflare AI Gateway

Network/infrastructure gateway with caching, rate limiting, guardrails, dynamic routing, and data-loss prevention.3

Portkey / Palo Alto

Validation that AI gateways are becoming enterprise security control planes — Palo Alto framed Portkey as helping monitor, orchestrate, and govern autonomous agents at scale.4

Credal

Close on enterprise-agent governance: permission-aware data access, policy controls, deployment, and audit logging for internal agents.5

Lakera

AI firewall / guardrail surface: prompt defense, jailbreak and injection detection, content moderation, and data-leak protection.6

Noma + Prompt Security

Runtime agent-security surface: prompts, responses, tool calls, MCP interactions, agent-to-agent communication, and runtime enforcement.7, 8

Implications

The investable wedge isn’t “guardrails.” It’s policy-aware execution.

A simple prompt filter is too narrow. The more interesting wedge is a policy-aware execution layer for high-capability agentic workflows.

Diagram 5 — The AI-infra stack, with the capability-control wedge

  • Layer 5 · Application demand (workflows shape inference demand): coding agents, healthcare, legal, support, finance, security, sales, internal automation.
  • Layer 4 · Agent runtime (the workload control surface): state, memory, tool use, durable execution, evals, replay, human review, rollback.
  • Layer 3 · Routing + governance (the capability control plane): model routing, policy, identity, risk classification, fallback, retention, audit.
  • Layer 2 · Managed inference (inference platforms & serving): serving models reliably, scaling, observability, cost controls, VPC / self-hosting.
  • Layer 1 · Silicon + systems (compute substrate): GPUs, accelerators, hyperscaler silicon, memory, interconnect, scheduling.

Where this matters first

Cybersecurity

The same capability can support defenders or accelerate attackers. The control layer has to distinguish defensive workflows from offensive misuse and govern tools/actions accordingly.

Life sciences

Scientific models can accelerate research while creating dual-use risk. Trusted access, audit, and domain-specific policy become product requirements.

Coding agents

Code agents need repo context, sandboxing, test execution, tool permissions, rollback, and human review. The model is only one part of the runtime.

Healthcare / legal

Sensitive data, compliance, retention, explainability, and human approval make generic routing insufficient.

Finance

Data sensitivity, regulated advice, audit requirements, and model-selection policy make governance inseparable from deployment.

Enterprise agents

Agents with internal tools and write access need contextual authorization — not just static app permissions.

Diligence

Questions that separate a platform from a prompt filter

For companies building this layer

  • Can you show the full inference and tool-call graph?
  • Where do you attach identity, role, customer, and workflow context?
  • Can policies vary by user, data sensitivity, customer, geography, and workflow?
  • Can you route to a safer model instead of simply blocking?
  • Can you detect multi-turn jailbreaks, or only single-turn attacks?
  • Can you govern tool calls and write actions, not just prompts?
  • Can you reconstruct what an agent did across an entire session?

For AI application companies

  • What is your cost per completed workflow, not per API call?
  • Which model calls are latency-sensitive vs. batchable?
  • Where do retries explode?
  • Which data is too sensitive to leave your environment?
  • Where does human approval enter the loop?
  • What actions can the agent take, and who owns the audit trail?
  • Which part of the workflow would you pay to never think about again?

What could make the thesis wrong?

  • Model labs absorb the whole layer. Each frontier provider may build its own capability governance, and customers may accept provider-specific controls.
  • Existing gateways expand fast enough. Model routers, AI gateways, and security platforms may converge on this layer before a standalone category forms.
  • Agents stay mostly read-only. Capability governance matters most when agents take actions. If usage stays read-only, lighter guardrails may suffice for longer.
  • Security incumbents win distribution. The layer may become a module inside large identity, network, or security platforms rather than an independent company.
The thesis still points in the right direction: the more capable models become, the more valuable the runtime around them becomes.
Concise positioning

How to say it in one paragraph

Model routing is becoming strategic, but routing alone isn’t the whole category. Anthropic’s Fable/Mythos launch points at a deeper AI-native layer: capability governance. The missing product isn’t just “who can access this app?” or “which model should handle this request?” It is “should this user or agent, in this workflow, with this data, be allowed to access this level of intelligence — and if so, under what routing, retention, audit, fallback, and tool-action policy?”

Three takeaways

1/ The Fable/Mythos launch is a model-routing announcement disguised as a model launch. Same underlying model, different access tier, safeguards, fallback behavior, retention, and trusted access. That isn’t only release strategy — it’s runtime strategy.
2/ Traditional access control governs resources. Model routers govern model selection. The emerging gap is governed capability access: which user or agent can invoke which level of model capability, for which workflow, with which data, under which policy.
3/ The next AI-infra moat may not be “serve tokens faster.” It may be “control when intelligence is allowed to act.” Once agents get tools, memory, and write access, access control becomes semantic, contextual, and runtime-native.
Source notes

References and research base

  1. Anthropic, “Claude Fable 5 and Claude Mythos 5,” June 9, 2026. Used for the same-model distinction, fallback to Opus 4.8, the under-5% fallback statement, pricing, trusted access, safety classifiers, 30-day retention, and capability examples.
  2. OpenRouter documentation: Guardrails, Provider Routing, and Zero Data Retention. Used for model/provider restrictions, spending limits, data policies, privacy settings, and routing behavior.
  3. Cloudflare AI Gateway documentation. Used for caching, rate limiting, guardrails, dynamic routing, and data-loss prevention.
  4. Palo Alto Networks, “Palo Alto Networks Completes Acquisition of Portkey to Secure AI Agents.” Used for the AI-gateway / autonomous-agent control-plane framing.
  5. Credal homepage and security pages. Used for enterprise-agent governance, permission-aware data access, audit logging, and policy enforcement.
  6. Lakera Guard documentation. Used for prompt defense, jailbreak/prompt-injection detection, content moderation, and data-leak framing.
  7. Noma Security, “AI Runtime Protection.” Used for monitoring prompts, responses, tool calls, MCP-server interactions, and agent-to-agent communications.
  8. SentinelOne, “SentinelOne to Acquire Prompt Security.” Used for Prompt Security as runtime GenAI security, data-leak prevention, prompt-injection enforcement, and agent protection.

Source-quality note

This memo combines the Anthropic Fable/Mythos article and the inference-infrastructure thesis with primary or official sources from Anthropic, OpenRouter, Cloudflare, Palo Alto Networks, Credal, Lakera, Noma, and SentinelOne. Company-positioning claims — including market-map placement — should be read as a thesis map rather than a diligence conclusion; each vendor’s actual product depth would need customer calls and technical testing.
