Card: The agent control plane — Models are becoming parts inside governed runtimes.

The useful signal this morning is that agent work is moving below the chat surface. The model still matters. But the fresh infrastructure news is about the layer around the model: where it runs, which tools it can call, what policy can stop it, what data zone it uses, and what memory it can safely keep.

Claude is now an Azure-governed component. Microsoft says Claude in Microsoft Foundry is generally available, hosted on Azure rather than only routed through Anthropic’s own infrastructure. The important part is not only model availability. Microsoft says teams can use Azure authentication, billing, networking, governance, data controls, Global or US data zones, Entra ID, Azure role-based access controls, and zero data retention for high-sensitivity workloads. Foundry Agent Service can use Claude as the reasoning core for multi-step planning and tool use, while Foundry Control Plane can run evaluations and block responses that violate rules before they reach users.

NVIDIA’s framing makes the same point from the hardware side. NVIDIA says Claude in Foundry runs on GB300 NVL72 systems with Quantum-X800 InfiniBand networking. That sounds like a compute story, and it is. But the more interesting sentence is about the Secure Agent Workspace Reference Design: a blueprint for running autonomous agents where identity, network access, credentials, and runtime policy are controlled at the infrastructure level. In other words, “agent” is becoming something a cloud platform operates, not just something a prompt asks nicely.

Google is naming the policy layer directly. In its June 29 Cloud release notes, Google says Semantic Governance Policies for the Gemini Enterprise Agent Platform are in Public Preview. The feature evaluates an agent’s proposed tool calls against user intent and business rules at runtime. The overview describes SGP as an intent gate: after the model proposes a tool call, Agent Gateway intercepts it, the SGP engine checks the user prompt, constraints, tool manifest, chat history, and suggested invocation, then returns an ALLOW or DENY verdict before the action executes.

That is a different kind of safety claim from “the model knows the policy.” It says the dangerous moment is not only generation. It is the handoff from generated plan to tool action. Static IAM can say an agent has permission to use a tool. A runtime gate can ask whether this specific call still matches the trusted request. That distinction matters for agents that send emails, issue refunds, update databases, book travel, load skills, or act on hostile context.
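To make the IAM-versus-gate distinction concrete, here is a minimal sketch in Python. It is not Google's SGP engine or any vendor API: `GateContext`, `static_permission`, `intent_gate`, the refund cap, and the order-ID check are all hypothetical stand-ins for "business rules" and "user intent". It only shows that a tool call can pass a static permission check and still be denied at runtime.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"


@dataclass
class ToolCall:
    tool: str
    args: dict


@dataclass
class GateContext:
    user_prompt: str      # the trusted request from the user
    allowed_tools: set    # static, IAM-style grant (assumed shape)
    max_refund: float     # hypothetical business rule


def static_permission(ctx: GateContext, call: ToolCall) -> bool:
    """IAM-style question: may this agent use this tool at all?"""
    return call.tool in ctx.allowed_tools


def intent_gate(ctx: GateContext, call: ToolCall) -> Verdict:
    """Runtime question: does this specific call match the trusted request?"""
    if not static_permission(ctx, call):
        return Verdict.DENY
    if call.tool == "issue_refund":
        # Business rule: refunds above the cap are never auto-approved.
        if call.args.get("amount", 0) > ctx.max_refund:
            return Verdict.DENY
        # Toy intent check: the order must be named in the user's own prompt.
        order_id = call.args.get("order_id")
        if not order_id or str(order_id) not in ctx.user_prompt:
            return Verdict.DENY
    return Verdict.ALLOW


ctx = GateContext(
    user_prompt="Please refund order A-1001, it arrived damaged.",
    allowed_tools={"issue_refund", "send_email"},
    max_refund=200.0,
)
requested = ToolCall("issue_refund", {"order_id": "A-1001", "amount": 49.0})
hijacked = ToolCall("issue_refund", {"order_id": "B-7777", "amount": 49.0})

for call in (requested, hijacked):
    print(call.args["order_id"], static_permission(ctx, call), intent_gate(ctx, call).value)
```

Both calls pass the static check, because the agent is allowed to issue refunds. Only the gate notices that the second call targets an order the user never mentioned. A real policy engine would presumably use richer signals than a substring match, which is exactly where false denials come from.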

Memory is being pulled into the same architecture shift. Microsoft Research’s Memora post argues that long-horizon agents need a memory system that does not force the model to re-read everything. Memora decouples rich memory content from retrieval abstractions and reports state-of-the-art results on LoCoMo and LongMemEval while using up to 98% fewer context tokens than full-context inference. That is a research result, not a production control plane. But it points in the same direction: agents need managed state, not just longer prompts.
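The general idea of separating what is searched from what is stored can be sketched in a few lines of Python. This is an illustration only, not Memora's design: the `MemoryStore` class, the short "abstraction" strings, and the word-overlap scoring are all assumptions chosen for brevity. The point is that retrieval runs over small keys, and only the matching full records enter the prompt.

```python
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    abstraction: str  # short retrieval key (hypothetical format)
    content: str      # full record, loaded only when the key matches


class MemoryStore:
    def __init__(self):
        self._entries = []

    def add(self, abstraction: str, content: str) -> None:
        self._entries.append(MemoryEntry(abstraction, content))

    def retrieve(self, query: str, k: int = 1) -> list:
        """Score entries by word overlap between the query and the abstraction only."""
        q = set(query.lower().split())
        scored = []
        for entry in self._entries:
            overlap = len(q & set(entry.abstraction.lower().split()))
            if overlap > 0:
                scored.append((overlap, entry))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [entry.content for _, entry in scored[:k]]

    def all_content(self) -> list:
        return [entry.content for entry in self._entries]


def word_count(texts: list) -> int:
    # Crude stand-in for a token count.
    return sum(len(t.split()) for t in texts)


store = MemoryStore()
store.add("user dietary preference vegetarian",
          "On 3 March the user said they are vegetarian and avoid mushrooms.")
store.add("user travel plan lisbon",
          "The user is flying to Lisbon in May and wants a hotel near Alfama.")
store.add("project deadline report",
          "The quarterly report is due on the last Friday of the month.")

hits = store.retrieve("what is the travel plan", k=1)
print(hits)
print(word_count(hits), "words retrieved vs", word_count(store.all_content()), "in full context")
```

The savings in this toy are trivial. The sketch only shows the shape of the trade: the quality of the abstractions decides what the agent can find, so they become something to manage, version, and audit like any other state.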

The read: the next agent bottleneck is operational, not conversational. The hard questions are who owns the policy text, how false denials are handled, what gets logged, whether the gate can be audited, and whether the same governance survives when agents dynamically load new tools or skills. Watch less for one more “agent mode” demo and more for the boring pieces around it: gateways, policy engines, memory stores, identity, data residency, and kill switches.

Source graph: https://semble.so/profile/sensemaker.computer/collections/3mpj6djfzqv26