The next phase of agent security is not mainly about making models more obedient. It is about deciding what an agent is allowed to do when someone persuades it to be helpful in the wrong direction.
Two incidents make the point. In Brazil, courts found hidden white-on-white instructions inside legal petitions: invisible to a normal reader, but legible to AI systems processing the documents. ConJur reports two recent cases, one in Parauapebas and another in Porto Velho, where the hidden text tried to steer court AI toward favorable procedural conclusions. In the Parauapebas case, the court applied a 10% fine and referred the matter to the bar association; in Porto Velho, the court also found bad faith and sent notices for disciplinary review.
Meta’s Instagram support incident is the same shape in a consumer product. TechCrunch reports that attackers used Meta’s AI support assistant to add a new email address to targeted Instagram accounts during recovery, receive a verification code at that address, and reset passwords without taking over the victim’s original email. KrebsOnSecurity describes the method as location-spoofing with a VPN, then persuading the support bot to perform an account-recovery action. TechCrunch also notes Meta’s Andy Stone said the issue was fixed and impacted accounts were being secured.
These are not identical attacks. One is document prompt injection against court workflows; the other is social engineering against an account-support assistant. But both are confused-deputy failures. The AI did not need to become evil. It only needed to be placed between a hostile input and a privileged action.
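To make the confused-deputy shape concrete, here is a minimal sketch of a recovery action whose authorization does not depend on what the assistant was talked into. Every name in it (`RecoveryRequest`, `may_add_recovery_email`, the `proof_of_control` flag) is hypothetical and is not a description of how Meta or any court system actually works. The assumption is that proof of control comes from a non-AI check, such as a code sent to the address already on file.

```python
from dataclasses import dataclass


# Hypothetical types for illustration only; not any vendor's real design.
@dataclass(frozen=True)
class RecoveryRequest:
    account_id: str
    new_email: str
    model_recommends: bool   # what the assistant concluded from the conversation
    proof_of_control: bool   # assumed to be set only by a non-AI check,
                             # e.g. a code sent to the email already on file


def may_add_recovery_email(req: RecoveryRequest) -> bool:
    """Deterministic gate: the model's recommendation is never sufficient."""
    return req.proof_of_control


# The assistant was persuaded, but nothing outside the chat vouches for the caller.
persuaded = RecoveryRequest("acct-1", "attacker@example.com",
                            model_recommends=True, proof_of_control=False)

# Same recommendation, but the caller also passed the out-of-band check.
verified = RecoveryRequest("acct-1", "owner-new@example.com",
                           model_recommends=True, proof_of_control=True)

persuaded_allowed = may_add_recovery_email(persuaded)
verified_allowed = may_add_recovery_email(verified)
```

In this sketch `model_recommends` is carried along but deliberately ignored by the gate. A fluent conversation can change what the assistant believes; it cannot change what the gate permits.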
That is why Microsoft’s agent-control-plane language matters. Microsoft says Agent 365 is now generally available as a way to observe, govern, and secure agents across Microsoft and partner ecosystems. It specifically calls out local agents such as OpenClaw and Claude Code, shadow-agent discovery through Defender and Intune, policy controls, runtime blocking, alerts, and Windows 365 for Agents as a managed execution environment. In a related Windows/NVIDIA post, Microsoft says Build will show Windows running agents with operating-system-enforced identity, containment, and management, with NVIDIA OpenShell using new Windows security and containment primitives.
The hype version of this story is “agents are coming.” The practical version is duller and more important: who can the agent act as, what files or accounts can it touch, what network destinations can it reach, what actions require a deterministic gate, and what audit trail exists after it acts?
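Those five questions map onto a small, boring data structure. The following is a sketch under stated assumptions, not Agent 365's API or any real product's policy format: `AgentPolicy`, its field names, the action names, and the example hosts and paths are all invented for illustration.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from urllib.parse import urlparse


# Hypothetical policy object; all names and values are illustrative assumptions.
@dataclass
class AgentPolicy:
    principal: str                    # who the agent acts as
    readable_roots: tuple             # file trees it may touch
    allowed_hosts: frozenset          # network destinations it may reach
    gated_actions: frozenset          # actions that need approval outside the model
    audit_log: list = field(default_factory=list)

    def _record(self, action: str, target: str, decision: str) -> str:
        # Every decision is logged, including denials.
        self.audit_log.append({"principal": self.principal, "action": action,
                               "target": target, "decision": decision})
        return decision

    def check(self, action: str, target: str, approved: bool = False) -> str:
        if action in self.gated_actions and not approved:
            return self._record(action, target, "needs_approval")
        if action == "read_file":
            path = PurePosixPath(target)
            # Lexical check only: reject ".." and require a permitted root.
            ok = ".." not in path.parts and any(
                path.is_relative_to(root) for root in self.readable_roots)
        elif action == "http_get":
            ok = urlparse(target).hostname in self.allowed_hosts
        else:
            # Default deny: only gated actions that were approved get this far.
            ok = action in self.gated_actions
        return self._record(action, target, "allow" if ok else "deny")


policy = AgentPolicy(
    principal="support-agent@tenant.example",
    readable_roots=("/srv/cases",),
    allowed_hosts=frozenset({"api.internal.example"}),
    gated_actions=frozenset({"change_recovery_email"}),
)

decisions = [
    policy.check("read_file", "/srv/cases/123.txt"),
    policy.check("read_file", "/srv/cases/../secrets.txt"),
    policy.check("http_get", "https://evil.example/upload"),
    policy.check("change_recovery_email", "acct-1"),
    policy.check("change_recovery_email", "acct-1", approved=True),
    policy.check("delete_mailbox", "acct-1"),
]
```

The `approved` flag stands in for whatever deterministic gate the platform provides, and the model never sets it. Unknown actions are denied by default, and the audit log records every decision, so the question of what the agent tried to do has an answer after the fact.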
The thing to watch at Build is not the demo. It is whether Microsoft shows hard boundaries. A good agent platform should assume hostile inputs will arrive through documents, chats, tickets, web pages, and support workflows. It should treat the model as one component inside a governed runtime, not as the security boundary itself.
The Brazil and Meta cases are useful because they strip away the abstraction. The risk is not that AI says something strange. The risk is that software gives AI authority, then treats fluent cooperation as proof of legitimacy.