Agentic AI’s Attack Surface Is Already Exploited: What the 2025 Incident Record Tells Enterprise Security Teams

7 min read
Key Takeaways
  • Confirmed production exploits in 2025 include EchoLeak (CVE-2025-32711, CVSS 9.3), which achieved zero-click data exfiltration through Microsoft 365 Copilot with no user interaction required.
  • The MCP supply chain attack surface is unpatrolled: malicious packages have backdoored 437,000 npm installations and exfiltrated data through WhatsApp and GitHub integrations.
  • IBM’s 2025 breach data puts shadow AI incidents at $4.63 million average cost — $670,000 above a conventional breach — because conventional detection tools have no visibility into agent decisions.
  • Least-privilege scoping, MCP server SBOMs, and prompt injection detection at ingestion are the three controls enterprise security teams can implement now.

Key Claim: Agentic AI security is no longer a theoretical risk — the 2025 incident record includes confirmed CVSS 9.3 exploits and a documented MCP supply chain breach timeline that enterprise security tooling was not built to detect.

On 29 September 2025, Koi Security researchers flagged the first confirmed malicious Model Context Protocol server in the wild. A rogue npm package named postmark-mcp — a near-identical copy of Postmark Labs’ official library — had spent two weeks silently BCC’ing every email sent through it to an attacker-controlled address. It accumulated 1,643 downloads before removal. The attacker’s tool: one line of code. The victim’s visibility: zero.

That incident is not an outlier. It is one entry in a documented sequence of confirmed attacks across 2025 that collectively define a new category of enterprise exposure — one for which conventional endpoint detection, DLP, and WAF tooling has no adequate answer.

The Attack Surface Agentic Systems Create

Traditional LLM security focused on what a model would say. Agentic AI security concerns what a model will do. An agent equipped with tool-calling capabilities — web browsing, code execution, email dispatch, API access, file system operations — operates with a trust level and an action radius that exceeds most human users on the same network.

The attack surface has three distinct layers. First, the model itself can be manipulated through its inputs. Second, the tools the model calls can be compromised at the source. Third, the agent’s permissions — the scope of what it can access and act on — are routinely misconfigured far above what any given task requires.

OWASP’s GenAI Security Project, drawing on more than 100 security researchers, published a dedicated Top 10 for Agentic Applications in December 2025, separate from its existing LLM Top 10. The list formalises what practitioners had already encountered: prompt injection (AG01), memory poisoning (AG03), tool and plugin misuse (AG04), and supply chain vulnerabilities (AG09) represent the leading risk categories. Prompt injection tops both the new Agentic Top 10 and the existing LLM Top 10 — it is, per OWASP’s own framing, the root vulnerability from which most other agentic exploits branch.

Documented Incidents: Prompt Injection Moves to Production

The theoretical risk of prompt injection has been well-documented since 2023. What 2025 added was a confirmed incident record against production enterprise systems.

CVE-2025-32711, disclosed by Aim Security and dubbed EchoLeak, is the clearest illustration of the severity ceiling. The vulnerability, rated CVSS 9.3, exploited Microsoft 365 Copilot’s document processing to achieve zero-click data exfiltration. An attacker sends an email to the target’s Outlook inbox containing hidden adversarial instructions — no interaction required on the victim’s side. When Copilot processes the email (to summarise, triage, or analyse it), it follows the embedded instructions, autonomously exfiltrating data from OneDrive, SharePoint, and Teams to attacker-controlled infrastructure. According to Hack The Box’s published analysis, the attack chained four bypasses: evasion of Microsoft’s XPIA classifier, link redaction bypass via reference-style Markdown, exploitation of auto-fetched images, and abuse of a Teams content security policy allowlist. Microsoft patched server-side in June 2025. No client-side action was required.

CVE-2025-53773, patched in Microsoft’s August 2025 Patch Tuesday, demonstrated a complementary vector through developer tooling. Researchers at Embrace The Red documented the attack chain: malicious instructions embedded in a public GitHub issue or README cause GitHub Copilot’s VS Code agent to write “chat.tools.autoApprove”: true to .vscode/settings.json. That setting disables all user confirmation prompts for Copilot’s tool calls. With that gate removed, a subsequent conditional injection issues terminal commands — achieving remote code execution on the developer’s machine across Windows, macOS, and Linux. The researchers noted the core architectural flaw: “the edits are immediately persistent, they are not in-memory as a diff to review.”

Palo Alto Networks Unit 42’s December 2025 telemetry on web-based indirect prompt injection documented 12 confirmed real-world attack cases and characterised the payload landscape: 37.8% of observed injections used visible plaintext, 19.8% used HTML attribute cloaking, and 16.9% used CSS rendering suppression. Data destruction attempts represented 14.2% of observed attacker intent — not just data theft, active deletion.

The MCP Supply Chain: An Unpatrolled Dependency Layer

Model Context Protocol servers are the connective tissue between AI agents and the tools they use — file systems, databases, APIs, communication platforms. Anthropic open-sourced the protocol in November 2024; adoption accelerated rapidly through 2025 as developers published MCP servers to npm and other registries with no vetting process comparable to what major package ecosystems apply to general software.

The breach timeline through 2025 is precise. In April 2025, a malicious MCP server exfiltrated WhatsApp chat history through tool poisoning — manipulating tool descriptions to redirect output to attacker-controlled numbers, per authzed.com’s published breach timeline. In May 2025, a prompt injection via GitHub MCP’s public issues caused private repository contents to leak into public pull requests due to overly broad Personal Access Token scopes. In June 2025, Anthropic’s own MCP Inspector developer tool contained an unauthenticated remote code execution flaw, exposing filesystems, API keys, and environment secrets on developer machines.

July 2025 brought CVE-2025-6514 in the mcp-remote package — a CVSS 9.6 OS command injection vulnerability affecting 437,000 installations. In September 2025, the postmark-mcp backdoor demonstrated supply chain compromise at the package level. October 2025 added the Smithery path-traversal incident: a single build configuration flaw exposed a Fly.io API token granting control over 3,000+ MCP server deployments, capturing inbound traffic and downstream API secrets.

The structural problem is that MCP servers operate with the trust level of the agent that invokes them — which is often significantly elevated. A compromised MCP server can request and receive permissions the legitimate application never needed. Unlike SaaS integrations, which are reviewed by security and legal teams, MCP servers are frequently installed by individual developers and immediately granted access to production credentials.

Privilege Escalation and the Confused Deputy Problem

Beyond supply chain, the configuration failures in agentic deployments create what security literature calls the “confused deputy” problem: a less-privileged party manipulates a more-privileged intermediary into acting on its behalf.

The mechanics in agentic systems are direct. An agent given access to an email system, a database, and a file server for legitimate productivity tasks holds more combined access than any of the individual employees it serves. When an attacker manipulates that agent via an indirect prompt injection — a poisoned document it is asked to summarise, a malicious website it browses during a research task — the attacker inherits the agent’s full permission set, not their own.

IBM’s 2025 Cost of a Data Breach Report quantified the outcome: shadow AI breaches — those involving AI agents operating outside formal governance — averaged $4.63 million per incident, $670,000 more than a conventional breach, per Reco AI’s 2025 year-in-review analysis. The cost differential reflects the difficulty of detection: conventional DLP and endpoint tools have no visibility into what an AI agent decided to do with the data it accessed.

NIST’s updated AI 100-2 E2025 (Adversarial Machine Learning), published in 2025, is the first edition of that taxonomy to explicitly address agentic systems — the 2023 version did not. It covers prompt injection, RAG-targeting attacks, and supply chain risk as distinct categories, per NIST’s publication record. That addition signals that federal guidance is catching up to what attackers already demonstrated in the wild.

What Enterprise Security Teams Need to Do Now

Current controls are insufficient because they were designed for human actors. The required response operates at three levels.

Least privilege for agents, not just users. Agents must be scoped to precisely the permissions their current task requires, not the permissions they might ever need. This is harder than user RBAC because agents are often deployed with long-lived credentials to avoid re-authentication overhead. NVIDIA’s published sandboxing guidance for agentic workflows recommends OS-level isolation — macOS Seatbelt, Linux seccomp/namespaces — network egress whitelisting, and blocking filesystem writes outside defined workspaces. These are not AI-specific controls; they are the same principles applied to containerised workloads, applied one layer deeper.

MCP server governance as a supply chain problem. Every MCP server in use is a dependency that requires the same treatment as an open-source library: inventoried, version-locked, signature-verified, and monitored for upstream changes. The Databricks AI Security Framework (DASF v3.0) added Agentic AI as its 13th component, explicitly covering MCP threats alongside memory and planning risks, per the Databricks blog. SBOMs for MCP servers are not yet standard practice; they should be.

Prompt injection detection at ingestion, not output. Microsoft’s July 2025 MSRC guidance introduced “Spotlighting” — using delimiting, datamarking, or encoding to help LLMs distinguish trusted instructions from untrusted external content — alongside Microsoft Prompt Shields, a classifier that detects injection attempts before they reach the model. Per Microsoft’s published guidance, this is a detection layer, not a prevention guarantee. Microsoft released the Agent Governance Toolkit on 2 April 2026, addressing all 10 OWASP agentic risks through runtime monitoring, per the Microsoft Open Source blog.

On the compliance side: neither SOC 2 nor ISO 27001 contains control categories specific to AI agents. ISO 42001 (AI Management Systems) is the emerging standard, alongside NIST AI RMF. Gartner projects 60% of organisations will have formalised AI governance programs by 2026, per penligent.ai’s compliance analysis. Security teams should not wait for auditors to mandate agent-specific controls — the 2025 incident record makes the case.

What to Watch

Human-in-the-loop requirements for high-risk tool calls. The pattern in EchoLeak and CVE-2025-53773 is autonomous execution — no human in the approval chain. Requiring explicit user confirmation before agents send emails, execute code, or modify configuration files is an architectural decision that needs to be made at deployment time, not patched in after an incident.

Registry vetting for MCP servers. The MCP ecosystem currently has no equivalent of npm’s security audit tooling or PyPI’s malware scanning. The September 2025 postmark-mcp incident is the first; it will not be the last. Watch for whether Anthropic, npm, or third-party security vendors introduce signing and provenance requirements for MCP packages.

Regulatory movement on agentic AI. The EU AI Act’s high-risk system categories do not explicitly cover agentic deployments; enforcement guidance for 2026 may change that. NIST’s AI RMF is being adopted by US federal contractors as a procurement requirement. Enterprise security teams negotiating AI vendor contracts in 2026 should require vendors to document their agentic security posture against the OWASP Agentic Top 10 as a baseline.

Agent identity and audit logging. If an AI agent exfiltrates data, current SIEM tooling may not distinguish agent activity from user activity — both appear as API calls under the same credential. Agent-specific identity tokens and structured audit trails for every tool invocation are the foundational observability requirement that most deployments have not yet implemented.

Further Reading

This article was produced with AI assistance and reviewed by the editorial team.

Arjun Mehta, AI infrastructure and semiconductors correspondent at Next Waves Insight

About Arjun Mehta

Arjun Mehta covers AI compute infrastructure, semiconductor supply chains, and the hardware economics driving the next wave of AI. He has a background in electrical engineering and spent five years in process integration at a leading semiconductor foundry before moving into technology analysis. He tracks arXiv pre-prints, IEEE publications, and foundry filings to surface developments before they reach the mainstream press.

Meet the team →
Share: 𝕏 in
The NextWave SignalSubscribe free

The NextWave Signal

Enjoyed this analysis?

One AI market analysis + one emerging-tech signal, every Tuesday and Friday — written for engineers, PMs, and CTOs tracking what shifts before it goes mainstream.

Leave a Comment