Connecting an MCP server to an AI agent feels as casual as installing a browser extension: paste a config block, restart the client, done. That casualness is the problem. When you connect a Model Context Protocol server, you are not adding a feature — you are granting an autonomous agent a new set of capabilities and, often, a new set of credentials, and you are extending your trust to whoever wrote and operates that server.
This is not a fear piece. MCP is not uniquely unsafe. It is an open standard built by Anthropic in late 2024, now governed under the Linux Foundation's Agentic AI Foundation, with a published security best-practices section and a real OAuth 2.1 authorization model. Most of the documented MCP incidents are not protocol flaws at all — they are old application-security problems (untrusted input, over-broad credentials, unvetted code) wearing a new coat. But the agentic context amplifies them, because the thing acting on the malicious input is an LLM with tools, a credential wallet, and the autonomy to chain actions together. This post is the threat model we use at InventiveHQ when a client wants to roll MCP out across a team.
The trust model: you are trusting every server you connect
Start here, because every risk below is a corollary. In MCP's architecture there are three roles: a host (the LLM application — an IDE, a chat app, an agent), a client (a connector inside the host holding a 1:1 stateful connection to one server), and a server (the thing exposing capabilities). The wire format is JSON-RPC 2.0, and a server advertises three primitives to the model: tools (functions the model can invoke), resources (data the model can read), and prompts (templated workflows).
The security-relevant fact is what a server can put in front of the model. Tool descriptions are read by the model to decide what to call. Tool results are fed back into the model's context as it reasons. Both are attacker-controllable if the server is malicious or compromised. So the trust boundary isn't "did I type the config correctly" — it's "do I trust this server's code, its operators, its dependencies, and everything it will ever return, for as long as it stays connected." If you wouldn't give a stranger's binary a shell on your laptop and your OAuth tokens, think twice before connecting their MCP server.
It helps to remember that MCP doesn't replace function calling — it feeds it. The model still emits a structured call against a JSON Schema; MCP just standardizes how that tool was discovered and connected. Every classic tool-use risk still applies, plus the new discovery and connection surface MCP adds on top.
The risk catalogue
Each row below is a distinct risk class. Where there's a publicly verified incident or CVE we name it; where the class is real but we couldn't confirm a specific named incident, we describe it generically and say so.
| Risk | What it is | Mitigation |
|---|---|---|
| Tool poisoning | Malicious or hidden instructions embedded in a tool's description or schema fields — visible to the model, usually not to the user. The model silently obeys (exfiltrate files, leak secrets) while returning normal output. Disclosed by Invariant Labs, April 2025. | Treat descriptions as untrusted; scan with mcp-scan; pin and diff descriptions; render full tool metadata to users. |
| Prompt injection via tool results | Untrusted data a tool returns (a GitHub issue, web page, file, email) carries instructions that hijack the agent — indirect prompt injection. | Treat all tool output as untrusted data; provenance/tainting; output filtering; don't auto-act on retrieved content; human approval for high-impact actions. |
| Confused deputy | The server holds broad ambient authority (one OAuth token) and is tricked into using it for the attacker — e.g. jumping from a public repo to a private one on the same token. Demonstrated against the GitHub MCP server (Invariant Labs, May 26 2025). | Per-resource scoping; token audience/resource binding; separate credentials per boundary; deny cross-context access. |
| OAuth token passthrough | A server accepts or forwards tokens not issued for it, or stores long-lived tokens, enabling replay and over-reach. | Validate token audience; never pass tokens through; short-lived/scoped tokens; proper OAuth 2.1 resource-server semantics (MCP auth spec). |
| Token / credential theft | A compromised or malicious server harvests API keys, OAuth tokens, or files. | Least-privilege credentials; secret isolation/vaulting; egress control; monitor for exfiltration. |
| Rug pull | A server is benign at approval time, then silently changes its tool definition or behavior with no re-prompt. CVE-2025-54136 "MCPoison" (Check Point, Cursor IDE, CVSS 7.2) — Cursor trusted mcp.json by name, not contents; fixed in Cursor 1.3 (Jul 29 2025). | Pin and verify versions/hashes; re-approve on any change; content-based (not name-based) trust; change detection. |
| Over-broad permissions / excessive scope | Tools granted more access than the task needs — filesystem-wide, admin DB, shell. Path-traversal/sandbox-escape flaws like "EscapeRoute" (CVE-2025-53109 / -53110) in the reference Filesystem server show the blast radius. | Least privilege; scoped/read-only modes; sandboxing; allowlist of permitted operations. |
| Supply chain (unvetted community servers) | Thousands of community servers with no vetting; may carry backdoors, command injection, or malicious deps. Command-injection RCE is a documented class (e.g. CVE-2025-53107 git-mcp, CVE-2025-59834 adb-mcp, CVE-2025-53818 Kanban MCP; OX Security STDIO advisory). | Allowlist trusted servers; code review; dependency scanning; run in sandbox/container; prefer first-party/signed servers. |
| Lookalike / typosquatted servers | Malicious servers named to impersonate popular ones so they get installed. Real, widely-warned class; no single named public incident confirmed — treat generically. | Install from verified registries/publishers; verify identity/signatures; allowlisting. |
A note on the command-injection CVEs: do not read them as "these specific three servers are bad." Read them as evidence of a pattern — many MCP servers shell out with unsanitized model-controlled input, and injection characters in tool arguments reach the host shell. The fix is architectural (never pass model output to a shell; parameterized execution; sandbox), not a list of packages to avoid.
Why the agentic context makes these worse
The individual flaws above are mundane. What makes them dangerous in MCP is composition. An AI agent running multiple servers can chain a poisoned tool description into a privileged action into an exfiltration path — Invariant Labs called this "toxic flow." A single token shared across a read-only public context and a privileged private one is exactly the confused-deputy setup. And agentic autonomy means there may be no human in the loop at the moment the malicious instruction lands. The more capable and autonomous the agent — see our write-up on autonomous desktop agents — the higher the stakes of a single bad tool result.
This is also why MCP's OAuth model, while genuinely important, isn't a silver bullet. The authorization spec (introduced 2025-03-26, hardened 2025-06-18 to mandate RFC 9728 protected-resource metadata, validated audiences, and resource indicators) closes the token-passthrough and audience-confusion holes. But it's optional, HTTP-transport-only, doesn't touch local stdio servers, and does nothing against tool poisoning or indirect injection. Authorization is one layer; you still need the rest.
Governance: what to actually put in place
For a single developer experimenting locally, the bar is "don't connect servers you haven't read, and keep secrets out of them." For an MSP or a security team rolling MCP across many users and client environments, you need a program. Here's the control set we recommend, mapped to the risks above.
| Control | What it does | Counters |
|---|---|---|
| Server allowlisting | Only vetted, approved servers can be connected; arbitrary community servers are blocked. | Supply chain, lookalike servers, rug pull (limits exposure) |
| Least privilege | Scoped, read-only-where-possible credentials; one boundary per credential; no shared broad tokens. | Confused deputy, token theft, over-broad scope |
| Pin & verify | Pin server versions and tool-description hashes; diff on update; content-based trust, not name-based. | Rug pull, tool poisoning |
| Human-in-the-loop approval | Require explicit approval for sensitive/irreversible tools (writes, sends, payments, deletes); re-prompt on any change. | Prompt injection, rug pull, confused deputy |
| Treat all tool I/O as untrusted | Scan descriptions (e.g. mcp-scan); filter outputs; never auto-act on retrieved content. | Tool poisoning, indirect prompt injection |
| Sandbox / isolate | Run servers in containers/VMs with no host shell; never pass model output to a shell; parameterized execution. | Command-injection RCE, over-broad scope |
| Network egress control | Restrict where servers can call out. | Exfiltration after token/credential theft |
| Monitoring & logging | Audit tool calls, arguments, and data flows; anomaly and exfiltration detection; toxic-flow analysis. | Detects all of the above post-hoc |
| Proper OAuth | Validate token audience/resource; no passthrough; short-lived tokens (OAuth 2.1 / MCP auth spec). | Token passthrough, confused deputy |
A few practical notes on operating this:
- Allowlisting is the highest-leverage control. Most published incidents involve a server you chose to connect. Centralize that choice. Prefer first-party servers and the official MCP Registry (launched in preview Sept 8 2025) over arbitrary GitHub repos, and remember "preview" means no durability guarantees yet.
- Pin contents, not names. MCPoison is the canonical lesson: a client that trusts an approved config by name will happily run a swapped-in malicious command. Hash the tool definitions; re-approve on diff.
- Default to read-only and one credential per boundary. The confused-deputy and token-theft classes both collapse if a server can't reach the private resource with the public-context token in the first place.
- Log every tool call. When something goes wrong with an autonomous agent, the tool-call audit trail is your only reconstruction of what it did and what data crossed the boundary.
Standardizing the config across a team is its own chore — each CLI stores MCP settings differently. Our config generators for Claude Code, Codex CLI, and Gemini CLI produce consistent, reviewable configs, and our guide to adding an MCP server to any coding CLI covers the transport and field-name gotchas. For where MCP sits relative to subagents, skills, and hooks, see this breakdown.
Score a server before you connect it: run any MCP server through this checklist for a quick risk read and a copyable report you can attach to a change ticket.
MCP Server Security Checklist
Assess any MCP server before you connect it — a security scorecard covering source trust, least-privilege tool scope, secrets, OAuth/confused-deputy, version pinning, egress, and monitoring, with a live risk score and copyable report.
Open the full MCP Server Security Checklist tool →Honest framing
MCP is closer to "USB-C for AI tools" than to "a new class of vulnerability." Every risk in the catalogue has a recognizable analogue in software you already secure: untrusted input is XSS and SQL injection's lineage, the confused deputy is an OAuth scope problem, the rug pull is dependency-confusion thinking, and unvetted community servers are the npm supply-chain problem with an LLM attached. The protocol itself is evolving in the right direction — the 2025-11-25 spec is the current stable release, and the authorization model has gotten materially stricter over its revisions. (You may see references to a "2026-07-28 release candidate"; as of this writing that date is in the future and not a released spec, so don't build against it.)
What's genuinely new is the combination: an autonomous decision-maker, holding credentials, acting on attacker-influenceable content, with the reach to chain tools together. That's why the right posture is neither "MCP is dangerous, avoid it" nor "paste the config and move on." It's the same posture you'd take for any system that grants privileged automation: allowlist, least-privilege, pin, monitor, and keep a human in the loop for anything irreversible. Anchor the program to the MCP spec's own security section, OWASP's LLM and agent guidance, and the Cloud Security Alliance's agentic MCP best practices.
If you're standing up agentic AI across a team or for clients and want a second set of eyes on the trust boundaries, credential scoping, and monitoring, that's the kind of work InventiveHQ does. The goal isn't to slow MCP adoption down — it's to make it boring, governed, and auditable.