Skip to main content
Home/Blog/MCP Security Risks: A Practical Threat Model for Teams Connecting AI Agents to Tools
Artificial Intelligence

MCP Security Risks: A Practical Threat Model for Teams Connecting AI Agents to Tools

MCP isn't uniquely unsafe, but every server you connect widens your attack surface. A risk catalogue, the trust model you're actually accepting, and the governance controls MSPs and security teams should put in place.

By InventiveHQ Team

Connecting an MCP server to an AI agent feels as casual as installing a browser extension: paste a config block, restart the client, done. That casualness is the problem. When you connect a Model Context Protocol server, you are not adding a feature — you are granting an autonomous agent a new set of capabilities and, often, a new set of credentials, and you are extending your trust to whoever wrote and operates that server.

This is not a fear piece. MCP is not uniquely unsafe. It is an open standard built by Anthropic in late 2024, now governed under the Linux Foundation's Agentic AI Foundation, with a published security best-practices section and a real OAuth 2.1 authorization model. Most of the documented MCP incidents are not protocol flaws at all — they are old application-security problems (untrusted input, over-broad credentials, unvetted code) wearing a new coat. But the agentic context amplifies them, because the thing acting on the malicious input is an LLM with tools, a credential wallet, and the autonomy to chain actions together. This post is the threat model we use at InventiveHQ when a client wants to roll MCP out across a team.

The trust model: you are trusting every server you connect

Start here, because every risk below is a corollary. In MCP's architecture there are three roles: a host (the LLM application — an IDE, a chat app, an agent), a client (a connector inside the host holding a 1:1 stateful connection to one server), and a server (the thing exposing capabilities). The wire format is JSON-RPC 2.0, and a server advertises three primitives to the model: tools (functions the model can invoke), resources (data the model can read), and prompts (templated workflows).

The security-relevant fact is what a server can put in front of the model. Tool descriptions are read by the model to decide what to call. Tool results are fed back into the model's context as it reasons. Both are attacker-controllable if the server is malicious or compromised. So the trust boundary isn't "did I type the config correctly" — it's "do I trust this server's code, its operators, its dependencies, and everything it will ever return, for as long as it stays connected." If you wouldn't give a stranger's binary a shell on your laptop and your OAuth tokens, think twice before connecting their MCP server.

It helps to remember that MCP doesn't replace function calling — it feeds it. The model still emits a structured call against a JSON Schema; MCP just standardizes how that tool was discovered and connected. Every classic tool-use risk still applies, plus the new discovery and connection surface MCP adds on top.

MCP trust boundary: host and client trust everything a server returns Your trust zone Host (LLM app) Client + your tokens Trust boundary crossed MCP Server code · operator · deps tool descriptions (untrusted) tool results (untrusted) JSON-RPC 2.0 Everything right of the boundary is attacker-influenceable if the server is malicious or compromised.

The risk catalogue

Each row below is a distinct risk class. Where there's a publicly verified incident or CVE we name it; where the class is real but we couldn't confirm a specific named incident, we describe it generically and say so.

RiskWhat it isMitigation
Tool poisoningMalicious or hidden instructions embedded in a tool's description or schema fields — visible to the model, usually not to the user. The model silently obeys (exfiltrate files, leak secrets) while returning normal output. Disclosed by Invariant Labs, April 2025.Treat descriptions as untrusted; scan with mcp-scan; pin and diff descriptions; render full tool metadata to users.
Prompt injection via tool resultsUntrusted data a tool returns (a GitHub issue, web page, file, email) carries instructions that hijack the agent — indirect prompt injection.Treat all tool output as untrusted data; provenance/tainting; output filtering; don't auto-act on retrieved content; human approval for high-impact actions.
Confused deputyThe server holds broad ambient authority (one OAuth token) and is tricked into using it for the attacker — e.g. jumping from a public repo to a private one on the same token. Demonstrated against the GitHub MCP server (Invariant Labs, May 26 2025).Per-resource scoping; token audience/resource binding; separate credentials per boundary; deny cross-context access.
OAuth token passthroughA server accepts or forwards tokens not issued for it, or stores long-lived tokens, enabling replay and over-reach.Validate token audience; never pass tokens through; short-lived/scoped tokens; proper OAuth 2.1 resource-server semantics (MCP auth spec).
Token / credential theftA compromised or malicious server harvests API keys, OAuth tokens, or files.Least-privilege credentials; secret isolation/vaulting; egress control; monitor for exfiltration.
Rug pullA server is benign at approval time, then silently changes its tool definition or behavior with no re-prompt. CVE-2025-54136 "MCPoison" (Check Point, Cursor IDE, CVSS 7.2) — Cursor trusted mcp.json by name, not contents; fixed in Cursor 1.3 (Jul 29 2025).Pin and verify versions/hashes; re-approve on any change; content-based (not name-based) trust; change detection.
Over-broad permissions / excessive scopeTools granted more access than the task needs — filesystem-wide, admin DB, shell. Path-traversal/sandbox-escape flaws like "EscapeRoute" (CVE-2025-53109 / -53110) in the reference Filesystem server show the blast radius.Least privilege; scoped/read-only modes; sandboxing; allowlist of permitted operations.
Supply chain (unvetted community servers)Thousands of community servers with no vetting; may carry backdoors, command injection, or malicious deps. Command-injection RCE is a documented class (e.g. CVE-2025-53107 git-mcp, CVE-2025-59834 adb-mcp, CVE-2025-53818 Kanban MCP; OX Security STDIO advisory).Allowlist trusted servers; code review; dependency scanning; run in sandbox/container; prefer first-party/signed servers.
Lookalike / typosquatted serversMalicious servers named to impersonate popular ones so they get installed. Real, widely-warned class; no single named public incident confirmed — treat generically.Install from verified registries/publishers; verify identity/signatures; allowlisting.

A note on the command-injection CVEs: do not read them as "these specific three servers are bad." Read them as evidence of a pattern — many MCP servers shell out with unsanitized model-controlled input, and injection characters in tool arguments reach the host shell. The fix is architectural (never pass model output to a shell; parameterized execution; sandbox), not a list of packages to avoid.

Why the agentic context makes these worse

The individual flaws above are mundane. What makes them dangerous in MCP is composition. An AI agent running multiple servers can chain a poisoned tool description into a privileged action into an exfiltration path — Invariant Labs called this "toxic flow." A single token shared across a read-only public context and a privileged private one is exactly the confused-deputy setup. And agentic autonomy means there may be no human in the loop at the moment the malicious instruction lands. The more capable and autonomous the agent — see our write-up on autonomous desktop agents — the higher the stakes of a single bad tool result.

This is also why MCP's OAuth model, while genuinely important, isn't a silver bullet. The authorization spec (introduced 2025-03-26, hardened 2025-06-18 to mandate RFC 9728 protected-resource metadata, validated audiences, and resource indicators) closes the token-passthrough and audience-confusion holes. But it's optional, HTTP-transport-only, doesn't touch local stdio servers, and does nothing against tool poisoning or indirect injection. Authorization is one layer; you still need the rest.

Governance: what to actually put in place

For a single developer experimenting locally, the bar is "don't connect servers you haven't read, and keep secrets out of them." For an MSP or a security team rolling MCP across many users and client environments, you need a program. Here's the control set we recommend, mapped to the risks above.

ControlWhat it doesCounters
Server allowlistingOnly vetted, approved servers can be connected; arbitrary community servers are blocked.Supply chain, lookalike servers, rug pull (limits exposure)
Least privilegeScoped, read-only-where-possible credentials; one boundary per credential; no shared broad tokens.Confused deputy, token theft, over-broad scope
Pin & verifyPin server versions and tool-description hashes; diff on update; content-based trust, not name-based.Rug pull, tool poisoning
Human-in-the-loop approvalRequire explicit approval for sensitive/irreversible tools (writes, sends, payments, deletes); re-prompt on any change.Prompt injection, rug pull, confused deputy
Treat all tool I/O as untrustedScan descriptions (e.g. mcp-scan); filter outputs; never auto-act on retrieved content.Tool poisoning, indirect prompt injection
Sandbox / isolateRun servers in containers/VMs with no host shell; never pass model output to a shell; parameterized execution.Command-injection RCE, over-broad scope
Network egress controlRestrict where servers can call out.Exfiltration after token/credential theft
Monitoring & loggingAudit tool calls, arguments, and data flows; anomaly and exfiltration detection; toxic-flow analysis.Detects all of the above post-hoc
Proper OAuthValidate token audience/resource; no passthrough; short-lived tokens (OAuth 2.1 / MCP auth spec).Token passthrough, confused deputy

A few practical notes on operating this:

  • Allowlisting is the highest-leverage control. Most published incidents involve a server you chose to connect. Centralize that choice. Prefer first-party servers and the official MCP Registry (launched in preview Sept 8 2025) over arbitrary GitHub repos, and remember "preview" means no durability guarantees yet.
  • Pin contents, not names. MCPoison is the canonical lesson: a client that trusts an approved config by name will happily run a swapped-in malicious command. Hash the tool definitions; re-approve on diff.
  • Default to read-only and one credential per boundary. The confused-deputy and token-theft classes both collapse if a server can't reach the private resource with the public-context token in the first place.
  • Log every tool call. When something goes wrong with an autonomous agent, the tool-call audit trail is your only reconstruction of what it did and what data crossed the boundary.

Standardizing the config across a team is its own chore — each CLI stores MCP settings differently. Our config generators for Claude Code, Codex CLI, and Gemini CLI produce consistent, reviewable configs, and our guide to adding an MCP server to any coding CLI covers the transport and field-name gotchas. For where MCP sits relative to subagents, skills, and hooks, see this breakdown.

Score a server before you connect it: run any MCP server through this checklist for a quick risk read and a copyable report you can attach to a change ticket.

MCP Server Security Checklist

Assess any MCP server before you connect it — a security scorecard covering source trust, least-privilege tool scope, secrets, OAuth/confused-deputy, version pinning, egress, and monitoring, with a live risk score and copyable report.

Open the full MCP Server Security Checklist tool →
Loading interactive tool...

Honest framing

MCP is closer to "USB-C for AI tools" than to "a new class of vulnerability." Every risk in the catalogue has a recognizable analogue in software you already secure: untrusted input is XSS and SQL injection's lineage, the confused deputy is an OAuth scope problem, the rug pull is dependency-confusion thinking, and unvetted community servers are the npm supply-chain problem with an LLM attached. The protocol itself is evolving in the right direction — the 2025-11-25 spec is the current stable release, and the authorization model has gotten materially stricter over its revisions. (You may see references to a "2026-07-28 release candidate"; as of this writing that date is in the future and not a released spec, so don't build against it.)

What's genuinely new is the combination: an autonomous decision-maker, holding credentials, acting on attacker-influenceable content, with the reach to chain tools together. That's why the right posture is neither "MCP is dangerous, avoid it" nor "paste the config and move on." It's the same posture you'd take for any system that grants privileged automation: allowlist, least-privilege, pin, monitor, and keep a human in the loop for anything irreversible. Anchor the program to the MCP spec's own security section, OWASP's LLM and agent guidance, and the Cloud Security Alliance's agentic MCP best practices.

If you're standing up agentic AI across a team or for clients and want a second set of eyes on the trust boundaries, credential scoping, and monitoring, that's the kind of work InventiveHQ does. The goal isn't to slow MCP adoption down — it's to make it boring, governed, and auditable.

Frequently Asked Questions

Find answers to common questions

No. The Model Context Protocol is a transport-and-protocol standard (JSON-RPC 2.0) with a security best-practices section and an OAuth 2.1 / RFC 9728 authorization model. The risk isn't the protocol itself — it's that connecting an MCP server grants an AI agent the server's capabilities, and most of the documented incidents are application-level: untrusted tool descriptions, untrusted tool output, over-broad credentials, and unvetted community servers. MCP widens the attack surface; it doesn't create a new category of unfixable flaw.

Tool poisoning is when a malicious or hidden instruction is embedded in a tool's description or schema fields — text the model reads but the user usually doesn't. The model silently obeys it (for example, exfiltrating files or secrets) while returning normal-looking output. It was disclosed by Invariant Labs in April 2025, which also released the mcp-scan tool. Mitigate by treating tool descriptions as untrusted, scanning and diffing them, and rendering full tool metadata to users.

A confused-deputy attack happens when an MCP server holds broad ambient authority — typically one OAuth token — and is tricked into using it for an attacker's benefit. Invariant Labs demonstrated this with the GitHub MCP server in May 2025: a malicious public issue hijacked an agent into pulling private-repo data and leaking it through an auto-created public pull request, all under one token. Mitigate with per-resource scoping, token audience binding, and separate credentials per sensitive boundary.

A rug pull is when a server is benign at the moment you approve it, then silently changes its tool definition or behavior afterward without re-prompting. CVE-2025-54136 ('MCPoison', Check Point Research, CVSS 7.2) showed Cursor trusting an approved mcp.json by name rather than contents, letting a collaborator swap an approved tool for a malicious command later — silent RCE. Cursor 1.3 (July 29, 2025) fixed it by re-prompting on any config change. Mitigate by pinning versions, hashing tool descriptions, and re-approving on change.

Treat MCP servers like any other third-party software with privileged access. Allowlist only vetted servers, grant least-privilege and read-only-where-possible credentials with one boundary per credential, pin and verify versions, require human approval for sensitive or irreversible tools, sandbox servers in containers with no host shell, control network egress, and log every tool call for anomaly and exfiltration detection. Align the program with the MCP spec's security section, OWASP's LLM/agent guidance, and the Cloud Security Alliance's MCP best practices.

Treat them as untrusted by default. There are thousands of community servers with no central vetting, and documented CVEs include command-injection RCE across many of them. Lookalike and typosquatted servers are a real, widely-warned risk class. Prefer first-party or signed servers, install from verified registries and publishers, review code and scan dependencies, and run anything you haven't audited in a sandbox or container.

It helps but doesn't fix everything. The MCP authorization model makes the server an OAuth 2.1 resource server and, since the 2025-06-18 revision, mandates RFC 9728 protected-resource metadata, validated token audiences, and resource indicators — closing the token-passthrough hole. But OAuth is optional and HTTP-transport-only, doesn't apply to local stdio servers, and does nothing against tool poisoning or indirect prompt injection. It's one necessary control among several.

Let's turn this knowledge into action

Our experts can help you apply these insights to your specific situation. No sales pitch — just a technical conversation.