AIStrategyWhitepaper

Threat Modeling Agentic Workflows

Dephiant ResearchMay 27, 20184 min read

A practical STRIDE-style threat model tailored for multi-step AI agents with tool access.

The burgeoning field of artificial intelligence, particularly the development and deployment of multi-step AI agents with tool access, introduces a novel and complex set of cybersecurity challenges. As these agents gain the capability to autonomously execute tasks by leveraging external tools and maintaining internal states, their potential attack surface expands significantly. This expansion occurs primarily in three critical directions: the tools an agent can access and utilize, the memory it maintains for statefulness and context, and the processing of untrusted inputs from external sources. To effectively mitigate these emerging risks, a robust and practical threat modeling framework is essential. This article proposes a STRIDE-style approach specifically tailored to address the unique vulnerabilities inherent in agentic workflows.

Expanding Attack Surfaces in Agentic Design

Understanding the expanded attack surface is foundational to developing effective defenses for AI agents. Each vector presents distinct opportunities for malicious actors to compromise the agent's integrity, confidentiality, or availability.

Tools: Agents frequently interact with external applications, APIs, or system functions to achieve their objectives. These tools, which can range from simple data retrieval mechanisms to complex system control utilities, become potential conduits for attacks if not properly secured. An agent's access to a tool effectively extends the attack surface to include the vulnerabilities of that tool itself, as well as the communication channels used to interact with it.
Memory: Agentic workflows often rely on internal memory mechanisms to store conversational history, contextual information, intermediate results, and even personal data. This memory is crucial for maintaining coherence and achieving multi-step goals. However, it also becomes a prime target for manipulation, unauthorized access, or data exfiltration if an attacker can compromise the agent's internal state.
Untrusted Inputs: A core function of many agents is to process and act upon user-provided or external inputs. These inputs, by definition, must be considered untrusted and are a classic vector for various attacks, including injection vulnerabilities, prompt engineering exploits, and data poisoning, all of which can influence the agent's behavior and decisions.

Given these expanded attack surfaces, a systematic approach to identifying and categorizing potential threats is paramount. The STRIDE threat model, originally developed by Microsoft, provides a potent framework for this purpose. When adapted for agentic systems, STRIDE can help security professionals systematically uncover weaknesses before they are exploited.

Applying STRIDE Per Agent Capability

Each component of the STRIDE model, Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege, can be directly applied to analyze the specific capabilities and interactions within an agentic workflow. By considering these threat categories against an agent's operational aspects, we can identify concrete attack scenarios and formulate targeted mitigations.

Spoofing: Can untrusted content impersonate the system prompt or another trusted agent? This considers scenarios where malicious input or cleverly crafted data could trick the agent into believing it is receiving instructions from a legitimate source, such as the system administrator or an authorized user, thereby circumventing established security policies. An example would be an attacker injecting malicious instructions disguised as a system-level command.
Tampering: Can an agent modify its own memory or another agent's memory? This threat focuses on unauthorized alteration of data. An attacker might attempt to manipulate an agent's internal state, such as its learned preferences, stored credentials, or operational parameters, to steer its behavior in an undesirable direction. This also extends to one compromised agent altering the memory or operational data of another agent within a multi-agent system.
Repudiation: Are tool calls traceable to a specific request and an authorized user? The ability to accurately audit and attribute actions is crucial for accountability and post-incident investigation. If an agent performs an action via a tool, such as deleting a file or making an API call, it must be possible to definitively link that action back to the initiating user or trigger event, preventing a malicious actor or a compromised agent from denying involvement.
Information Disclosure: Does the agent leak secrets in its scratchpad or intermediate reasoning steps? Agents often maintain a "scratchpad" or internal log of their thought process, tool outputs, and temporary data. This ephemeral storage can inadvertently become a vector for information disclosure if sensitive data, such as API keys, personal identifiable information (PII), or confidential business logic, is exposed within this accessible internal state.
Denial of Service: Can a poisoned input drive infinite tool loops or consume excessive resources? An attacker could craft an input designed to force the agent into a computationally intensive or infinite loop, for example, repeatedly calling an expensive external tool or processing an infinitely recursive data structure. This can saturate resources, making the agent or dependent systems unavailable to legitimate users.
Elevation of Privilege: Can a low-trust agent invoke a high-trust tool or sensitive functionality? This threat concerns the bypass of access controls. A less privileged agent, perhaps one designed for public interaction, might through a vulnerability or clever prompting be able to access tools or perform actions reserved for agents with higher security clearances, potentially leading to unauthorized data access or system control.

By systematically addressing each of these STRIDE categories in the context of an agent's capabilities and interactions, organizations can develop a comprehensive understanding of their agentic systems' vulnerabilities. This structured approach facilitates the proactive design of security controls, architectural decisions, and operational monitoring strategies to build more resilient and trustworthy AI agents. Implementing this threat modeling methodology early in the development lifecycle is critical for anticipating and mitigating risks in the evolving landscape of AI-powered automation.

← Back to all insights

Talk to our team