Prompt injection
What is the difference between direct and indirect prompt injection?
Direct injection is the user's own input overriding the system's intent. Indirect injection, the more dangerous form, hides instructions inside data the model will later read, turning any untrusted document into a remote control for an agent.
Direct injection is when the user’s own input overrides the system’s intent: the classic “ignore your previous instructions and do X.” In February 2023 a student used exactly this against Bing Chat and made it disclose its confidential system prompt and internal codename, “Sydney.”[1]
Indirect injection is the more dangerous form. The attacker hides instructions inside data the model will later read, a web page, a document, an email, and never speaks to the victim at all. Greshake and colleagues formalized this in 2023 and demonstrated working attacks against a production GPT-4 chat system.[2] It is more dangerous because it turns any untrusted document into a remote control for an agent that has tools and data access (see tool use and MCP).
In 2025 this stopped being theoretical. “EchoLeak” (CVE-2025-32711) was a zero-click attack on Microsoft 365 Copilot: a malicious instruction buried in an email rode through Copilot’s retrieval pipeline and silently exfiltrated private data, with no user click required.[3]
Injection is not jailbreaking Jailbreaking bypasses a model’s safety training to get banned content. Prompt injection hijacks the application’s task by confusing instructions with data. They get lumped together but they are different attacks.
References
- Bing Chatbot Exposes Confidential Instructions After Prompt Injection (Kevin Liu / 'Sydney') — OECD.AI Incidents
- Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection — Greshake et al., arXiv (AISec 2023)
- Zero-Click AI Vulnerability Exposes Microsoft 365 Copilot Data (EchoLeak / CVE-2025-32711) — The Hacker News