Prompt injection
Why can a model confuse instructions with data?
Prompt injection is the top-ranked LLM application security risk because a model receives its system prompt, the user's message, and any data it reads as one undifferentiated stream, with no protected channel for instructions separate from data.
Prompt injection is the top-ranked risk in OWASP’s list of LLM application security risks, where it is catalogued as LLM01.[1] The root cause is simple and structural: a model receives its system prompt, the user’s message, and any data it reads as one undifferentiated stream of text, and predicts from all of it at once. There is no protected channel for instructions separate from the channel for data. So text that looks like a command can be obeyed even when it was supposed to be treated as content.
This is the whole problem in a sentence. The UK’s National Cyber Security Centre puts it bluntly: research suggests a model “inherently cannot distinguish between an instruction and data provided to help complete the instruction.”[2]
The closest analogy from ordinary software is SQL injection: both arise from mixing untrusted input with trusted instructions in one channel. Simon Willison, who coined the term in September 2022, drew exactly that parallel.[3] The painful difference is that SQL injection has a clean fix, the parameterized query that keeps code and data apart. For language models there is no proven equivalent yet.
References
- LLM01:2025 Prompt Injection — OWASP Gen AI Security Project
- Exercise caution when building off LLMs — UK National Cyber Security Centre
- Prompt injection attacks against GPT-3 — Simon Willison