How to contain — Physea Wiki

Because prevention cannot be assumed, containment aims to make a successful injection low-impact. It combines layered controls, such as least privilege and human approval, with by-design defenses, such as Google DeepMind's CaMeL, that secure the channel instead of relying on the model's behavior.

Since you cannot assume prevention, the goal is to make a successful injection low-impact. OWASP’s mitigations are layered: constrain behavior with a tight system prompt, validate output formats, filter inputs and outputs, enforce least privilege so a hijacked agent can do little, require human approval for high-risk actions, clearly segregate untrusted content, and red-team adversarially.^[1] These are the same controls covered on the rules and guardrails page.
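Several of these controls can be enforced in ordinary code around the model rather than inside the prompt. The following is a minimal Python sketch, assuming the model emits tool calls as JSON of the form `{"tool": ..., "args": {...}}`. The tool names (`search_docs`, `send_email`), the `Tool` registry, and the `approve` callback are hypothetical stand-ins, not part of OWASP's guidance or any real agent framework. The sketch shows output-format validation, a per-task allowlist (least privilege), and a human-approval gate for high-risk actions.

```python
import json
from dataclasses import dataclass
from typing import Callable


class PolicyError(Exception):
    """Raised when a proposed tool call breaks a containment rule."""


@dataclass(frozen=True)
class Tool:
    name: str
    func: Callable[..., str]
    high_risk: bool = False  # high-risk tools need human approval


# Hypothetical tools; real ones would touch documents, mail, payments, etc.
def search_docs(query: str) -> str:
    return f"results for {query!r}"


def send_email(to: str, body: str) -> str:
    return f"sent to {to}"


TOOLS = {
    "search_docs": Tool("search_docs", search_docs),
    "send_email": Tool("send_email", send_email, high_risk=True),
}


def parse_tool_call(raw: str) -> tuple[str, dict[str, str]]:
    """Output-format validation: accept only {"tool": str, "args": {str: str}}."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise PolicyError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or set(data) != {"tool", "args"}:
        raise PolicyError("expected exactly the keys 'tool' and 'args'")
    tool, args = data["tool"], data["args"]
    if not isinstance(tool, str) or not isinstance(args, dict):
        raise PolicyError("'tool' must be a string and 'args' an object")
    if not all(isinstance(v, str) for v in args.values()):
        raise PolicyError("argument values must be strings")
    return tool, args


def dispatch(raw: str, allowed: set[str], approve: Callable[[str, dict], bool]) -> str:
    """Run one model-proposed tool call under least privilege and human approval."""
    tool_name, args = parse_tool_call(raw)
    # Least privilege: only tools granted to this task can run at all.
    if tool_name not in allowed or tool_name not in TOOLS:
        raise PolicyError(f"tool {tool_name!r} is not permitted for this task")
    tool = TOOLS[tool_name]
    # Human approval: a person must confirm every high-risk call.
    if tool.high_risk and not approve(tool_name, args):
        raise PolicyError(f"call to {tool_name!r} was rejected by the reviewer")
    try:
        return tool.func(**args)
    except TypeError as exc:  # wrong or missing argument names
        raise PolicyError(f"bad arguments for {tool_name!r}: {exc}") from exc
```

An agent granted only `{"search_docs"}` cannot send email even if injected text convinces the model to emit a `send_email` call, and granting `send_email` still routes every such call through `approve`. In a real deployment `approve` would ask a person; here it is any function that returns `True` or `False`.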

A newer direction tries to fix the channel by design rather than asking the model to behave. Google DeepMind's CaMeL separates a privileged planner model, which sees only the trusted user request, from a quarantined model that reads untrusted data but cannot call tools. It also tracks the provenance of every value, so that security policies can restrict what tainted data is allowed to do. Its authors report solving 77% of the tasks in the AgentDojo benchmark with provable security, while noting that it is not a complete fix.^[2]
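The core idea of provenance tracking can be illustrated with a toy example. This is not CaMeL's code or API: in CaMeL the planner writes a program that a custom interpreter executes, and its capabilities and policies are richer. In the sketch below, which is a simplified assumption throughout, every `Value` carries the set of sources it came from, a hypothetical `quarantined_extract` function stands in for the quarantined model (a plain string parser with no tool access), and a hard-coded `run_plan` plays the role of the plan the privileged model would produce from the trusted request. A hypothetical policy on `send_email` rejects any recipient derived from untrusted data.

```python
from dataclasses import dataclass, field

TRUSTED = "user"  # label for data that came from the user's own request


@dataclass(frozen=True)
class Value:
    """A piece of data plus the set of sources it was derived from."""
    data: str
    sources: frozenset = field(default_factory=frozenset)


class PolicyViolation(Exception):
    pass


def derive(data: str, *parents: Value) -> Value:
    """Anything computed from other values inherits all of their sources."""
    return Value(data, frozenset().union(*(p.sources for p in parents)))


def quarantined_extract(doc: Value, key: str) -> Value:
    """Stand-in for the quarantined model: it reads untrusted text but has no
    tools. Here it is a naive 'Key: value' line lookup; the result stays tainted."""
    for line in doc.data.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip().lower() == key:
            return derive(rest.strip(), doc)
    return derive("", doc)


def send_email(to: Value, body: Value) -> str:
    """Tool with a simplified policy: the recipient may come only from trusted
    sources, so injected text cannot choose where data is sent."""
    untrusted = to.sources - {TRUSTED}
    if untrusted:
        raise PolicyViolation(f"recipient derived from {sorted(untrusted)}")
    return f"sent {body.data!r} to {to.data}"


def run_plan(user_recipient: str, inbox_text: str) -> str:
    """Plays the role of the plan for 'email me the meeting time from my inbox'.
    The planner that wrote it saw only the request, never inbox_text."""
    recipient = Value(user_recipient, frozenset({TRUSTED}))
    inbox = Value(inbox_text, frozenset({"inbox"}))
    meeting = quarantined_extract(inbox, "meeting")
    return send_email(recipient, meeting)
```

With an inbox containing `Meeting: 3pm Tuesday`, `Reply-To: attacker@example.com`, and an injected instruction, `run_plan("me@example.com", inbox)` still sends only to the user's address, because the planner never read the inbox. If a plan instead used `quarantined_extract(inbox, "reply-to")` as the recipient, `send_email` would raise `PolicyViolation`: the injected text can still affect what the body says, but not where it is sent.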

Standards & defenses

OWASP Gen AI Security Project
The community standard that defines prompt injection (LLM01), the direct/indirect split, and the mitigation checklist.
Google DeepMind CaMeL
A by-design defense: a privileged planner plus a quarantined data-reader with capability-tracked data flow. Research, not a product.

References

[1] LLM01:2025 Prompt Injection. OWASP Gen AI Security Project.
[2] Debenedetti et al. Defeating Prompt Injections by Design (CaMeL). Google DeepMind, arXiv.
