PhyseaWiki How AI actually works Papers physea.ai →

Prompt injection

How do you contain prompt injection if you cannot prevent it?

Since you cannot assume prevention, the goal is to make a successful injection low-impact through layered controls like least privilege and human approval, plus by-design defenses such as Google DeepMind's CaMeL that fix the channel rather than the model's behavior.

Last updated 2026-06-15 · Physea Labs

Since you cannot assume prevention, the goal is to make a successful injection low-impact. OWASP’s mitigations are layered: constrain behavior with a tight system prompt, validate output formats, filter inputs and outputs, enforce least privilege so a hijacked agent can do little, require human approval for high-risk actions, clearly segregate untrusted content, and red-team adversarially.[1] These are the same controls covered on the rules and guardrails page.

A newer direction tries to fix the channel by design rather than asking the model to behave. Google DeepMind’s CaMeL splits a privileged planner model, which sees only the trusted request, from a quarantined model that reads untrusted data but cannot call tools, and tracks data provenance to enforce what tainted data may do. It reports solving 77% of a benchmark’s tasks with provable security, while its own authors note it is not a complete fix.[2]

Standards & defenses

  • OWASP Gen AI Security Project

    The community standard that defines prompt injection (LLM01), the direct/indirect split, and the mitigation checklist.

  • Google DeepMind CaMeL

    A by-design defense: a privileged planner plus a quarantined data-reader with capability-tracked data flow. Research, not a product.

References

  1. LLM01:2025 Prompt Injection — OWASP Gen AI Security Project
  2. Defeating Prompt Injections by Design (CaMeL) — Debenedetti et al., Google DeepMind, arXiv