PhyseaWiki How AI actually works Papers physea.ai →

Subject 06 · Cross-cutting; matters most once agents can act

Safety & Security

Prompt injection, jailbreaks, alignment, and data privacy. Where tool use turns a wrong answer into a wrong action.

23 pages across 6 topics

Prompt injection

The top LLM security risk.

One channel Prompt injection is the top-ranked LLM application security risk because a model receives its system prompt, the user's message, and any data it reads as one undifferentiated stream, with no protected channel for instructions separate from data.
Two forms Direct injection is the user's own input overriding the system's intent. Indirect injection, the more dangerous form, hides instructions inside data the model will later read, turning any untrusted document into a remote control for an agent.
Why unsolved Prompt injection is effectively unsolved. OWASP says there may be no fool-proof method of prevention, and the NCSC warns it may be an inherent issue with LLM technology, because a probabilistic model can be reworded around any filter.
How to contain Since you cannot assume prevention, the goal is to make a successful injection low-impact through layered controls like least privilege and human approval, plus by-design defenses such as Google DeepMind's CaMeL that fix the channel rather than the model's behavior.

Jailbreaks

Bypassing a model’s safety training.

Alignment basics

Getting models to do what we intend.

Data privacy

What leaves your machine, and what does not.

Evaluating trust

How to tell if output can be relied on.

Safe deployment

A checklist before you ship.