
Safe deployment

How do you limit what an AI feature is allowed to do?

Limit reach in three ways: grant the least access the task needs (least privilege), require a human to approve any irreversible action, and filter inputs and outputs because both can carry hidden instructions or leaked data.

Last updated 2026-06-15 · Physea Labs

Least privilege means the feature can touch only what its job requires, and nothing more. OWASP’s Excessive Agency risk describes what happens when this is ignored. Its guidance is to “limit the extensions that LLM agents are allowed to call to only the minimum necessary” and to run actions “with user-specific credentials, not privileged shared accounts.”[1] The agent cheat sheet puts it as granting “the minimum tools required for their specific task” with “per-tool permission scoping (read-only vs. write, specific resources).”[3] A read-only feature should hold read-only keys. If it never needs to delete, it should not be able to.
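As a rough illustration of per-tool permission scoping, the sketch below keeps an explicit allowlist of (tool, mode, resource) grants and denies anything not listed. The names `ToolGrant`, `ScopedToolbox`, and `orders_db` are hypothetical and do not come from OWASP or any library. A real system would enforce the same limits in the credentials themselves, not only in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolGrant:
    """One permission: a tool name, an access mode, and the resources it covers."""
    tool: str
    mode: str              # "read" or "write"
    resources: frozenset   # explicit resource names; no wildcards


class ScopedToolbox:
    """Holds only the grants a feature was given and denies everything else."""

    def __init__(self, grants):
        self._grants = {(g.tool, g.mode): g.resources for g in grants}

    def is_allowed(self, tool, mode, resource):
        allowed = self._grants.get((tool, mode))
        return allowed is not None and resource in allowed

    def call(self, tool, mode, resource):
        # Deny by default: anything not explicitly granted raises.
        if not self.is_allowed(tool, mode, resource):
            raise PermissionError(f"{mode} on {tool}:{resource} is not granted")
        return f"{mode} {tool}:{resource}"


# Hypothetical read-only summarizer: it may read one table and nothing else.
summarizer_tools = ScopedToolbox([
    ToolGrant(tool="orders_db", mode="read", resources=frozenset({"orders"})),
])
```

Here `summarizer_tools.call("orders_db", "read", "orders")` succeeds, while a write to the same table or a read of any other resource raises `PermissionError`, because no grant covers it.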

Human approval for irreversible actions is the second control. OWASP advises using “human-in-the-loop control to require a human to approve high-impact actions before they are taken,” with sending an email as the example.[1] The cheat sheet adds a useful separation: “the agent can propose an action, but a policy service or execution component should independently validate scope, privilege, and approval state.”[3] The pattern is recommend, then approve, then execute. Drafting is safe; sending, paying, and deleting are the steps that wait for a person.
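The recommend, approve, execute pattern can be sketched as a small state check, assuming a hypothetical `Action` record and a hand-picked `IRREVERSIBLE` set (neither is defined by OWASP). The point it tries to show is the separation: the agent only creates proposals, and the execution function checks approval state itself instead of trusting the agent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    EXECUTED = "executed"


# Hypothetical set of action kinds treated as irreversible.
IRREVERSIBLE = {"send_email", "make_payment", "delete_record"}


@dataclass
class Action:
    kind: str
    payload: dict
    status: Status = Status.PROPOSED
    approved_by: Optional[str] = None


def approve(action: Action, reviewer: str) -> None:
    """Record a human decision. Meant to be called from a review UI, not by the agent."""
    action.status = Status.APPROVED
    action.approved_by = reviewer


def execute(action: Action) -> str:
    """Execution component: validates approval state independently of the agent."""
    if action.status is Status.EXECUTED:
        raise RuntimeError("action already executed")
    if action.kind in IRREVERSIBLE and action.status is not Status.APPROVED:
        raise PermissionError(f"{action.kind} requires human approval")
    action.status = Status.EXECUTED
    return f"executed {action.kind}"
```

Under these assumptions, a `draft_reply` action runs straight away, while a `send_email` action raises `PermissionError` until `approve` has been called on it by a reviewer.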

Filtering inputs and outputs is the third control. Treat every incoming message, retrieved document, and API response as untrusted, because any of them can carry hidden instructions aimed at the model.[3] OWASP’s prompt-injection guidance recommends applying “semantic filters” and string checks to scan for non-allowed content, validating that outputs match an expected format, and clearly separating untrusted content so it has less influence on the model.[2] Output filtering also catches sensitive data on the way back out, before a user or a downstream system sees it.
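A minimal sketch of these checks follows, using only the Python standard library. The patterns, the `<untrusted_content>` tag, and the expected `{"summary": ...}` output shape are all assumptions chosen for illustration. Simple string matching like this is easy to evade, so it would complement the semantic filters OWASP mentions and not replace them.

```python
import json
import re

# Illustrative patterns only; a real deployment would need a far broader set.
SUSPICIOUS_INPUT = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
]
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def flag_input(text: str) -> bool:
    """String check: True if untrusted text matches a known injection phrase."""
    return any(p.search(text) for p in SUSPICIOUS_INPUT)


def wrap_untrusted(text: str) -> str:
    """Mark retrieved content as data so the prompt separates it from instructions."""
    return f"<untrusted_content>\n{text}\n</untrusted_content>"


def validate_output(raw: str) -> dict:
    """Format check: expect a JSON object with exactly one string field, 'summary'."""
    data = json.loads(raw)  # invalid JSON raises a ValueError subclass
    if (
        not isinstance(data, dict)
        or set(data) != {"summary"}
        or not isinstance(data["summary"], str)
    ):
        raise ValueError("output does not match the expected format")
    return data


def redact(text: str) -> str:
    """Output filter: mask email addresses before the text leaves the system."""
    return EMAIL.sub("[redacted email]", text)
```

In this sketch, a retrieved document would pass through `flag_input` and `wrap_untrusted` before reaching the model, and the model's reply would pass through `validate_output` and `redact` before reaching a user or downstream system.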

Source standards

  • OWASP LLM06: Excessive Agency

    The risk of giving an LLM too much permission or autonomy, with mitigations: minimize tools, scope credentials, and require human approval for high-impact actions.

  • OWASP AI Agent Security Cheat Sheet

    A practical pre-production checklist for agents: least privilege, human-in-the-loop, untrusted inputs, output validation, and logging.

References

  1. LLM06:2025 Excessive Agency — OWASP Gen AI Security Project
  2. LLM01:2025 Prompt Injection — OWASP Gen AI Security Project
  3. AI Agent Security Cheat Sheet — OWASP Cheat Sheet Series