Watch and undo — Physea Wiki

After launch, safety comes from being able to see and undo. Log every decision and tool call, test before release through a CI/CD gate, monitor the deployed system, and keep a documented way to roll it back or deactivate it.

Log what it does. If you cannot see what the feature did, you cannot debug it, prove what happened, or learn from a near miss. OWASP’s agent cheat sheet says to “log all agent decisions, tool calls, and outcomes” and to “maintain audit trails for compliance and forensics.”^[1] A good log captures the input, the action taken, who or what approved it, and the result, so an incident can be reconstructed after the fact.
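As an illustration, a log entry with those four fields could be written as one JSON line per action. This is a minimal sketch, not a format OWASP prescribes: the function names, the field names, and the in-memory list standing in for an append-only store are all assumptions made for the example.

```python
import json
from datetime import datetime, timezone


def audit_record(input_text, action, approved_by, result):
    """Build one audit entry: input, action, approver, result, plus a UTC timestamp."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "input": input_text,
        "action": action,
        "approved_by": approved_by,
        "result": result,
    }


def append_audit(log_lines, record):
    """Append the record as one JSON line (a list stands in for durable storage)."""
    log_lines.append(json.dumps(record, sort_keys=True))
    return log_lines


# Hypothetical example values, for illustration only.
log_lines = []
append_audit(
    log_lines,
    audit_record(
        input_text="refund order 1042",
        action="call_tool:issue_refund",
        approved_by="human:support-lead",
        result="refund issued",
    ),
)
```

One line per action keeps the trail easy to search and replay in order. A real deployment would write to storage that the feature itself cannot edit.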

Test before release. A model that passed last month can drift, and a new prompt or tool can open a new hole. The cheat sheet calls for structured security testing before production and for “CI/CD and Release Gates” that block a release when high-risk changes ship without updated tests.^[1] The gate is the point where evaluation becomes a hard stop rather than a suggestion: if the checks do not pass, the feature does not go out.
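A gate of this kind can be sketched as a function that returns a pass or block decision with reasons. The check names, the `high_risk_changes` flag, and the `tests_updated` flag are hypothetical; a real pipeline would derive them from its own test results and change metadata.

```python
def release_gate(checks, high_risk_changes, tests_updated):
    """Return (ok, reasons).

    Blocks when any check failed, or when a high-risk change
    ships without updated tests.
    """
    reasons = [f"check failed: {name}" for name, passed in checks.items() if not passed]
    if high_risk_changes and not tests_updated:
        reasons.append("high-risk change without updated tests")
    return (not reasons, reasons)


# Hypothetical run: every suite passes, but a high-risk change has no new tests.
ok, reasons = release_gate(
    checks={"prompt_injection_suite": True, "tool_permission_suite": True},
    high_risk_changes=True,
    tests_updated=False,
)

# A CI step would exit with this code; nonzero stops the release.
exit_code = 0 if ok else 1
```

The release is blocked here even though every suite passed, which is the hard-stop behavior described above: passing old tests is not enough when the change itself is high risk.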

Monitor and keep a way to undo. Watching does not end at launch. NIST’s Manage function asks for “post-deployment AI system monitoring plans,” including ways to capture and evaluate input from users.^[2] It also asks, in subcategory MANAGE 2.4, that “mechanisms are in place and applied, responsibilities are assigned and understood to supersede, disengage, or deactivate AI systems that demonstrate performance or outcomes inconsistent with intended use.”^[2] In plain terms: a working off switch with a named owner and clear triggers, plus the ability to roll back to a known-good version. Removing a system from operation should be a “standard protocol,” not a panic.^[2]
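The off switch, named owner, trigger, and rollback can be pictured together in a small sketch. NIST does not specify a mechanism, so everything here is an assumption for illustration: the `FeatureControl` class, the error-rate trigger, and the version labels.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureControl:
    owner: str               # named person or team responsible for the off switch
    max_error_rate: float    # trigger: deactivate above this observed rate
    active_version: str
    known_good: list = field(default_factory=list)  # earlier good versions, oldest first
    enabled: bool = True

    def check(self, error_rate):
        """Deactivate the feature when the observed error rate crosses the trigger."""
        if error_rate > self.max_error_rate:
            self.enabled = False
        return self.enabled

    def roll_back(self):
        """Restore the most recent known-good version and re-enable the feature."""
        if not self.known_good:
            raise RuntimeError("no known-good version recorded")
        self.active_version = self.known_good.pop()
        self.enabled = True
        return self.active_version


# Hypothetical values: v7 is live, v5 and v6 are recorded as known good.
control = FeatureControl(
    owner="oncall-ml",
    max_error_rate=0.05,
    active_version="v7",
    known_good=["v5", "v6"],
)
still_on = control.check(error_rate=0.12)  # crosses the trigger, so the feature is disabled
restored = control.roll_back()             # restores the latest known-good version
```

Keeping the trigger and the owner in the same record as the version history means the answer to "who turns it off, and back to what" is written down before it is needed.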

The recovery test. If the feature started behaving badly right now, who turns it off, how fast, and can you put back the last version that worked? If you cannot answer, you are not ready to ship.

References

1. AI Agent Security Cheat Sheet, OWASP Cheat Sheet Series
2. AI RMF Playbook: Manage, NIST AI Resource Center
