System prompts
Why do system prompts take priority over user messages?
Models are trained on an instruction hierarchy: system instructions outrank user messages, which outrank tool and third-party content. This keeps a session's rules in force, though it is a defense, not a guarantee.
A system prompt only works if the model honors it even when a later message says otherwise. That is the job of the instruction hierarchy. Developer instructions are ranked ahead of user messages, so the standing rules of a session stay in force as new requests arrive.[1]
This ordering was not free. Researchers found that models often treat system prompts, untrusted user input, and third-party content as the same priority, which lets an attacker overwrite the original instructions with their own.[2] To fix this, they proposed a hierarchy that defines how a model should behave when instructions of different priorities conflict, and a training method that teaches the model to ignore lower-priority instructions when they clash with higher ones.[2] The intended order runs from system messages at the top, to user messages, down to tool and other third-party content.
Two things follow. First, the order is why a system prompt can hold a persona or a safety rule in place across a long conversation. Second, it is a learned tendency, not a hard wall. The same research frames this as a defense against prompt injection that makes the model harder to attack rather than one that removes the risk.[2] Treat the hierarchy as a strong default to design around, not a guarantee that user or tool content can never override your rules.
References
- Text generation guide — OpenAI
- The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions — Wallace et al., OpenAI