How text is generated
How does the model know when to stop generating?
The generation loop has to end somewhere. A model usually stops when it finishes its turn naturally, but you can also force a stop with custom stop sequences or by capping the number of tokens.
Because the loop predicts one token at a time, it needs an explicit signal to stop. Most often that signal comes from the model itself. As Anthropic’s API documentation puts it, models “will normally stop when they have naturally completed their turn,” which the response reports with a stop reason of end_turn.[1] During training, a model learns to emit a special end-of-turn token after a complete reply, and the loop ends as soon as that token is generated.
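The loop described above can be sketched in a few lines of Python. This is a toy illustration, not how any particular model is implemented: the "model" is a hypothetical scripted stand-in that returns tokens from a fixed list, and <EOT> is an illustrative name for the end-of-turn token (real tokenizers define their own). The function names are likewise made up for this sketch. The loop keeps appending tokens until the end-of-turn token appears, then reports end_turn.

```python
from typing import Callable, List, Tuple

# Hypothetical end-of-turn token; real tokenizers define their own.
END_OF_TURN = "<EOT>"


def make_scripted_model(tokens: List[str]) -> Callable[[List[str]], str]:
    """Return a toy 'model' that emits the given tokens in order.

    It ignores the context; a real model would condition on it.
    """
    it = iter(tokens)

    def next_token(context: List[str]) -> str:
        return next(it)

    return next_token


def generate(next_token: Callable[[List[str]], str], prompt: List[str]) -> Tuple[str, str]:
    """Generate tokens until the model emits the end-of-turn token."""
    output: List[str] = []
    while True:
        token = next_token(prompt + output)
        if token == END_OF_TURN:
            # The model signalled that its reply is complete.
            return "".join(output), "end_turn"
        output.append(token)


model = make_scripted_model(["Hello", ",", " world", "!", END_OF_TURN])
text, stop_reason = generate(model, ["Hi"])
print(text, stop_reason)  # Hello, world! end_turn
```

Note that nothing in this loop limits length: if the model never produced the end-of-turn token, it would only stop when the script ran out.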
You can also force an early stop. A stop sequence is custom text you supply; if the model generates it, the loop breaks. In that case the reported stop reason is stop_sequence, and the response tells you which sequence matched.[1] This is handy when you want output to end at a known marker, such as a newline or a closing tag.
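A stop-sequence check can be sketched by scanning the text generated so far after every new token and halting at the earliest match. This is a simplified illustration under several assumptions: the scripted model, the </answer> marker, and the helper names are hypothetical, and dropping the matched sequence from the returned text is a choice made for this sketch. It only mirrors the behavior described above: the loop halts and reports which sequence matched.

```python
from typing import Callable, Iterator, List, Optional, Tuple

END_OF_TURN = "<EOT>"  # hypothetical end-of-turn token


def scripted_model(tokens: List[str]) -> Callable[[], str]:
    """Toy stand-in for a model: returns a fixed sequence of tokens."""
    it: Iterator[str] = iter(tokens)
    return lambda: next(it)


def earliest_match(text: str, stop_sequences: List[str]) -> Optional[Tuple[int, str]]:
    """Return (index, sequence) for the earliest stop sequence in text, if any."""
    best: Optional[Tuple[int, str]] = None
    for seq in stop_sequences:
        idx = text.find(seq)
        if idx != -1 and (best is None or idx < best[0]):
            best = (idx, seq)
    return best


def generate(next_token: Callable[[], str],
             stop_sequences: List[str]) -> Tuple[str, str, Optional[str]]:
    """Generate until end of turn or until a stop sequence appears.

    Returns (text, stop_reason, matched_sequence).
    """
    text = ""
    while True:
        token = next_token()
        if token == END_OF_TURN:
            return text, "end_turn", None
        text += token
        # Re-scan the whole text so a sequence split across tokens is still caught.
        match = earliest_match(text, stop_sequences)
        if match is not None:
            idx, seq = match
            # This sketch drops the stop sequence and anything after it.
            return text[:idx], "stop_sequence", seq


model = scripted_model(["<answer>", "42", "</ans", "wer>", " extra", END_OF_TURN])
print(generate(model, ["</answer>"]))
# ('<answer>42', 'stop_sequence', '</answer>')
```

Re-scanning the full text each step is quadratic in output length, which is fine for an illustration; the point is that the closing tag is detected even though it arrives split across two tokens.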
Finally, there is a hard ceiling. The maximum tokens setting (max_tokens) is “the maximum number of tokens to generate before stopping,” though the model “may stop before reaching this maximum” if it finishes on its own first.[1] When the cap is reached first, the response reports a stop reason of max_tokens, and the reply may be cut off mid-sentence. The limit is a safety cap on length and cost, not a target the model tries to fill.
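The cap can be added to the same kind of toy loop as an upper bound on iterations. In this sketch (again with a hypothetical scripted model and made-up function names), generation ends with end_turn if the model finishes within the limit and with max_tokens otherwise. Whether the end-of-turn token itself counts toward the limit is a detail of real systems that this sketch does not try to reproduce; here it is not counted.

```python
from typing import Callable, Iterator, List, Tuple

END_OF_TURN = "<EOT>"  # hypothetical end-of-turn token


def scripted_model(tokens: List[str]) -> Callable[[], str]:
    """Toy stand-in for a model: returns a fixed sequence of tokens."""
    it: Iterator[str] = iter(tokens)
    return lambda: next(it)


def generate(next_token: Callable[[], str], max_tokens: int) -> Tuple[List[str], str]:
    """Generate at most max_tokens tokens; stop earlier on end of turn."""
    output: List[str] = []
    while len(output) < max_tokens:
        token = next_token()
        if token == END_OF_TURN:
            # The model finished on its own before hitting the cap.
            return output, "end_turn"
        output.append(token)
    # The cap was reached before the model finished its turn.
    return output, "max_tokens"


reply = ["The", " answer", " is", " 42", ".", END_OF_TURN]
print(generate(scripted_model(reply), max_tokens=3))
# (['The', ' answer', ' is'], 'max_tokens')
print(generate(scripted_model(reply), max_tokens=100))
# (['The', ' answer', ' is', ' 42', '.'], 'end_turn')
```

The second call shows the point made above: a generous cap is not a target, so the model still stops after five tokens.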
References
[1] Anthropic, Messages API reference.