Prompt, token, context

A prompt is the text you give a model. The model reads text as tokens, small chunks roughly three-quarters of a word each. The context window is the maximum number of tokens it can work with at one time.

A prompt is whatever you type in to get a response. Google defines it as “any text entered as input to a large language model to condition the model to behave in a certain way.”^[1] A prompt can be a question, an instruction, an example, or a long document you want summarized. It is the one part of the system you control directly.

A token is the unit the model actually reads. Text “is first broken into units called tokens, which are words, character sets, or combinations of words and punctuation, by a tokenizer.”^[2] A token is often a piece of a word rather than a whole one. A rough rule of thumb is that one token is about three-quarters of a word in English, so a 75-word paragraph comes to roughly 100 tokens. Models measure their work in tokens, which is also why usage is usually priced per token.
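
The rule of thumb above can be turned into a quick estimate. The sketch below is only an approximation for English text: it counts whitespace-separated words and does not run a real tokenizer, so actual counts will differ from model to model. The function name `estimate_tokens` is a hypothetical helper, not part of any library.

```python
import math


def estimate_tokens(text: str) -> int:
    """Estimate the token count of English text from its word count.

    Assumes the rule of thumb that one token is about three-quarters
    of a word, i.e. about 4 tokens for every 3 words. A real tokenizer
    will give different numbers.
    """
    words = len(text.split())  # whitespace-separated words
    return math.ceil(words * 4 / 3)


# A 75-word paragraph comes out at roughly 100 tokens.
paragraph = " ".join(["word"] * 75)
print(estimate_tokens(paragraph))
```

An estimate like this is useful for rough budgeting only. For billing or hard limits, count with the tokenizer that belongs to the model you are using.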

The context window is how much the model can hold at once. It is “the maximum amount of text or other tokenized input available to the model at one time when generating output,” and it “is usually measured in tokens.”^[3] Think of it as the model’s working memory: the material it can see while writing a reply. This budget covers your input and the model’s output combined.^[2] Anything that falls outside the window is not available to the model unless it is provided again, summarized, or retrieved.
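
Because input and output share one budget, a simple check is to add them up and compare against the window. The sketch below illustrates that arithmetic under stated assumptions: the 8,192-token window is a made-up figure for the example, and `fits_in_context` and `remaining_for_output` are hypothetical helper names, not calls from any real API.

```python
def fits_in_context(input_tokens: int, max_output_tokens: int,
                    context_window: int) -> bool:
    """Return True if the input plus the reserved output fits in the window."""
    return input_tokens + max_output_tokens <= context_window


def remaining_for_output(input_tokens: int, context_window: int) -> int:
    """Return how many tokens are left for the reply (never negative)."""
    return max(0, context_window - input_tokens)


CONTEXT_WINDOW = 8192  # assumed window size, for illustration only

print(fits_in_context(6000, 2000, CONTEXT_WINDOW))  # 8000 <= 8192
print(fits_in_context(7000, 2000, CONTEXT_WINDOW))  # 9000 > 8192
print(remaining_for_output(7000, CONTEXT_WINDOW))   # 8192 - 7000
```

When the check fails, the options are the ones named above: shorten or summarize the input, or retrieve only the parts that matter for the current request.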

References

[1] Machine Learning Glossary: Generative AI, Google for Developers
[2] Understanding tokens, Microsoft Learn
[3] Context window, Wikipedia
