Cost and context — Physea Wiki

You are billed per token and the context window is measured in tokens, so token count drives both cost and how much text fits. Token counts also differ between models, so the same text can cost more on a different tokenizer.

Tokens are the unit you pay for. Model pricing is quoted per million tokens, and input and output are priced separately. Anthropic’s published rates, for example, charge a lower price for the tokens you send in than for the tokens the model writes back.^[1] A handy estimate from the same pricing page: “1 token is approximately 4 characters or 0.75 words in English.”^[1]

Tokens are also the unit of a model’s working memory. The context window “refers to all the text a language model can reference when generating a response, including the response itself,”^[2] and it is measured in tokens. Everything has to fit: your prompt, any documents, the conversation so far, and the reply. Run out of room and older content has to be dropped or summarized. So token count, not word count, sets both your bill and your limits.

One catch worth knowing: token counts are not universal. Different models use different tokenizers, so the same text can come out to a different number of tokens. Anthropic notes that its newer models “use a new tokenizer” that “may use up to 35% more tokens for the same fixed text.”^[1] When budgeting, count tokens with the tokenizer for the model you are actually using.

Counting tokens

tiktoken ↗
OpenAI's open BPE tokenizer; counts tokens for OpenAI models so you can estimate length and cost.
Anthropic token counting ↗
API that returns the token count for a request before you send it to Claude.

References

Pricing — Anthropic
Context windows — Anthropic
tiktoken README — OpenAI

Why do tokens matter for cost and context?

Counting tokens

References