Context & pricing

How do hosted model APIs charge you?

Hosted model APIs charge per token, counting both the tokens you send in and the tokens the model writes back. Input and output are priced separately, and output is more expensive because the model has to generate it one token at a time.

Last updated 2026-06-15 · Physea Labs

Hosted model APIs generally do not charge per request or per question. They charge per token: the small chunks of text a model reads and writes. A token is roughly three-quarters of a word in English, so a 75-word paragraph is about 100 tokens and a long document might be tens of thousands.
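The three-quarters rule of thumb can be turned into a rough estimator. This is only a sketch: the function name `estimate_tokens` and the constant `WORDS_PER_TOKEN` are made up for illustration, and real counts depend on the provider's tokenizer and on the language of the text.

```python
# Rule of thumb from the text: one token is about 0.75 English words.
# This is an approximation, not a real tokenizer.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Estimate the token count of English text from its word count."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)


paragraph = "word " * 75  # a stand-in for a 75-word paragraph
print(estimate_tokens(paragraph))  # prints 100
```

For billing-grade numbers, the token counts reported by the API itself are the ones that matter; an estimate like this is only useful for planning.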

The bill splits into two parts that are priced separately: the tokens you send in (input) and the tokens the model writes back (output). Rates are quoted per million tokens. On OpenAI’s pricing page, for example, GPT-5.5 lists $5.00 for input and $30.00 for output per million tokens.[1] The two numbers are almost always different, and output is the larger one.
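As a worked example of per-million pricing, the sketch below computes the cost of a single request using the two rates quoted above. The function name `request_cost` and the example token counts are illustrative assumptions; actual rates should be taken from the provider's current pricing page.

```python
# Rates quoted in the text, in US dollars per million tokens.
INPUT_RATE_USD_PER_M = 5.00
OUTPUT_RATE_USD_PER_M = 30.00


def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate: float = INPUT_RATE_USD_PER_M,
    output_rate: float = OUTPUT_RATE_USD_PER_M,
) -> float:
    """Return the cost in USD of one request, pricing input and output separately."""
    input_cost = input_tokens * input_rate / 1_000_000
    output_cost = output_tokens * output_rate / 1_000_000
    return input_cost + output_cost


# Hypothetical request: a 2,000-token prompt and a 500-token answer.
cost = request_cost(2_000, 500)
print(f"${cost:.4f}")  # prints $0.0250
```

In this example the 500 output tokens ($0.015) cost more than the 2,000 input tokens ($0.010), even though there are four times fewer of them.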

The reason output costs more comes from how the model produces it. Input is read in a single pass, so the whole prompt can be processed at once. Output has to be generated one token at a time, because each new token depends on every token before it. Writing a 1,000-token answer means roughly 1,000 sequential steps, while reading a 1,000-token prompt takes roughly one parallel pass. That gap in work is what the price difference reflects.
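The sequential dependence can be illustrated with a toy generation loop. Nothing here is a real model: `toy_next_token` is a hypothetical stand-in that simply derives a number from the whole context, and the token IDs are arbitrary. The point is only that each step needs the result of the previous one, so the loop cannot be collapsed into a single pass.

```python
def toy_next_token(context: list[int]) -> int:
    """Hypothetical stand-in for a model: the next token depends on the full context."""
    return (sum(context) + len(context)) % 50


def generate(prompt: list[int], max_new_tokens: int) -> tuple[list[int], int]:
    """Generate tokens one at a time and count the sequential steps taken."""
    context = list(prompt)  # the prompt is available all at once
    steps = 0
    for _ in range(max_new_tokens):
        next_token = toy_next_token(context)  # needs every earlier token
        context.append(next_token)            # output feeds the next step
        steps += 1
    return context[len(prompt):], steps


output, steps = generate(prompt=[3, 14, 15], max_new_tokens=5)
print(len(output), steps)  # prints 5 5
```

However long the prompt is, the number of loop iterations is set by the output length alone.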

The practical takeaway: a long answer can cost more than a long question, even when the question has far more words. If you want to control spend, the levers are the size of what you send, the length of what you ask for, and the features that cut the cost of repeated context.
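To put a number on that takeaway, the sketch below works out how many input tokens cost the same as a given number of output tokens. It assumes the example rates quoted earlier ($5.00 input, $30.00 output per million tokens), and the function name `break_even_input_tokens` is made up for illustration.

```python
def break_even_input_tokens(
    output_tokens: int,
    input_rate: float = 5.00,    # USD per million input tokens (example rate from the text)
    output_rate: float = 30.00,  # USD per million output tokens (example rate from the text)
) -> float:
    """Return the number of input tokens that cost as much as `output_tokens` of output."""
    return output_tokens * output_rate / input_rate


print(break_even_input_tokens(1_000))  # prints 6000.0
```

At these rates a 1,000-token answer costs as much as a 6,000-token prompt, so capping answer length can save more than trimming the question by the same number of tokens.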

References

  1. Pricing — OpenAI