Cost and capability — Physea Wiki

Cloud APIs cost nothing upfront but charge for every request, so heavy use adds up. Running a model locally costs money for hardware first and then very little per query, but the models you can run at home are usually weaker than the best cloud models.

Cost works in opposite directions for the two options. A cloud API has no setup cost: you sign up and start sending prompts. But you pay for usage, billed by the token (a token is roughly a few characters of text). Input and output are priced separately, and output tokens cost more than input. OpenAI’s price list, for instance, shows its standard models charging several dollars per million input tokens and more per million output tokens.^[1] For light use that is cheap. For an application that runs millions of queries, the bill grows with every one.
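The per-token billing described above can be sketched as a small calculation. This is a minimal illustration, not any provider's actual billing logic: the function name and the example prices ($2.50 and $10.00 per million tokens) are hypothetical placeholders, so check the provider's current price list before relying on any figure.

```python
def cloud_cost_usd(input_tokens, output_tokens,
                   input_price_per_million, output_price_per_million):
    """Estimate the cost of one request when input and output tokens
    are priced separately, per million tokens."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical prices, for illustration only.
INPUT_PRICE = 2.50    # USD per million input tokens (assumed)
OUTPUT_PRICE = 10.00  # USD per million output tokens (assumed)

# One request with a 1,000-token prompt and a 500-token reply.
per_request = cloud_cost_usd(1_000, 500, INPUT_PRICE, OUTPUT_PRICE)

# The same request repeated a million times.
per_million_requests = per_request * 1_000_000
```

With these assumed prices a single request costs well under a cent, while a million such requests cost thousands of dollars, which is the "cheap for light use, adds up at scale" pattern the paragraph describes.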

Local cost is front-loaded. You need a machine with enough memory to hold the model (see the model size calculator), which can be a real expense. After that, each query is close to free, because you are only paying for electricity. So local tends to win once your usage is high and steady, while cloud tends to win when usage is low or unpredictable.
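The point at which local overtakes cloud can be estimated with a simple break-even calculation. The sketch below assumes a fixed hardware price, a constant per-query electricity cost, and a constant per-query cloud cost; all names and numbers are hypothetical, and it ignores maintenance, hardware depreciation, and any difference in answer quality.

```python
import math


def break_even_queries(hardware_cost, local_cost_per_query, cloud_cost_per_query):
    """Return the number of queries after which total local cost drops
    below total cloud cost, or None if local never catches up."""
    saving_per_query = cloud_cost_per_query - local_cost_per_query
    if saving_per_query <= 0:
        # Each local query costs at least as much as a cloud query,
        # so the upfront hardware spend is never recovered.
        return None
    return math.ceil(hardware_cost / saving_per_query)


# Hypothetical figures, for illustration only.
queries_needed = break_even_queries(
    hardware_cost=2000.0,          # one-time machine purchase (assumed)
    local_cost_per_query=0.0005,   # electricity per query (assumed)
    cloud_cost_per_query=0.0075,   # API bill per query (assumed)
)
```

Under these assumptions the hardware pays for itself after a few hundred thousand queries. A workload that never reaches that volume, or reaches it only in occasional bursts, is better served by the cloud.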

Capability is where cloud usually pulls ahead. The largest models need hardware far beyond a home computer, so only providers run them, and those are typically the most capable models available. A model small enough to run on a laptop is real and useful, but it is usually weaker than the top cloud models on hard tasks.

Latency, the time to get a reply, cuts both ways. A cloud call has to make a round trip over the internet, which adds delay; a local model skips that, but it can still be slow if your hardware is underpowered for the model you chose. Neither side is simply faster.
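A rough way to see why neither side is simply faster is to model reply time as network delay plus generation time. This is a deliberately simplified sketch: the function name and every number are hypothetical, and it ignores prompt processing, queueing on the provider's side, and streaming.

```python
def reply_latency_s(round_trip_s, output_tokens, tokens_per_second):
    """Approximate time to a full reply: network round trip plus the
    time to generate the output at a steady token rate."""
    return round_trip_s + output_tokens / tokens_per_second


# Hypothetical setups, for illustration only.
CLOUD = {"round_trip_s": 0.5, "tokens_per_second": 60.0}  # fast hardware, network delay
LOCAL = {"round_trip_s": 0.0, "tokens_per_second": 40.0}  # no network, slower hardware

# Short reply (10 tokens): the network round trip dominates.
cloud_short = reply_latency_s(output_tokens=10, **CLOUD)
local_short = reply_latency_s(output_tokens=10, **LOCAL)

# Long reply (300 tokens): generation speed dominates.
cloud_long = reply_latency_s(output_tokens=300, **CLOUD)
local_long = reply_latency_s(output_tokens=300, **LOCAL)
```

With these assumed numbers the local model answers the short reply sooner, while the cloud model finishes the long reply sooner, so the winner depends on reply length and on how well the local hardware matches the model.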

References

[1] API pricing, OpenAI.

