Capability, cost, privacy

Most model choices balance three pulls: capability, cost, and privacy. The strongest model is not always the right one. A smaller or self-hosted model can be the better fit when speed, budget, or data control matters more.

Selecting a model means “balancing three key considerations: capabilities, speed, and cost.”^[1] These pull against each other. The most capable model is usually the slowest and most expensive, and the cheapest is rarely the smartest. The skill is matching the model to the task rather than always reaching for the top of the leaderboard.

A useful habit is to start cheap and only move up if you have to. One path Anthropic suggests is to begin with a fast, low-cost model, test it, and upgrade “only if necessary for specific capability gaps.”^[1] Many real tasks are simple enough that a smaller model handles them at a fraction of the price.
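The start-cheap habit can be sketched as a small loop: try tiers in order of cost and stop at the first one that passes your own test. This is a minimal illustration, not Anthropic's procedure. The tier names, the `relative_cost` numbers, and `stub_eval` are all hypothetical placeholders for your real models and evaluation set.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ModelTier:
    name: str             # hypothetical label, not a real model ID
    relative_cost: float  # made-up number, used only to order the tiers


def cheapest_passing_tier(
    tiers: Sequence[ModelTier],
    passes_eval: Callable[[ModelTier], bool],
) -> Optional[ModelTier]:
    """Return the lowest-cost tier that passes the evaluation, or None."""
    for tier in sorted(tiers, key=lambda t: t.relative_cost):
        if passes_eval(tier):
            return tier  # stop here: no need to pay for a larger model
    return None


# Illustrative tiers, deliberately listed out of cost order.
TIERS = [
    ModelTier("large", 15.0),
    ModelTier("small", 1.0),
    ModelTier("medium", 4.0),
]


def stub_eval(tier: ModelTier) -> bool:
    # Stand-in for running your own test set against the model.
    # Here we simply pretend the "small" tier has a capability gap.
    return tier.name != "small"


chosen = cheapest_passing_tier(TIERS, stub_eval)
print(chosen.name if chosen else "no tier passed")  # prints "medium"
```

In practice the evaluation function is where the work goes: it should run the tasks you actually care about, so that an upgrade is triggered by a specific capability gap and not by a leaderboard position.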

Privacy is the third pull, and it can override the other two. If your data cannot leave your own systems, a model you run yourself becomes the answer regardless of cost. A self-hosting guide makes the case plainly: a modest local setup “gets you fair quality for <$1/h, no data leaks.”^[2] The trade-off is the real work of operating it yourself. Sending data to a hosted API is simpler and often more capable, but it means your data passes through someone else’s servers.
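One way to picture privacy overriding the other two pulls is to treat it as a hard filter applied before any cost comparison. The sketch below assumes a simplified world with two options. The `Deployment` fields, the option names, and every number in it are invented for illustration and do not describe any real service.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Deployment:
    name: str             # hypothetical label
    self_hosted: bool     # True if data never leaves your own systems
    capability: int       # made-up score, higher is more capable
    relative_cost: float  # made-up number, lower is cheaper


def choose_deployment(
    options: Sequence[Deployment],
    data_must_stay_local: bool,
    min_capability: int,
) -> Optional[Deployment]:
    """Apply privacy and capability as hard filters, then pick the cheapest."""
    candidates = [
        o for o in options
        if o.capability >= min_capability
        and (o.self_hosted or not data_must_stay_local)
    ]
    # Cost is only compared among options that survived the filters.
    return min(candidates, key=lambda o: o.relative_cost, default=None)


# Illustrative options: the local one is deliberately the more expensive.
OPTIONS = [
    Deployment("hosted-api", self_hosted=False, capability=9, relative_cost=5.0),
    Deployment("local-open-model", self_hosted=True, capability=6, relative_cost=8.0),
]

unrestricted = choose_deployment(OPTIONS, data_must_stay_local=False, min_capability=5)
restricted = choose_deployment(OPTIONS, data_must_stay_local=True, min_capability=5)
print(unrestricted.name)  # prints "hosted-api"
print(restricted.name)    # prints "local-open-model"
```

With the privacy flag set, the cheaper hosted option is never considered, which mirrors the point that a self-run model wins regardless of cost. If no local option meets the capability bar, the function returns `None`, which signals that the requirements themselves need revisiting.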

Where models come from

Anthropic Claude
Hosted API with a clear model-selection guide and a capability/speed/cost framing.
Hugging Face
Open hub for open-weight models you can download and run yourself.
Ollama
Free tool for running open models locally on your own machine.

References

[1] Choosing the right model. Anthropic.
[2] The 2025 Self-Hosting Field Guide to Open LLMs. Freeport Metrics.
