Local vs cloud
What is the difference between running AI locally and using a cloud API?
Running a model locally means it lives on your computer and all the work happens there. Using a cloud API means your prompt travels over the internet to a provider that runs the model and sends the answer back.
There are two ways to get an answer out of an AI model, and they differ in one simple way: where the model actually runs.
Running locally means the model file sits on your own computer and all the computation happens there, on your CPU or GPU. You download the model once, and from then on every prompt is processed on your hardware. Nothing is sent over the network to do the work. Tools like Ollama install on macOS, Windows, and Linux and handle the download-and-run steps for you.[1]
Using a cloud API means you never hold the model at all. Your prompt is sent over the internet to a provider’s servers, their hardware runs the model, and the answer is sent back to you. You are renting time on someone else’s machine, usually paying for each request.
That single difference, where the work happens, is what drives everything else: who can see your data, what it costs, how capable the model is, whether it works without internet, and how fast the reply comes. The pages in this topic walk through each of those tradeoffs, and then the common pattern most people land on, which uses both.
Where to run a model
- Ollama ↗
A free tool that downloads and runs open models on your own macOS, Windows, or Linux machine.
- OpenAI API ↗
A hosted API: you send a prompt over the internet and pay per request.
- Anthropic (Claude) ↗
A hosted API for the Claude models, billed per token of input and output.
References
- Download Ollama — Ollama