Serving & runtimes
What are the easiest ways to run a model on my own machine?
Ollama runs models from the command line, pulling them by name like Docker images. LM Studio is a point-and-click desktop app with a built-in model browser and chat. Both give you a local OpenAI-compatible server.
If you want to get a model running today, these two tools ask the least of you.
Ollama is the command-line option. You pull a model by name and run it, much the way you would pull a Docker image. The project’s own tagline lists models you can start with, such as Qwen, Gemma, DeepSeek, and Llama.[1] It is open source under the MIT license, gives you a local REST API, and offers OpenAI-compatible chat completions so other apps can talk to it.[2] Note one current limit: Ollama’s OpenAI-compatible layer covers chat completions, and an embeddings API was listed as still under consideration rather than available.[2]
LM Studio is the point-and-click option. It is a desktop application for running models locally, with a built-in browser for finding and downloading them and a chat window for talking to them.[3] When you are ready to wire it into your own code, it can serve OpenAI-like endpoints on your machine and across your network.[3] It also offers a headless mode for servers or machines where you do not want a GUI.[3]
A useful thing to know: LM Studio runs llama.cpp under the hood for GGUF models, and adds Apple’s MLX engine on Apple Silicon.[3] So the friendly window you click in is sitting on top of the same engine covered on the next page.
Friendly local runtimes
- Ollama ↗
Command-line tool that pulls and runs models by name; local REST and OpenAI-compatible API.
- LM Studio ↗
Desktop app with a model browser, chat window, and a local OpenAI-compatible server.
References
- Ollama README — Ollama
- OpenAI compatibility — Ollama
- LM Studio Documentation — LM Studio