Local inference basics
Which one should I install first?
Pick by comfort level. Want a chat window with no setup? Start with a desktop app like LM Studio or Jan. Comfortable in a terminal and want control? Start with Ollama or llama.cpp.
You do not need to understand the whole stack to begin. Pick one tool by how you like to work, install it, and let it bundle whatever runtime it needs.
If you want the easiest path, start with a desktop app. LM Studio and Jan both give you a chat window and a built-in place to search for and download models, with nothing to set up by hand.[3, 4] Open the app, browse to a small model that fits your memory, download it, and start chatting. This is the right first step for most people.
If you are comfortable at a command line and want more control, start with a runtime directly, either Ollama or llama.cpp. Ollama is the gentler of the two: install it, then run a model with one short command.[1] llama.cpp gives you the most options and the widest hardware support, at the cost of more setup.[2]
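To make the "one short command" concrete: in a terminal it has the form `ollama run <model>`. The sketch below wraps that same command in Python so it can be scripted. It assumes Ollama is already installed and on your PATH. The helper names and the model name `llama3.2` are illustrative placeholders, not recommendations from this guide; substitute any model that fits your memory.

```python
import shutil
import subprocess
from typing import List, Optional


def ollama_run_command(model: str, prompt: Optional[str] = None) -> List[str]:
    """Build the argument list for `ollama run <model> [prompt]`."""
    cmd = ["ollama", "run", model]
    if prompt is not None:
        # With a prompt argument, Ollama answers once and exits
        # instead of opening an interactive chat session.
        cmd.append(prompt)
    return cmd


def run_once(model: str, prompt: str) -> Optional[str]:
    """Send one prompt to a local model; return None if Ollama is not installed."""
    if shutil.which("ollama") is None:
        return None
    # The first run of a model downloads it, which can take a while.
    result = subprocess.run(
        ollama_run_command(model, prompt),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example (hypothetical model name; not executed here):
# print(run_once("llama3.2", "Explain what a local model is in one sentence."))
```

Nothing runs until you call `run_once` yourself, so the snippet is safe to paste into a file and adapt.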
One sizing note before you download anything: a model has to fit in your available memory (system RAM, or VRAM if it runs on a GPU) to run well, so check the model size calculator and pick a smaller model for your first try. You can always step up later.
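For a feel of the arithmetic behind the sizing note, here is a rough back-of-the-envelope sketch. It is not the method the model size calculator uses. It assumes the weights take roughly (parameter count × bits per weight ÷ 8) bytes, and it adds a 20% overhead factor for the context cache and runtime buffers. That overhead figure is an illustrative assumption, and so are the function names.

```python
def estimate_model_gb(
    params_billion: float,
    bits_per_weight: float,
    overhead: float = 1.2,  # assumed headroom for cache and buffers, not a measured value
) -> float:
    """Rough memory estimate in GB (1 GB = 1e9 bytes) for running a model."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9


def fits_in_memory(
    params_billion: float,
    bits_per_weight: float,
    available_gb: float,
    overhead: float = 1.2,
) -> bool:
    """True if the rough estimate fits within the memory you have available."""
    return estimate_model_gb(params_billion, bits_per_weight, overhead) <= available_gb


if __name__ == "__main__":
    # Compare a few common model sizes at 4 bits per weight against 8 GB of memory.
    for params in (3, 7, 13):
        needed = estimate_model_gb(params, 4)
        verdict = "fits" if fits_in_memory(params, 4, 8) else "too large"
        print(f"{params}B at 4-bit: about {needed:.1f} GB ({verdict} in 8 GB)")
```

Under these assumptions, a 7-billion-parameter model at 4 bits per weight comes out to about 4.2 GB, which is why a small quantized model makes a comfortable first download on most machines.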
Where to get them
- Ollama: runtime and model manager, driven from the terminal
- llama.cpp: the low-level inference engine; most control, most setup
- LM Studio: desktop app with chat and a model browser; free for home and work
- Jan: open-source desktop app that runs fully offline