
Subject 07 · Builds on Architecture + Models

Running AI Yourself

The practical track: local inference, hardware and VRAM, quantization, serving runtimes, and the local-versus-cloud trade-off.

22 pages across 7 topics

Local inference basics

The two-layer local stack.

Hardware & VRAM

What your machine can hold.

Model-size calculator

Find the biggest model your hardware can run.

Quantization

Shrinking a model to fit.

Serving & runtimes

Ollama, llama.cpp, vLLM, and friends.

Running on a Mac

Unified memory and Apple silicon.

Local vs cloud

When to run it yourself.