The calculator — Physea Wiki

Enter how much memory you have and pick a precision level. The calculator estimates the largest model you can run and shows which common sizes (7B, 32B, 70B, and up) fit, using the rule that memory needed is roughly parameters times bytes-per-weight plus overhead.

The single number that decides what you can run locally is memory: your GPU’s VRAM, or, on a Mac, the unified memory shared between CPU and GPU. A model has to fit in memory to run well. If it does not fit, it either refuses to load or slows to a crawl.

Enter your memory below and pick a quantization level to see what fits.

Your memory (GB): GPU VRAM, or a Mac’s unified memory

Quantization: lower precision = smaller, slightly less accurate

Rule of thumb: memory needed ≈ parameters × bytes-per-weight, plus about 20% for the runtime and a modest context window. Longer context windows and bigger batches need more on top. Mixture-of-Experts models (e.g. a 235B-A22B) still need memory for all their weights, not just the active ones.
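The rule of thumb above can be sketched in a few lines of Python. This is an illustrative sketch, not the calculator's actual code: the function names, the decimal-gigabyte convention (1 GB = 10^9 bytes), and the list of candidate sizes are assumptions made for the example. The 20% overhead comes from the rule itself.

```python
# Sketch of the rule of thumb: memory ≈ parameters × bytes-per-weight, plus ~20%.
# Assumptions (not from the calculator itself): decimal GB, hypothetical names.

OVERHEAD = 0.20  # runtime plus a modest context window, per the rule of thumb


def memory_needed_gb(params_billions: float, bits_per_weight: float,
                     overhead: float = OVERHEAD) -> float:
    """Estimate memory in GB for a model with the given total parameter count."""
    bytes_per_weight = bits_per_weight / 8
    # Billions of parameters × bytes per weight gives decimal gigabytes directly.
    return params_billions * bytes_per_weight * (1 + overhead)


def max_params_billions(memory_gb: float, bits_per_weight: float,
                        overhead: float = OVERHEAD) -> float:
    """Invert the rule: the largest parameter count (in billions) that fits."""
    bytes_per_weight = bits_per_weight / 8
    return memory_gb / (bytes_per_weight * (1 + overhead))


def sizes_that_fit(memory_gb: float, bits_per_weight: float,
                   sizes=(7, 32, 70, 235)) -> list:
    """Return the candidate sizes (in billions, total weights) that fit in memory."""
    return [s for s in sizes
            if memory_needed_gb(s, bits_per_weight) <= memory_gb]


if __name__ == "__main__":
    memory_gb, bits = 24, 4  # e.g. a 24 GB GPU with 4-bit weights
    print(f"Largest model: about {max_params_billions(memory_gb, bits):.0f}B")
    print("Sizes that fit:", sizes_that_fit(memory_gb, bits))
```

Under these assumptions, 24 GB at 4-bit precision allows roughly a 40B model, so 7B and 32B fit while 70B does not. For a Mixture-of-Experts model, pass the total parameter count (235 for a 235B-A22B), not the active one. Longer context windows or bigger batches would need a larger `overhead` value than the default.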

A few things worth knowing as you read the result. Quantization (covered in its own topic) is what makes large models practical at home: 4-bit weights are about a quarter the size of 16-bit weights, for a small quality cost. Context is the asterisk: a long context window needs extra memory on top of the weights, so leave headroom if you plan to feed in long documents. And for a Mixture-of-Experts model, the size shown is the total weight count it must hold, not the smaller number it uses per token.
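As a worked illustration of these points, here is the rule of thumb applied to a 70B model at two precisions and to a 235B-A22B Mixture-of-Experts model. The figures assume 16-bit weights as the unquantized baseline (2 bytes per weight), 0.5 bytes per weight at 4-bit, the 20% overhead from the rule of thumb, and decimal gigabytes. They are estimates, not measurements.

```latex
\begin{align*}
\text{70B at 16-bit:}\quad & 70 \times 2 \times 1.2 = 168\ \text{GB} \\
\text{70B at 4-bit:}\quad & 70 \times 0.5 \times 1.2 = 42\ \text{GB} \\
\text{235B-A22B at 4-bit:}\quad & 235 \times 0.5 \times 1.2 = 141\ \text{GB}
\end{align*}
```

The 4-bit estimate for the 70B model is a quarter of the 16-bit one. The Mixture-of-Experts estimate uses all 235B weights: the 22B active per token affects speed, not how much memory the model must occupy.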
