Reading model sizes — Physea Wiki

The number in a model name is its parameter count, and B stands for billion, so 7B is seven billion parameters and 70B is seventy billion. That number drives memory: at 16-bit precision each billion parameters needs about 2 GB, and quantization trims that down.

When you see a model called 7B or 70B, the number is the parameter count, and B stands for billion. So 7B is seven billion parameters and 70B is seventy billion. The Llama 2 family, for instance, is described by its authors as “ranging in scale from 7 billion to 70 billion parameters.”^[1] A bigger count generally means a more capable model, but also a slower and more demanding one.

The parameter count is what sets a model’s memory footprint, because every parameter has to be loaded to run the model. At 16-bit precision (half precision, such as float16) each parameter takes 2 bytes, so “one billion parameters require 2 gigabytes.”^[2] By that math a 7B model needs roughly 14 GB just to hold its weights, and a 70B model needs around 140 GB, which is why the largest models do not fit on ordinary hardware.

A common way to shrink that footprint is quantization, which stores the parameters at lower precision. It “aims to decrease the space requirement by lowering precision of the parameters of a trained model, while preserving most of its performance.”^[2] A 7B model quantized to 4 bits drops to roughly a quarter of its 16-bit size, about 3.5 GB of weights instead of 14 GB, which is how big models end up running on a laptop or a single consumer graphics card.
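The arithmetic in the two paragraphs above can be checked with a few lines of Python. This is a minimal sketch, not a measurement: the function name `weights_gb` is made up for illustration, it counts only the weights, and it uses decimal gigabytes (1 GB = 10^9 bytes) to match the "2 GB per billion parameters" figure. Real quantized files tend to be somewhat larger because they also store scaling data.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal GB.

    Hypothetical helper for illustration only; it ignores runtime
    overhead, context memory, and any quantization metadata.
    """
    bytes_per_weight = bits_per_weight / 8
    # billions of parameters x bytes per parameter = billions of bytes = GB
    return params_billion * bytes_per_weight


if __name__ == "__main__":
    for size in (7, 70):
        for bits in (16, 8, 4):
            print(f"{size}B at {bits}-bit: about {weights_gb(size, bits):.1f} GB")
```

Running it shows the pattern described above: halving the bits per weight halves the footprint, so 4-bit weights take a quarter of the space of 16-bit ones.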

Rule of thumb: memory needed ≈ parameters × bytes-per-weight (bits ÷ 8), plus about 20% for the runtime and a modest context window. Longer context windows and bigger batches need more on top. Mixture-of-Experts models (e.g. a 235B-A22B, meaning 235 billion total parameters with 22 billion active per token) still need memory for all their weights, not just the active ones.
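The rule of thumb can be turned around to estimate the largest model a given amount of memory will hold. The sketch below assumes the flat 20% overhead stated above and decimal gigabytes. The names `memory_needed_gb` and `max_params_billion` are hypothetical, and the results are rough estimates, not guarantees for any particular runtime.

```python
OVERHEAD = 0.20  # assumed: the "about 20%" for runtime and a modest context


def memory_needed_gb(params_billion: float, bits_per_weight: float,
                     overhead: float = OVERHEAD) -> float:
    """Rule-of-thumb memory in GB: weights plus a flat overhead fraction."""
    bytes_per_weight = bits_per_weight / 8
    return params_billion * bytes_per_weight * (1 + overhead)


def max_params_billion(memory_gb: float, bits_per_weight: float,
                       overhead: float = OVERHEAD) -> float:
    """Invert the rule of thumb: largest parameter count (in billions)
    that fits in memory_gb at the given precision."""
    bytes_per_weight = bits_per_weight / 8
    return memory_gb / (bytes_per_weight * (1 + overhead))


if __name__ == "__main__":
    # A 7B model at 16-bit, including overhead
    print(f"7B at 16-bit: about {memory_needed_gb(7, 16):.1f} GB")
    # What fits in 24 GB of VRAM or unified memory at 4-bit
    print(f"24 GB at 4-bit: up to about {max_params_billion(24, 4):.0f}B")
    # Mixture-of-Experts: use the TOTAL count (235), not the active count (22)
    print(f"235B-A22B at 4-bit: about {memory_needed_gb(235, 4):.0f} GB")
```

For a Mixture-of-Experts model, pass the total parameter count, since all experts have to be resident in memory even though only some are active for each token.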

Where models are listed by size

Hugging Face Models
Open hub hosting millions of models you can browse and filter, with sizes shown on each model page.
Ollama Library
Catalog of models packaged to run locally, each listed by parameter size such as 8B, 70B, and 405B.

References

[1] Touvron et al., "Llama 2: Open Foundation and Fine-Tuned Chat Models," arXiv (2023)
[2] "Large language model," Wikipedia
