Layers and depth — Physea Wiki

A language model is a stack of repeated layers. Each layer takes the running representation of the text and reworks it a little, and the count of layers is what people mean by depth.

A language model does not process text in one step. It passes the text through a stack of layers, one after another. Each layer takes the running internal representation of the text, reworks it, and hands the result to the next layer. Most of the model's parameters live inside these layers, so the layers are where the bulk of the work happens.
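The hand-off from one layer to the next can be sketched in a few lines of Python. This is a toy illustration only: `make_toy_layer` and `run_stack` are hypothetical names, and the "layer" here just scales a list of numbers, standing in for the far richer computation a real transformer layer performs.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_toy_layer(scale: float) -> Layer:
    """Build a toy layer (hypothetical stand-in for a real transformer layer)."""
    def layer(x: Vector) -> Vector:
        # Rework the representation a little: add a scaled copy of each value.
        return [v + scale * v for v in x]
    return layer


def run_stack(layers: List[Layer], x: Vector) -> Vector:
    """Pass the representation through every layer, in order."""
    for layer in layers:
        x = layer(x)  # the output of one layer is the input to the next
    return x


layers = [make_toy_layer(0.5) for _ in range(4)]
depth = len(layers)  # depth is simply the number of layers in the stack
output = run_stack(layers, [1.0, 2.0])
```

Nothing in `run_stack` depends on how many layers there are, so making the stack deeper just means a longer list.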

The number of layers in the stack is the model’s depth. Bigger models tend to be deeper. For example, the LLaMA 2 family at 7B, 13B, and 70B parameters “consist of 32, 40 and 80 transformer layers.”^[1] So the 70B model is not just wider; it also runs the text through 2.5 times as many layers as the 7B model.
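As a quick check on that comparison, the quoted layer counts can be turned into depth ratios. The dictionary below holds only the three figures quoted above; `depth_ratio` is a hypothetical helper name.

```python
# Layer counts as quoted above for the LLaMA 2 family.
llama2_layers = {"7B": 32, "13B": 40, "70B": 80}


def depth_ratio(larger: str, smaller: str) -> float:
    """How many times deeper the first model is than the second."""
    return llama2_layers[larger] / llama2_layers[smaller]


ratio_70b_to_7b = depth_ratio("70B", "7B")
ratio_13b_to_7b = depth_ratio("13B", "7B")
```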

Why stack so many? The rough intuition is that early layers tend to handle surface details and later layers build toward more abstract meaning, with each layer working from the output of the one before it. More layers give the model more chances to refine its understanding before it produces an answer. Depth is one of the main knobs a model’s designers turn, alongside how wide each layer is.
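To see how the two knobs trade off, here is a rough sketch of how depth and width feed into parameter count. It assumes a common rule of thumb, not taken from this page's source, that one transformer layer holds about 12 × width² weights. The depths and widths used are illustrative values, not the configuration of any particular model.

```python
def approx_layer_params(width: int) -> int:
    """Rule-of-thumb weight count for one transformer layer.

    Assumption: about 4 * width**2 for attention plus 8 * width**2 for a
    feed-forward block with 4x expansion. Biases, normalization weights,
    and embeddings are ignored.
    """
    return 12 * width * width


def approx_stack_params(depth: int, width: int) -> int:
    """Total weights in a stack of `depth` identical layers."""
    return depth * approx_layer_params(width)


# Illustrative settings only.
base = approx_stack_params(depth=32, width=4096)
deeper = approx_stack_params(depth=64, width=4096)  # double the depth
wider = approx_stack_params(depth=32, width=8192)   # double the width
```

Under this approximation, doubling depth doubles the weight count, while doubling width quadruples it.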

In short

Layers are the processing steps a model runs in sequence. Depth is how many of them there are, and deeper usually means more parameters and more steps of refinement.

References

[1] Chen et al., "Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers," arXiv (2023).
