
The transformer

What are the three families of transformer models?

The original transformer was an encoder-decoder built for translation. Later models specialized into three shapes: encoder-only for understanding text, decoder-only for generating it, and the full encoder-decoder for translation-style tasks. Today's chatbots are overwhelmingly decoder-only.

Last updated 2026-06-15 · Physea Labs

Later models specialized the original encoder-decoder design into three shapes:[1]

  • Encoder-only (such as BERT): bidirectional, tuned for understanding text.
  • Decoder-only (such as GPT): each token sees only earlier tokens, tuned for generating text one token at a time.
  • Encoder-decoder (such as T5): the full original design, for translation-style tasks.
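
As a rough illustration of the three shapes above, the sketch below builds the attention pattern each family typically uses. The function names (`bidirectional_mask`, `causal_mask`, `cross_attention_mask`, `show`) are made up for this example and do not come from any library; real models apply masks like these inside their attention layers.

```python
def bidirectional_mask(n):
    """Encoder-style self-attention: every position may attend to every position."""
    return [[True] * n for _ in range(n)]


def causal_mask(n):
    """Decoder-style self-attention: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def cross_attention_mask(target_len, source_len):
    """Encoder-decoder cross-attention: each output position may attend to
    every encoder (input) position."""
    return [[True] * source_len for _ in range(target_len)]


def show(mask):
    """Render a mask as text: 'x' = allowed to attend, '.' = blocked."""
    return "\n".join(
        " ".join("x" if allowed else "." for allowed in row) for row in mask
    )


if __name__ == "__main__":
    print("Encoder-only (bidirectional self-attention):")
    print(show(bidirectional_mask(4)))
    print("Decoder-only (causal self-attention):")
    print(show(causal_mask(4)))
    print("Encoder-decoder (cross-attention, 3 output tokens over 4 input tokens):")
    print(show(cross_attention_mask(3, 4)))
```

In this sketch an encoder-decoder model would combine all three patterns: bidirectional attention in the encoder, causal attention in the decoder, and cross-attention from the decoder to the encoder's output.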

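Today’s generative chatbots are overwhelmingly decoder-only transformers. They use masked (causal) self-attention, so each position attends only to itself and the tokens before it, never to later ones. That restriction is what makes next-token prediction work: the model cannot look at the token it is being asked to predict.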
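
To make the masking concrete, here is a minimal sketch of single-head causal self-attention in plain Python. It is a simplification under stated assumptions, not how any particular model is implemented: queries, keys, and values are just the input vectors themselves (no learned projections), and the function name `causal_self_attention` is hypothetical.

```python
import math


def causal_self_attention(x):
    """Single-head self-attention over a list of vectors with a causal mask.

    Assumption for illustration: queries, keys, and values are the raw input
    vectors (no learned weight matrices).
    """
    d = len(x[0])
    outputs = []
    for i in range(len(x)):
        # Score position i against positions 0..i only; later positions are masked out.
        scores = [
            sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible positions.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output is the weighted average of the visible value vectors.
        outputs.append(
            [sum(w * x[j][k] for j, w in enumerate(weights)) for k in range(d)]
        )
    return outputs


if __name__ == "__main__":
    seq_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    seq_b = [[1.0, 0.0], [0.0, 1.0], [5.0, -3.0]]  # only the last token differs
    out_a = causal_self_attention(seq_a)
    out_b = causal_self_attention(seq_b)
    # Earlier positions never see the last token, so their outputs match.
    print(out_a[:2] == out_b[:2])
```

Changing the final token leaves the outputs at earlier positions untouched, which is the property that lets a decoder-only model predict each next token from the preceding ones alone.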

References

  1. Transformer (deep learning architecture) — Wikipedia