
Attention

Why do tokens need to look at each other?

Meaning is contextual: resolving a pronoun or tracking agreement needs information from elsewhere in the sentence. Self-attention relates every position to every other directly, so distant words influence each other in one step.

Last updated 2026-06-15 · Physea Labs

Meaning is contextual. Resolving a pronoun, tracking subject-verb agreement, and disambiguating a word all need information from elsewhere in the sentence. Self-attention is, as the original transformer paper defines it, “an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence.”[1] Because every token attends to every other token directly, distant words can influence each other in a single step rather than through a chain of intermediate positions.
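
To make “every token attends to every other” concrete, here is a minimal sketch of scaled dot-product self-attention, the form used in the transformer paper: each position scores itself against every position, the scores are normalised with a softmax, and the result is a weighted mix of the value vectors. The function names and the three toy vectors are illustrative assumptions, not taken from the paper. A real transformer first maps each token to queries, keys, and values through learned projections; this sketch skips that and passes the same vectors in all three roles.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over a single sequence.

    Each argument is a list of vectors, one per position. Returns the new
    per-position representations and the attention weights (one row per
    querying position, one column per attended position).
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Score this position against every position, itself included.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        row = softmax(scores)
        # New representation: weighted average of all value vectors.
        mixed = [
            sum(w * v[j] for w, v in zip(row, values))
            for j in range(len(values[0]))
        ]
        weights.append(row)
        outputs.append(mixed)
    return outputs, weights


# Hypothetical toy sequence: three positions, two dimensions each.
# Assumption: the same vectors serve as queries, keys, and values.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(x, x, x)

for position, row in enumerate(weights):
    print(position, [round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and covers the whole sequence, so the first position draws on the last one just as directly as on its neighbour. That is the “single step” the text describes: no information has to be passed along through the positions in between.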

[Figure: the sentence “The animal didn’t cross the street because it was tired,” with the strongest attention from “it” highlighted.]
Self-attention lets every token look at every other token at once. The word “it” attends most to the noun it refers to.

The canonical example comes from the transformer’s own authors: in “the animal didn’t cross the street because it was tired,” the word “it” should attend to the noun it refers to (here, “the animal”), and the model learns attention weights that make that connection.[2]

References

  1. Attention Is All You Need (arXiv:1706.03762) — Vaswani et al., Google
  2. Transformer: A Novel Neural Network Architecture for Language Understanding — Google Research Blog