What actually makes a large language model work?
A large language model is trained to do one thing: predict the next word in a stretch of text. Repeated over enormous amounts of text, that simple objective is what produces its broad, language-fluent behavior.
Underneath the chat window, a large language model does something that sounds almost too simple: it predicts the next word (strictly, the next token, which can be a whole word or a piece of one). It reads the text so far, works out what is most likely to come next, and produces that.[1] When you ask it a question, it is not looking up an answer. It is continuing your text in the way its training suggests is most probable.
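To make "continuing your text" concrete, here is a minimal sketch in Python. It is a toy, not how a real LLM is built: it only counts which word follows which in a tiny made-up corpus, whereas an actual model uses a neural network and looks at far more context. The function names and the corpus are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequently seen next word, or None if the word is unseen."""
    counts = follows.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

def continue_text(follows, prompt, max_words=5):
    """Extend the prompt one word at a time, always taking the likeliest next word."""
    words = prompt.lower().split()
    for _ in range(max_words):
        nxt = predict_next(follows, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)

# Toy corpus, invented for this example.
corpus = "the cat sat on the mat . the cat sat on the sofa ."
model = train_bigram(corpus)

print(predict_next(model, "the"))                    # prints "cat"
print(continue_text(model, "the cat", max_words=3))  # prints "the cat sat on the"
```

Nothing here looks up an answer. The program only extends the prompt with whatever followed most often in the text it saw, which is the same basic move described above at a vastly smaller scale.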
The model learns this skill by reading. It is shown an enormous amount of text and, over and over, tries to guess the next word, checks its guess against what the text actually said, and adjusts itself to do better. This is called self-supervised pre-training, because the text supplies its own answers and no human has to label anything.[1] No definitive cutoff makes a model “large”; the word just signals that these systems are trained on a lot of text and have a lot of internal settings, known as parameters.[2]
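The guess, check, adjust loop can be sketched in a few lines of Python. This is an illustration under heavy simplifying assumptions: the "model" is just a table with one adjustable score per pair of words, it sees only the single previous word, and the corpus, learning rate, and function names are all made up for the example. Real pre-training uses a neural network with vastly more parameters, but the shape of the loop is similar.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(words, epochs=200, lr=0.1):
    vocab = sorted(set(words))
    index = {w: i for i, w in enumerate(vocab)}
    # One adjustable score per (context word, next word) pair: the toy "parameters".
    weights = [[0.0] * len(vocab) for _ in vocab]
    # The text labels itself: each word is the answer for the word before it.
    examples = list(zip(words, words[1:]))
    losses = []
    for _ in range(epochs):
        total = 0.0
        for context, target in examples:
            row = weights[index[context]]
            probs = softmax(row)              # guess: probability of each next word
            t = index[target]
            total += -math.log(probs[t])      # check: penalty for a poor guess
            for j in range(len(row)):         # adjust: nudge scores toward the answer
                row[j] -= lr * (probs[j] - (1.0 if j == t else 0.0))
        losses.append(total / len(examples))
    return vocab, index, weights, losses

def predict(vocab, index, weights, word):
    """Return the highest-scoring next word for the given context word."""
    row = weights[index[word]]
    return vocab[row.index(max(row))]

# Toy corpus, invented for this example.
words = "the cat sat on the mat . the cat sat on the sofa .".split()
vocab, index, weights, losses = train(words)

print(f"average loss: {losses[0]:.2f} -> {losses[-1]:.2f}")  # the loss goes down
print(predict(vocab, index, weights, "cat"))                  # prints "sat"
```

No one labeled anything here. The training pairs come straight from the text by pairing each word with the one that follows it, which is what "self-supervised" refers to.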
The surprising part is how much this one trick yields. A system trained only to predict the next word ends up able to answer questions, draft emails, write code, and translate, because doing all of those well is, in the end, a matter of producing the right next words. That is also why an LLM can sound confident and still be wrong: it is built to produce likely text, which is not the same as true text.
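The gap between likely and true can be shown with another small Python sketch. The corpus below is hypothetical and deliberately skewed so that a common misconception appears more often than the fact (Australia's capital is Canberra). The function name is invented for the example.

```python
from collections import Counter

# Hypothetical toy corpus: the misconception outnumbers the correct statement.
corpus = [
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is canberra",
]

def most_likely_completion(corpus, prompt):
    """Return the word that most often follows the prompt in the corpus."""
    counts = Counter()
    for sentence in corpus:
        if sentence.startswith(prompt + " "):
            rest = sentence[len(prompt):].split()
            if rest:
                counts[rest[0]] += 1
    return counts.most_common(1)[0][0] if counts else None

answer = most_likely_completion(corpus, "the capital of australia is")
print(answer)  # prints "sydney": the likeliest continuation, not the true one
```

The program does exactly what it was built to do, and the result is still wrong. The same can happen in a real model when a false continuation is the more probable one.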
Well-known large language model families
- Claude (Anthropic)
A family of LLMs offered as an AI assistant for writing, coding, and analysis.
- Gemini (Google DeepMind)
Google's family of frontier AI models, used in its Gemini assistant.
- Llama (Meta)
An open-weight LLM family you can download and run yourself.
References
- [1] The Surprising Power of Next Word Prediction: Large Language Models Explained, Part 1. Center for Security and Emerging Technology, Georgetown
- [2] Large language model. Wikipedia