Subject 01 · Start here — no prerequisites
Foundations
What AI, machine learning, and language models actually are. Tokens, training versus inference, and the vocabulary every later subject assumes.
22 pages across 6 topics
AI, ML & LLMs
What the words mean and how they nest.
- What the words mean: AI is the broad goal of getting machines to do things we associate with intelligence. Machine learning is one way to get there, by learning patterns from data. A large language model is a machine-learning system trained on large amounts of text.
- How they nest: These terms are nested, not parallel. AI is the outer ring, machine learning sits inside it, deep learning inside that, and large language models inside deep learning. All ML is AI, but not all AI is ML.
- What makes an LLM: A large language model is trained to do one thing: predict the next token (roughly, the next word or word piece) in a stretch of text. Repeated over enormous amounts of text, that simple objective is what produces its broad, language-fluent behavior.
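To make "predict the next word" concrete, here is a toy sketch that predicts the next word purely by counting which word most often follows another in a tiny made-up corpus. It is only an illustration of the objective: a real LLM uses a deep neural network over tokens, not a lookup table of counts, and the names `train_bigram` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Made-up corpus, chosen only for illustration.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often here
```

The point of the sketch is the shape of the task: given what came before, pick a likely continuation. Scale and model architecture are what separate this from an actual language model.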
Tokens & tokenization
How models chop text into pieces.
- What is a token? A token is the small unit of text a language model reads and writes. It can be a whole word, part of a word, a single character, or even a byte, and your text is converted into tokens before the model sees it.
- How text becomes tokens: Many models use byte pair encoding (BPE). It starts from single characters or bytes and repeatedly merges the most frequent neighboring pairs into longer pieces, building a fixed vocabulary. The encoding is reversible, and byte-level variants can represent any text.
- Why tokens, not words: Whole-word vocabularies break on rare words, new words, and other languages. Subword tokens let a model spell out anything from familiar pieces, so it can handle words it has never seen before.
- Cost and context: Hosted models typically bill per token, and the context window is measured in tokens, so token count drives both cost and how much text fits. Token counts also differ between tokenizers, so the same text can cost more on a different model.
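The merge loop behind byte pair encoding can be sketched in a few lines. This is a minimal character-level version under simplifying assumptions: words are pre-split, the word counts are made up, and the function names are hypothetical. Production tokenizers add steps this sketch omits, such as byte-level handling and pre-tokenization rules.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words; return the most common."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(words, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_bpe(word_counts, num_merges):
    """Start from single characters and apply `num_merges` greedy merges."""
    words = {tuple(w): c for w, c in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words

# Made-up word frequencies, for illustration only.
word_counts = {"hug": 10, "pug": 5, "pun": 12, "bun": 4, "hugs": 5}
merges, words = learn_bpe(word_counts, num_merges=3)
print(merges)  # [('u', 'g'), ('u', 'n'), ('h', 'ug')]
```

Joining the symbols of any entry in `words` gives back the original word, which is the sense in which the encoding is reversible.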
Training vs inference
Learning the weights versus using them.
- Two modes: Training is when a model learns: it adjusts its internal numbers, called weights, by measuring its errors on data. Inference is when you use the finished model on new input, with the weights held fixed.
- Pretraining: Pretraining is the first and largest training stage. The model learns to predict the next word across enormous amounts of text, which it can do without any human labels because the text supplies its own answers.
- Fine-tuning and RLHF: After pretraining, a base model is shaped with two more steps: fine-tuning on good example answers, then reinforcement learning from human feedback (RLHF) on which answers people prefer. This is what turns a next-word predictor into a useful assistant.
- Does it learn from me? A model's weights do not change while you chat with it: during inference they are frozen. It can adapt to examples you put in the prompt, but that adaptation lasts only for that one request.
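The split between the two modes can be shown with a deliberately tiny model: a single weight that learns to double its input. This is a sketch under stated assumptions (one weight, squared error, plain gradient descent, made-up data), not how a language model is actually trained, but the division of labor is the same: `train` changes the weight, `predict` only reads it.

```python
def predict(weight, x):
    """Inference: run the model forward. The weight is read, never changed."""
    return weight * x

def train(data, weight=0.0, learning_rate=0.01, epochs=200):
    """Training: repeatedly nudge the weight to shrink the squared error."""
    for _ in range(epochs):
        for x, target in data:
            error = predict(weight, x) - target
            gradient = 2 * error * x  # derivative of (weight*x - target)**2
            weight -= learning_rate * gradient
    return weight

# Made-up examples of the rule y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

trained_weight = train(data)                    # learning happens here
answer = predict(trained_weight, 10.0)          # using the model: weight stays fixed
print(round(trained_weight, 3), round(answer, 3))
```

Calling `predict` any number of times leaves `trained_weight` exactly as it was, which mirrors why a deployed model does not change as people use it.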
Neural networks
The structure underneath it all.
- The parts: A neural network is a set of connected units called artificial neurons. Each connection carries a weight, each neuron adds up its weighted inputs and passes the result through a function, and the neurons are arranged in layers from input to output.
- How it learns: A neural network learns by adjusting its weights. It makes a prediction, measures the error against the right answer, then sends that error backward through the network (a procedure called backpropagation) to nudge each weight toward a better result, and repeats this over many examples.
- The brain analogy: The name comes from a loose inspiration: artificial neurons were modeled, very roughly, on brain cells. But the resemblance is shallow, and researchers caution against reading a working brain into a neural network.
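The parts described above (weights on connections, a weighted sum passed through a function, neurons stacked in layers) can be sketched as a forward pass. The weights below are hand-picked for illustration rather than learned, the sigmoid is just one common choice of function, and the bias term is an addition not mentioned above that most networks include.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Add up the weighted inputs, then pass the total through a function."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)

def layer(inputs, weight_rows, biases):
    """A layer is several neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

def forward(inputs, layers):
    """Feed the inputs through each layer in turn, from input to output."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer: weights, biases
    ([[1.0, -1.0]], [0.0]),                    # output layer: weights, biases
]
output = forward([1.0, 2.0], layers)
print(output)
```

Learning would consist of adjusting the numbers inside `layers`; the forward pass itself stays the same.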
Capabilities & limits
What models do well and where they break.
- What AI does well: Today's AI is strong at working with language: drafting, summarizing, translating, answering, and writing code. Knowing the strengths makes the failures easier to spot, because they are specific, not random.
- Hallucination: A hallucination is a fluent, confident statement that is simply wrong. It happens because the way models are trained and graded rewards guessing over saying 'I don't know.'
- No real-time knowledge: A model learns from a fixed snapshot of data. Everything after its knowledge cutoff is invisible to it, unless it can search the web or read documents you provide.
- Understanding and math: A model arranges language by statistical pattern, not by grasping meaning. The same pattern-matching is why it fumbles exact tasks like counting letters or doing arithmetic.
Key terms
A plain-language glossary.
- Model and parameters: A model is a trained program that turns input into output. Its parameters are the billions of numbers it learned, and training is the process that adjusted those numbers until the model's predictions got good.
- Prompt, token, context: A prompt is the text you give a model. The model reads text as tokens, small chunks that average roughly three-quarters of a word each in English text. The context window is the maximum number of tokens it can work with at one time.
- Inference and fine-tuning: Inference is what happens every time you use a model: it runs forward on new input with its weights frozen. Fine-tuning is extra training afterward that adjusts those weights to specialize the model for a task.
- Embedding and more: An embedding represents text as a vector of numbers, arranged so similar meanings land near each other. This page also gives short, sourced definitions for a few other terms a newcomer meets often.
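One common way to measure whether two embeddings "land near each other" is cosine similarity, sketched below. The three-number vectors are invented for illustration; real embeddings come from a trained model and usually have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings: not produced by any real model.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.9, 0.15],
    "car": [0.1, 0.2, 0.95],
}

cat_kitten = cosine_similarity(embeddings["cat"], embeddings["kitten"])
cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])
print(cat_kitten > cat_car)  # True: "cat" sits closer to "kitten" than to "car"
```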