Subject 08 · Read last — it contextualizes the rest
History & Context
How the field got here: symbolic AI, the deep-learning revolution, the transformer moment, and the questions still open.
22 pages across 6 topics
Early ML & symbolic AI
Where it started.
- Symbolic AI
  Symbolic AI was the founding idea that intelligence is the manipulation of symbols by logical rules. For roughly thirty years after the field was named in 1956, this was what 'AI' meant.
- Expert Systems
  An expert system captured a specialist's knowledge as if-then rules and used an inference engine to apply them. In the 1980s, expert systems were the first big business success for AI.
- Early Machine Learning
  A second, quieter tradition let programs learn from data instead of from hand-written rules. It started in the 1950s but spent decades in the shadow of symbolic AI.
- The AI Winters
  Twice the field promised more than it delivered, and funding collapsed. The deeper reason rule-based AI ran out of road is that hand-written rules cannot cover the real world.
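The expert-system entry above describes if-then rules applied by an inference engine. As a rough illustration only, here is a minimal forward-chaining sketch in Python. The rule contents, the fact names, and the function name `forward_chain` are all hypothetical and chosen for this example; real expert systems of the 1980s were far larger and used dedicated rule languages.

```python
# Hypothetical rules for illustration: each rule is
# (set of facts that must all be known, fact to conclude).
RULES = [
    ({"has_fever", "has_cough"}, "may_have_flu"),
    ({"may_have_flu", "short_of_breath"}, "refer_to_doctor"),
]


def forward_chain(facts, rules):
    """Apply if-then rules repeatedly until no rule adds a new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all its conditions are known
            # and its conclusion has not been derived yet.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


derived = forward_chain({"has_fever", "has_cough", "short_of_breath"}, RULES)
```

A case that no rule anticipates produces no new conclusions, which is a small-scale version of the coverage problem described in the AI Winters entry.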
The deep-learning revolution
When neural nets took over.
- The AlexNet moment
  In 2012, a deep convolutional neural network now called AlexNet won the ImageNet competition with a top-5 error rate of 15.3%, compared with 26.2% for the second-best entry. That gap convinced much of the field to switch to deep learning.
- Big data: ImageNet
  The deep-learning takeoff needed a large labeled dataset to learn from. ImageNet, built from 2009 onward, supplied millions of categorized images, and AlexNet trained on a 1.2-million-image subset across 1000 categories.
- Why GPUs mattered
  Deep networks had long been too expensive to train. AlexNet ran on two consumer GPUs and finished training in five to six days, which is what made its depth practical.
- Backpropagation at scale
  Backpropagation, the method that trains deep networks, was popularized by a 1986 paper. AlexNet's contribution was running that same idea on a far bigger network and far more data.
The transformer moment
2017 and after.
- Attention Is All You Need
  A 2017 paper called "Attention Is All You Need" introduced the Transformer, a network built only on attention mechanisms. It dropped the recurrence used by earlier models and trained faster while setting new translation records.
- Why it changed everything
  Because the Transformer trained in parallel rather than step by step, researchers could build much larger models on much more text. That ability to scale, more than any single trick, set off the modern era of language models.
- BERT and GPT
  In 2018, two papers took the Transformer in different directions. BERT read text in both directions to understand it, while GPT read left to right to generate it. Both pre-trained on huge amounts of unlabeled text first, then adapted to specific tasks.
The LLM & agent era
Scaling, chat, and agents.
- Scaling laws
  Researchers found that a model's error drops in a smooth, predictable way, roughly as a power law, as you add parameters, data, and compute. That predictability is what justified spending more on ever-larger models.
- GPT-3 and few-shot learning
  GPT-3 showed that a single large model could pick up a new task from a few examples written into the prompt, with no retraining. That shifted how people used language models.
- ChatGPT and the chat era
  ChatGPT put a language model behind a simple chat box and tuned it to follow instructions and hold a conversation. It reached an estimated 100 million users in about two months and set the format most people now expect.
- Tool-using agents
  On their own, chat models can only describe an action, not carry it out. A line of research taught them to call external tools and act on what comes back, the step that turned chatbots into agents.
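The scaling-laws entry above says error drops smoothly and predictably as resources grow. One common way to model that kind of curve is a power law in a single resource, L(n) = (n_c / n) ** alpha. The sketch below assumes that form; the constants `N_C` and `ALPHA` are hypothetical values picked for illustration, not fitted results from any paper.

```python
def power_law_loss(n, n_c, alpha):
    """Loss predicted by a single-resource power law: (n_c / n) ** alpha."""
    return (n_c / n) ** alpha


# Hypothetical constants, for illustration only.
N_C = 1.0e13
ALPHA = 0.08

# Under a power law, every 10x increase in n multiplies the loss
# by the same factor, 10 ** -ALPHA, regardless of the starting size.
ratio_small = power_law_loss(1e9, N_C, ALPHA) / power_law_loss(1e8, N_C, ALPHA)
ratio_large = power_law_loss(1e11, N_C, ALPHA) / power_law_loss(1e10, N_C, ALPHA)
```

The constant ratio is what makes the curve predictable: a trend measured on small models can be extrapolated to estimate the payoff of a larger one before paying to train it.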
Milestones timeline
The dates that mattered.
- Early foundations
  The field was named at a 1956 workshop, got its first trainable neural network with Rosenblatt's perceptron in 1958, and learned how to train deeper networks with the 1986 backpropagation paper.
- Deep learning breaks out
  Between 2009 and 2017, neural networks went from a niche idea to the dominant approach: a huge labeled dataset, a contest-winning image model, a Go program that beat a human champion, and the transformer architecture.
- The LLM era
  From 2020 onward, scaling the transformer produced GPT-3, then ChatGPT brought it to the public, GPT-4 added image input, and frontier models turned toward coding and agents.
Open questions
What is still unsolved.
- Reliability
  Models still state false things with full confidence, and the best current methods reduce this without removing it. Whether it can ever be fully fixed is an open question.
- Real reasoning?
  Reasoning models that show their steps do better on many tasks, but researchers disagree on whether that is genuine reasoning or pattern-matching that collapses past a certain difficulty.
- Alignment
  Alignment is getting AI systems to do what we actually want. Current models are largely well-behaved; the unsolved part is how to supervise systems that grow more capable than their human overseers.
- The road ahead
  Two trends look strong: the length of tasks AI can do keeps doubling, and the cost of running it has collapsed. But reasoning limits suggest a ceiling, and serious people disagree on which signal wins.