The AlexNet moment — Physea Wiki

In 2012 a deep convolutional neural network now called AlexNet won the ImageNet competition with a top-5 error rate of 15.3%, compared with 26.2% for the second-best entry. That gap convinced the field to switch to deep learning.

People often point to one event as the start of the modern deep-learning era: the 2012 ImageNet competition. A deep neural network built by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, now usually called AlexNet, won by a wide margin. It scored a top-5 test error rate of 15.3%, compared with 26.2% for the second-best entry.^[1] A “top-5 error” counts an image as wrong only when the correct label is not among the model’s five most probable guesses.^[1]
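The top-5 rule above can be made concrete with a small sketch. The function name `top_k_error` and the toy scores are hypothetical, chosen only for illustration; they are not taken from the AlexNet paper or from any particular library.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is NOT among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the k best guesses.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy data (hypothetical): 3 images, 7 classes, one score per class.
scores = [
    [0.05, 0.40, 0.10, 0.20, 0.15, 0.07, 0.03],  # true class 1: the top guess
    [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.02],  # true class 4: the fifth guess
    [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.02],  # true class 6: the last guess
]
labels = [1, 4, 6]

top5 = top_k_error(scores, labels, k=5)  # only the third image is a miss
top1 = top_k_error(scores, labels, k=1)  # the second and third images are misses
```

On this toy data the second image counts as correct under the top-5 rule even though the model's first guess is wrong, which is why a top-5 error is never higher than the top-1 error for the same predictions.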

The size of that gap, 10.9 percentage points, is what made the result hard to ignore. Most of the strong entries in those years used hand-built feature extractors followed by a classifier. AlexNet instead learned everything from the raw pixels with a single deep network of eight learned layers: five convolutional and three fully connected, ending in a 1000-way output.^[1] The network had about 60 million parameters and 650,000 neurons.^[1]
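The parameter figure can be checked with a rough tally over the eight learned layers. The layer shapes below are not given on this page; they are assumptions filled in from the commonly cited description of the architecture, including the split across two GPUs that halves the input channels seen by some convolutional layers. Treat this as a sketch, not an authoritative specification.

```python
# Assumed shapes: (name, kernel_height, kernel_width, input_channels_per_filter, filters)
CONV_LAYERS = [
    ("conv1", 11, 11, 3, 96),
    ("conv2", 5, 5, 48, 256),    # each filter sees half of conv1's 96 maps (two-GPU split)
    ("conv3", 3, 3, 256, 384),
    ("conv4", 3, 3, 192, 384),   # half of conv3's 384 maps
    ("conv5", 3, 3, 192, 256),   # half of conv4's 384 maps
]

# Assumed shapes: (name, inputs, outputs)
FC_LAYERS = [
    ("fc6", 6 * 6 * 256, 4096),  # flattened output of the last pooling stage
    ("fc7", 4096, 4096),
    ("fc8", 4096, 1000),         # the 1000-way output
]


def count_parameters():
    """Return a dict of per-layer parameter counts (weights plus one bias per output)."""
    counts = {}
    for name, kh, kw, cin, cout in CONV_LAYERS:
        counts[name] = kh * kw * cin * cout + cout
    for name, n_in, n_out in FC_LAYERS:
        counts[name] = n_in * n_out + n_out
    return counts


counts = count_parameters()
total = sum(counts.values())
fc_share = sum(counts[name] for name, _, _ in FC_LAYERS) / total

for name, n in counts.items():
    print(f"{name}: {n:,}")
print(f"total: {total:,}  (fully connected share: {fc_share:.0%})")
```

Under these assumptions the total comes to roughly 61 million, in line with the "about 60 million" quoted above, and the three fully connected layers account for the large majority of it.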

The win did not come from one clever trick. It came from three older ideas reaching usable scale at the same time: a large labeled dataset, graphics hardware fast enough to train a big network, and the back-propagation learning method applied to a far larger model than before. The rest of this topic takes those one at a time.

Why it mattered

Within a few years, nearly every serious computer-vision system had switched to deep neural networks. The same recipe later spread to speech and language.

References

[1] Krizhevsky, Sutskever, Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NeurIPS 2012.
