The deep-learning revolution
What happened at the 2012 ImageNet competition?
In 2012, a deep convolutional neural network now called AlexNet won the ImageNet competition with a top-5 error rate of 15.3%, compared with 26.2% for the second-best entry. That margin persuaded much of the field to shift to deep learning.
People often point to one event as the start of the modern deep-learning era: the 2012 ImageNet competition, formally the ImageNet Large Scale Visual Recognition Challenge (ILSVRC-2012). A deep convolutional neural network built by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, now usually called AlexNet, won by a wide margin. It achieved a top-5 test error rate of 15.3%, compared with 26.2% for the second-best entry.[1] The top-5 error rate is the fraction of test images whose correct label is not among the model’s five most probable guesses.[1]
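The top-5 definition can be made concrete with a small sketch. The function name `top_k_error` and the toy scores below are hypothetical illustrations, not code or data from the paper; the sketch assumes each image has exactly one true label and a list of per-class scores.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is NOT among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to the top k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy data (made up): 4 images, 7 classes, one row of scores per image.
scores = [
    [0.50, 0.20, 0.10, 0.08, 0.07, 0.05, 0.00],
    [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.02],
    [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.02],
    [0.02, 0.05, 0.08, 0.10, 0.20, 0.25, 0.30],
]
labels = [0, 4, 6, 0]  # true class index for each image

top5 = top_k_error(scores, labels, k=5)
top1 = top_k_error(scores, labels, k=1)
print(f"top-5 error: {top5:.2f}, top-1 error: {top1:.2f}")
```

In this toy set the second image is a top-1 miss but a top-5 hit, because its true class has the fifth-highest score. Top-5 error can never exceed top-1 error for this reason.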
The size of that gap, 10.9 percentage points or roughly a 42% relative reduction in error, is what made the result hard to ignore. Most of the strong entries in those years used hand-built feature extractors followed by a classifier. AlexNet instead learned its features directly from the raw pixels with a single deep network of eight learned layers: five convolutional and three fully connected, ending in a 1000-way output.[1] The network had about 60 million parameters and 650,000 neurons.[1]
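As a rough sanity check on the "about 60 million parameters" figure, the weights and biases can be tallied layer by layer. The layer shapes below are assumptions taken from a reading of the cited paper [1], not from the text above. In particular, the reduced input-channel counts (48 and 192) reflect the paper's two-GPU split, and the 6 x 6 x 256 input to the first fully connected layer is assumed. The function name `count_parameters` is hypothetical.

```python
# Assumed layer shapes, based on a reading of [1]; treat them as illustrative.
# Convolutional layers: (kernel_height, kernel_width, input_channels_seen, num_kernels).
CONV_LAYERS = [
    (11, 11, 3, 96),
    (5, 5, 48, 256),    # each kernel sees half of the previous layer's channels
    (3, 3, 256, 384),
    (3, 3, 192, 384),   # half of the channels again
    (3, 3, 192, 256),   # half of the channels again
]
# Fully connected layers: (inputs, outputs); the last one is the 1000-way output.
FC_LAYERS = [
    (6 * 6 * 256, 4096),
    (4096, 4096),
    (4096, 1000),
]


def count_parameters(conv_layers, fc_layers):
    """Sum weights plus one bias per kernel (conv) or per output unit (fully connected)."""
    total = 0
    for kh, kw, c_in, n_kernels in conv_layers:
        total += kh * kw * c_in * n_kernels + n_kernels
    for n_in, n_out in fc_layers:
        total += n_in * n_out + n_out
    return total


total = count_parameters(CONV_LAYERS, FC_LAYERS)
fc_only = count_parameters([], FC_LAYERS)
print(f"{len(CONV_LAYERS) + len(FC_LAYERS)} learned layers, {total:,} parameters")
print(f"share in fully connected layers: {fc_only / total:.0%}")
```

Under these assumed shapes the total comes to roughly 61 million, in line with the "about 60 million" figure, and the large majority of those parameters sit in the three fully connected layers.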
The win did not come from one clever trick. It came from three older ideas finally arriving at the same time at usable scale: a large labeled dataset, graphics hardware fast enough to train a big network, and the back-propagation learning method run on a far larger model than before. The rest of this topic takes those one at a time.
References
- [1] Krizhevsky, A., Sutskever, I., Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. NeurIPS 2012.