Why GPUs mattered — Physea Wiki

Deep networks had long been too expensive to train. AlexNet trained on two consumer GPUs in five to six days, and that speed is what made its depth practical.

A graphics processing unit, or GPU, was built to draw images by doing many simple arithmetic operations at once. Training a neural network is also mostly simple operations, chiefly multiplications and additions, that do not depend on one another and so can be done at once. The same hardware therefore fits the job well. By 2012 this match was what made training a network as deep as AlexNet practical.
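The match between the two workloads can be shown with a small sketch. The code below is plain Python, not GPU code, and the function names, layer sizes, and values are all made up for illustration. It computes one fully connected layer and shows that each output is a separate sum of multiply-adds, so the outputs can be computed in any order without changing the result. That independence is what lets a GPU work on them all at the same time.

```python
# Sketch: one fully connected layer computed output by output.
# Sizes and values are arbitrary, chosen only for illustration.

def dense_forward(weights, inputs):
    """Compute each output as an independent sum of multiply-adds."""
    outputs = []
    for row in weights:                     # one row per output unit
        total = 0.0
        for w, x in zip(row, inputs):       # one multiply-add per weight
            total += w * x
        outputs.append(total)
    return outputs

def dense_forward_any_order(weights, inputs, order):
    """Same layer, with the outputs computed in an arbitrary order.

    No output depends on another, so parallel hardware is free to
    compute them all at once instead of one after another.
    """
    outputs = [0.0] * len(weights)
    for i in order:
        outputs[i] = sum(w * x for w, x in zip(weights[i], inputs))
    return outputs

weights = [[1.0, 2.0, 3.0],
           [0.5, 0.0, -1.0],
           [2.0, 2.0, 2.0],
           [-1.0, 1.0, 0.0]]
inputs = [1.0, 2.0, 3.0]

forward = dense_forward(weights, inputs)
shuffled = dense_forward_any_order(weights, inputs, order=[3, 1, 0, 2])
multiply_adds = len(weights) * len(inputs)  # 4 outputs x 3 inputs

print(forward)              # [14.0, -2.5, 12.0, 1.0]
print(shuffled == forward)  # True
print(multiply_adds)        # 12
```

Even this toy layer needs 12 multiply-adds. A real layer needs millions, all of the same simple kind.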

The AlexNet authors were direct about it. They wrote that convolutional networks had “still been prohibitively expensive to apply in large scale to high-resolution images,” and that current GPUs were “powerful enough to facilitate the training of interestingly-large CNNs.”^[1] They also said the depth of the model was important to its accuracy, and that depth was what the hardware made affordable.^[1]

The concrete numbers are modest by today’s standards. The network trained on two GTX 580 GPUs with 3 GB of memory each, and took between five and six days to finish.^[1] The authors noted that the network’s size was limited mainly by GPU memory and the training time they were willing to accept, and predicted results would improve with faster GPUs and bigger datasets.^[1] That prediction held up.
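A rough calculation gives a feel for the memory limit. The sketch below rests on assumptions that are not stated in this article: a parameter count of about 60 million (the figure given in the AlexNet paper's abstract), 32-bit floats, and three stored values per parameter (the weight, its gradient, and a momentum term). It ignores the stored activations for each batch of images, which also take up memory, and it ignores how the model was divided between the two cards.

```python
# Back-of-the-envelope memory estimate under the assumptions stated above.
PARAMETERS = 60_000_000          # assumed: count from the paper's abstract
BYTES_PER_FLOAT32 = 4            # assumed: 32-bit floats
COPIES = 3                       # assumed: weight, gradient, momentum
GPU_MEMORY_BYTES = 3 * 1024**3   # one GTX 580 with 3 GB

weight_bytes = PARAMETERS * BYTES_PER_FLOAT32
training_state_bytes = weight_bytes * COPIES
fraction_of_one_gpu = training_state_bytes / GPU_MEMORY_BYTES

print(f"weights alone: {weight_bytes / 1e6:.0f} MB")
print(f"weights + gradients + momentum: {training_state_bytes / 1e6:.0f} MB")
print(f"share of one 3 GB card: {fraction_of_one_gpu:.0%}")
```

Under these assumptions the parameters and their training state already take up roughly a fifth of one card before any image data is loaded. This is only an estimate, but it is consistent with the authors' remark that memory was a main limit on the network's size.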

References

[1] Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. NeurIPS 2012.
