Big data: ImageNet — Physea Wiki

The deep-learning takeoff needed a large labeled dataset to learn from. ImageNet, introduced in 2009, supplied millions of categorized images, and AlexNet was trained on a subset of about 1.2 million images across 1000 categories.

A learning method is only as good as the data it can learn from. Before 2012, most labeled image collections held tens of thousands of pictures, too few to train a large network without it simply memorizing the training examples. The dataset that changed this was ImageNet, introduced in 2009. The paper describes more than 3.2 million high-resolution images organized by the nouns of WordNet, a lexical database that arranges concepts into a hierarchy.^[1] Human annotators, recruited through Amazon Mechanical Turk, verified the candidate images for each category, so each image came with a reliable label.

ImageNet kept growing, and the AlexNet paper describes a collection of over 15 million labeled images in roughly 22,000 categories.^[2] Training and evaluation used a standardized subset from the annual ImageNet competition: roughly 1.2 million training images, 50,000 for validation, and 150,000 for testing, across 1000 categories.^[2]
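The split sizes quoted above can be put in proportion with a little arithmetic. The sketch below uses only the rounded figures from the text, so its results are approximate; the variable names are illustrative and not taken from either paper.

```python
# Rounded figures quoted in the text; the exact counts differ slightly.
TRAIN_IMAGES = 1_200_000
VAL_IMAGES = 50_000
TEST_IMAGES = 150_000
NUM_CATEGORIES = 1000
FULL_COLLECTION = 15_000_000  # "over 15 million", so treat this as a lower bound

# Total size of the competition subset.
subset_total = TRAIN_IMAGES + VAL_IMAGES + TEST_IMAGES

# Average number of training images available for each category.
train_per_category = TRAIN_IMAGES / NUM_CATEGORIES

# Because FULL_COLLECTION is a lower bound, this share is an upper bound.
subset_share = subset_total / FULL_COLLECTION

print(f"Images in the competition subset: {subset_total:,}")
print(f"Average training images per category: {train_per_category:,.0f}")
print(f"Share of the full collection (at most): {subset_share:.1%}")
```

On these rounded inputs, the subset holds about 1.4 million images, averages about 1,200 training images per category, and makes up less than a tenth of the full collection.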

This scale is what let a 60-million-parameter network learn without badly overfitting. A small dataset would have let the network memorize answers; a large and varied one forces it to find patterns that hold up on images it has never seen.
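One rough way to see why scale matters is to compare the number of parameters with the number of training images. The sketch below assumes a hypothetical earlier dataset of 50,000 images as a stand-in for "tens of thousands", and otherwise uses the rounded figures from the text. The ratio is only a crude indicator, not a formal measure of overfitting risk.

```python
PARAMETERS = 60_000_000  # rounded parameter count quoted in the text


def params_per_image(num_images: int) -> float:
    """Return how many parameters the network has per training image."""
    return PARAMETERS / num_images


# Hypothetical pre-2012 dataset size (assumption, standing in for "tens of thousands").
small_dataset_ratio = params_per_image(50_000)

# The roughly 1.2 million training images of the competition subset.
imagenet_ratio = params_per_image(1_200_000)

print(f"Parameters per image, 50,000-image dataset: {small_dataset_ratio:,.0f}")
print(f"Parameters per image, 1.2-million-image subset: {imagenet_ratio:,.0f}")
print(f"Reduction factor: {small_dataset_ratio / imagenet_ratio:.0f}x")
```

Under these assumptions the network has about 1,200 parameters per image on the small dataset and about 50 on the ImageNet subset, a 24-fold reduction in how much capacity is available to memorize each example.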

References

[1] ImageNet: A Large-Scale Hierarchical Image Database, CVPR 2009 (Deng, Dong, Socher, Li, Li, Fei-Fei)
[2] ImageNet Classification with Deep Convolutional Neural Networks, NeurIPS 2012 (Krizhevsky, Sutskever, Hinton)
