AlexNet won the ImageNet competition by a huge margin, triggering the deep learning revolution
Context: At the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), graduate students Alex Krizhevsky and Ilya Sutskever, supervised by Geoffrey Hinton, entered AlexNet, an 8-layer deep convolutional neural network trained on GPUs. The best competing methods at the time (hand-engineered computer vision features + SVM) had top-5 error rates around 26%.
Decision: Use GPU parallel computing to train a deep convolutional network, combining ReLU activation functions, Dropout regularization, and data augmentation to build one of the largest convolutional neural networks of the time, and enter it in the competition.
Reasoning: GPU parallel computing made it feasible to train large neural networks in a practical amount of time. Replacing Sigmoid with ReLU, which does not saturate for positive inputs, mitigated the vanishing gradient problem and sped up training. Dropout reduced overfitting. Together, these three engineering innovations gave deep networks their first overwhelming advantage on large-scale visual tasks.
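A minimal sketch of the two claims about ReLU and Dropout, in plain Python. It is illustrative only and is not AlexNet's code: the function names are hypothetical, weights are ignored when chaining derivatives, and the dropout shown is the "inverted" variant common in modern libraries (an assumption here; the original network instead scaled outputs by 0.5 at test time).

```python
import math
import random


def sigmoid_grad(x):
    """Derivative of the sigmoid; never larger than 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def chained_grad(grad_fn, pre_activation, depth):
    """Product of `depth` identical local derivatives (weights ignored).

    A toy stand-in for how a gradient shrinks as it is backpropagated
    through a stack of layers that all see the same pre-activation.
    """
    return grad_fn(pre_activation) ** depth


def dropout(activations, p_drop, rng):
    """Inverted dropout: zero each unit with probability p_drop and
    rescale the survivors by 1 / (1 - p_drop)."""
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


if __name__ == "__main__":
    # Sigmoid gradients shrink geometrically with depth; ReLU's stay at 1
    # on the active (positive) side.
    for depth in (1, 4, 8):
        print(depth,
              chained_grad(sigmoid_grad, 2.0, depth),
              chained_grad(relu_grad, 2.0, depth))
    # Each training step sees a different random subset of units.
    print(dropout([1.0] * 8, 0.5, random.Random(0)))
```

Even in the best case for the sigmoid (input 0, derivative 0.25), eight chained layers scale the gradient by 0.25^8, while an active ReLU path passes it through unchanged. That is the sense in which ReLU mitigates, rather than eliminates, vanishing gradients.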
Outcome: AlexNet won with a 15.3% top-5 error rate, more than 10.8 percentage points lower than second place, an unprecedented margin in the competition's history. The result shocked the computer vision and machine learning communities and set off an explosion of deep learning work; Google, Facebook, Microsoft, and other tech giants soon invested heavily in deep learning research.
Lesson: Technology revolutions are often triggered by the synergy of multiple engineering innovations, not a single breakthrough. GPUs, ReLU, Dropout, and big data were each insufficient alone to trigger a revolution, but together they produced a qualitative change. Timing is as important as technology.
mm-hinton-dropout-regularization
mm-hinton-hierarchical-representation