💡 Lottery ticket hypothesis
A randomly initialized, dense neural network contains a subnetwork (a winning ticket) that is initialized such that, when trained in isolation, it can match the test accuracy of the original network after training for at most the same number of iterations.
The winning tickets have won the initialization lottery: their connections have initial weights that make training particularly effective.
These winning tickets can be 10–100x smaller than the original network with minimal loss in accuracy.
One-shot pruning :
In the paper [1], the authors present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations.
- The winning ticket is identified by training a network and pruning its smallest-magnitude weights.
- The remaining, unpruned connections constitute the architecture of the winning ticket. Each unpruned connection's weight is then reset to its initial value from the original network, before training.
Algorithm :
- Randomly initialize a neural network N_initial.
- Train the network for j iterations, arriving at parameters N_j.
- Prune the p% smallest-magnitude parameters in N_j, creating a mask M.
- Reset the remaining parameters to their values in the initial network N_initial, creating the winning ticket.
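The steps above can be sketched in a few lines of NumPy. Note this is a toy illustration, not the paper's implementation: `train` is a hypothetical placeholder for j iterations of real gradient descent, and the "network" is a single weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Randomly initialize the network's parameters (N_initial).
W_initial = rng.normal(size=(4, 4))

def train(W):
    """Hypothetical stand-in for training: perturbs the weights the way
    j iterations of SGD would move them (assumption, not real training)."""
    return W + 0.1 * rng.normal(size=W.shape)

# 2. Train the network, arriving at parameters N_j.
W_j = train(W_initial)

# 3. Prune the p% smallest-magnitude weights of N_j, creating a mask M
#    (1 = keep, 0 = pruned).
p = 0.5
threshold = np.quantile(np.abs(W_j), p)
M = (np.abs(W_j) > threshold).astype(W_j.dtype)

# 4. Reset the surviving weights to their values in N_initial:
#    the winning ticket is the masked initial network.
winning_ticket = M * W_initial
```

The winning ticket would then be retrained from scratch with the mask held fixed, which is what the paper's experiments compare against the full dense network.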
Implications of the lottery ticket hypothesis :
- Improve training performance : since winning tickets can be trained in isolation from the start, training mechanisms that identify or exploit them early could drastically reduce the cost of training.
- Design better networks : Winning tickets reveal combinations of sparse architectures and initializations that are particularly adept at learning. These can lead to the design of new architectures and initialization schemes with the same properties that are conducive to learning.
To know more :
[1] : Frankle & Carbin, "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks", ICLR 2019.