Scaling laws
Principled design for model architectures.
What are scaling laws?
Scaling laws describe the functional relationship between two quantities that scale with each other over a significant interval. For example, volume has a cubic relationship with length (v = Kl³) and area varies as the square of length (a = Kl²).
Scaling laws are crucial in the design of physical systems, especially microsystems such as semiconductor devices and chips. They make a designer aware of the physical consequences of scaling a device or system up or down.
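A useful property of such power-law relationships is that they appear as straight lines on log-log axes, so the exponent can be recovered by linear regression. The sketch below does this for the volume-length law v = Kl³ using synthetic data (numpy assumed available):

```python
import numpy as np

# Synthetic (length, volume) pairs generated from v = K * l^3.
# Since log v = log K + 3 * log l, a linear fit in log-log space
# recovers both the exponent and the prefactor.
K = 2.0
lengths = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
volumes = K * lengths ** 3

slope, intercept = np.polyfit(np.log(lengths), np.log(volumes), 1)
print(round(slope, 3))              # exponent, ~3.0
print(round(np.exp(intercept), 3))  # prefactor K, ~2.0
```

The same log-log trick is what makes scaling-law plots in deep learning papers look linear.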

How do scaling laws apply to deep learning?
- Model developers experiment with various parameters to determine the optimal model architecture. These tunable parameters can be any hyperparameter of the task, e.g. dataset size, model size (number of parameters), batch size, etc.
- When designing a model (e.g. a transformer or a recommender system), understanding the functional relationship between model quality (a metric such as loss, test error, or F1 score) and these tunable parameters can be extremely useful.
- Scaling laws are specific to the model architecture.
How are scaling laws determined?
- By studying the loss as a function of quantities like model size, dataset size, and compute. For example, the authors of Chinchilla (from DeepMind) found that model size and training dataset size should be scaled equally for large language models: for every doubling of model size, the number of training tokens should also double, and larger models remain under-trained if the dataset is not scaled accordingly.
“We investigate the optimal model and dataset size for training a transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant. By training 400 language models ranging from 70 million to 10 billion parameters on 5 to 500 billion tokens, we find that for compute-optimal training, the model size and the training dataset size should be scaled equally: for every doubling of model size the training dataset size should also be doubled.”
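As a back-of-the-envelope illustration of the Chinchilla finding, the sketch below combines two common simplifications — the C ≈ 6ND approximation for training FLOPs and the rough rule of thumb of ~20 training tokens per parameter — to split a compute budget between parameters and tokens. These are simplifications, not the exact coefficients of the paper's fitted laws:

```python
import math

def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
    """Sketch of a compute-optimal split under the Chinchilla finding.

    Assumes the common approximation C ~ 6 * N * D training FLOPs and a
    rough rule of thumb of ~20 training tokens per parameter; both are
    simplifications of the paper's fitted scaling laws.
    """
    # Solve C = 6 * N * (tokens_per_param * N) for the parameter count N.
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# At roughly Chinchilla's compute budget (~5.76e23 FLOPs), this recovers
# approximately the paper's 70B-parameter / 1.4T-token configuration.
n, d = chinchilla_optimal(5.76e23)
print(f"{n:.2e} parameters, {d:.2e} tokens")
```

Because N and D scale equally, both grow as the square root of compute under this approximation.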
How to use scaling laws?
- As we move into the era of larger models, scaling laws become a practical toolkit for ML developers to allocate resources optimally.
- Before committing to a large training run, run small-scale experiments to check whether a scaling law holds for your setting; if it does, you can extrapolate to predict the quality of larger models and pick the configuration that best uses your compute budget.
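One way to start such experiments is to fit a power-law ansatz to final losses from a handful of small pilot runs and extrapolate. The sketch below uses synthetic losses (generated from the same functional form, standing in for real pilot-run results) and scipy's curve_fit; the form L(N) = a·N^(−b) + c (a power law plus an irreducible loss floor) is one common choice, not the only one:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    """Loss as a function of model size: power law plus irreducible floor c."""
    return a * n ** (-b) + c

# Synthetic pilot runs: final loss at a few small model sizes, generated
# from known coefficients so the fit can be checked against ground truth.
true_a, true_b, true_c = 12.0, 0.08, 1.7
sizes = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
losses = power_law(sizes, true_a, true_b, true_c)

# Fit the ansatz; a reasonable initial guess helps the optimizer converge.
params, _ = curve_fit(power_law, sizes, losses, p0=(10.0, 0.1, 2.0), maxfev=10000)
a, b, c = params

# Extrapolate to a larger model to judge whether scaling up is worth the compute.
print(f"fit: a={a:.2f}, b={b:.3f}, c={c:.2f}")
print(f"predicted loss at 1e9 params: {power_law(1e9, *params):.2f}")
```

In practice the pilot losses would be noisy measurements, so the extrapolation carries uncertainty that grows the further you project beyond the fitted range.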