DLRM
What is DLRM?
- DLRM stands for Deep Learning Recommendation Model. These models are used in several important areas: search ranking, feed ranking, and ads/video/content recommendation. DLRMs pose unique challenges for training and inference, and given how widely this class of models is deployed, those challenges make them worth studying.
How are these different from other models, say CV models?

- Content Understanding models are usually smaller and compute-intensive (high arithmetic intensity), whereas DLRMs are huge and demand high memory capacity and bandwidth more than compute. DLRMs can benefit immensely from cache-like structures, since a large part of their "compute" is lookups in embedding tables.
(Memory capacity: the amount of memory available. Memory bandwidth: the rate at which data can be read from memory.)
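To make the capacity argument concrete, here is a back-of-the-envelope sketch with hypothetical (not Meta-published) sizes, showing how embedding tables dwarf the MLP weights:

```python
# Rough footprint arithmetic: embedding tables vs. dense MLP weights.
# All sizes below are illustrative assumptions, not from the DLRM paper.

def embedding_bytes(num_rows, dim, bytes_per_elem=4):
    """Memory footprint of one embedding table (fp32 by default)."""
    return num_rows * dim * bytes_per_elem

def mlp_bytes(layer_sizes, bytes_per_elem=4):
    """Memory footprint of dense MLP weights (biases ignored)."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out * bytes_per_elem
    return total

# Hypothetical model: 10 categorical features, each with 10M ids, dim 64.
emb = 10 * embedding_bytes(10_000_000, 64)
# Hypothetical bottom MLP (13 dense inputs) and top MLP.
mlp = mlp_bytes([13, 512, 256, 64]) + mlp_bytes([128, 512, 256, 1])
print(f"embeddings: {emb / 1e9:.1f} GB, MLPs: {mlp / 1e6:.2f} MB")
# -> embeddings: 25.6 GB, MLPs: 1.40 MB
```

Tens of gigabytes of mostly-idle parameters versus a few megabytes of hot weights is exactly the regime where capacity and bandwidth, not FLOPs, dominate.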
What are the primary components of DLRM?

- Embeddings to represent categorical (sparse) features.
- Bottom MLP to encode continuous features. (dense NN)
- Interaction layer that combines the embedding vectors and the bottom-MLP output (e.g., via pairwise dot products).
- Top MLP that produces the output (e.g., a click probability).
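The four components above can be sketched end to end. This is a minimal illustrative NumPy version (not Meta's implementation); the table counts, layer widths, and dot-product interaction are assumptions chosen to match the general DLRM architecture:

```python
# Minimal DLRM-style forward pass: embedding lookups for sparse features,
# a bottom MLP for dense features, pairwise dot-product interaction, and
# a top MLP producing a probability. Sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # shared embedding / bottom-MLP output dimension

def mlp(x, weights):
    """Apply a stack of layers: ReLU after every layer except the last."""
    for i, w in enumerate(weights):
        x = x @ w
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

# Hypothetical model: 3 categorical features, 100 ids each.
tables = [rng.normal(size=(100, DIM)) for _ in range(3)]
bottom = [rng.normal(size=(4, 16)), rng.normal(size=(16, DIM))]
# 4 vectors of size DIM interact pairwise -> C(4,2) = 6 dot products,
# concatenated with the dense vector before the top MLP.
top = [rng.normal(size=(DIM + 6, 16)), rng.normal(size=(16, 1))]

def forward(dense_x, sparse_ids):
    vecs = [tables[i][sid] for i, sid in enumerate(sparse_ids)]  # lookups
    vecs.append(mlp(dense_x, bottom))                            # dense path
    V = np.stack(vecs)                                           # (4, DIM)
    inter = V @ V.T                                              # pairwise dots
    iu = np.triu_indices(len(vecs), k=1)                         # upper triangle
    feats = np.concatenate([vecs[-1], inter[iu]])                # (DIM + 6,)
    return 1.0 / (1.0 + np.exp(-mlp(feats, top)))                # sigmoid

p = forward(rng.normal(size=4), [7, 42, 99])
print(p.shape, float(p))  # a single probability in (0, 1)
```

Note how the sparse path is pure memory lookup (`tables[i][sid]`) while the dense paths are matrix multiplies; this split is what drives the memory-vs-compute discussion above.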
As shown in the diagram above, a DLRM is composed of compute-dominated MLPs as well as memory-capacity-limited embeddings. It is therefore natural to rely on data parallelism to improve the performance of the MLPs and on model parallelism to address the memory capacity requirements of the embeddings. This differs from Content Understanding models, where data parallelism alone typically improves performance.
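A toy sketch of the model-parallel half of this strategy: each embedding table lives on exactly one device, and a simple greedy heuristic balances the per-device footprint (the device counts, table sizes, and greedy policy here are illustrative assumptions, not a production sharder):

```python
# Greedy table placement for model-parallel embeddings: assign each
# table (largest first) to the currently least-loaded device. The MLPs,
# by contrast, would simply be replicated on every device (data parallel).

def place_tables(table_rows, num_devices):
    """Return {table_index: device} and the per-device row counts."""
    load = [0] * num_devices
    placement = {}
    # Largest tables first so the big ones don't all land together.
    for t, rows in sorted(enumerate(table_rows), key=lambda x: -x[1]):
        d = load.index(min(load))  # least-loaded device
        placement[t] = d
        load[d] += rows
    return placement, load

# Hypothetical tables: one huge, three smaller.
rows = [50_000_000, 10_000_000, 10_000_000, 5_000_000]
placement, load = place_tables(rows, num_devices=2)
print(placement, load)
# -> {0: 0, 1: 1, 2: 1, 3: 1} [50000000, 25000000]
```

The 50M-row table claims one device by itself while the smaller tables pack onto the other, which is the basic shape of table-wise sharding; real systems also consider access frequency and bandwidth, not just row counts.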
Reference:
https://ai.facebook.com/blog/dlrm-an-advanced-open-source-deep-learning-recommendation-model/