RecSys model inference
1 min read · Dec 18, 2022
Industrial recommendation models (RecSys) differ from models in domains like NLP and CV in a few ways:
- Massive amounts of user engagement data (e.g., ad clicks, search result clicks, content browsing) are readily available for training RecSys models. Most of this engagement data is sparse and categorical. Training on it at scale leads to very large models and unique challenges in inference.
- Model inference has to be low latency and high throughput, since realtime recommendations are served to users at web scale. Model compression reduces model size and improves serving performance, but the accompanying loss in model accuracy translates into lower revenue, so it is often not an option.
- User data is ever changing. To mitigate the concept drift arising from changing user behavior, models should be refreshed in near realtime.
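To make the first point concrete, here is a rough sketch (not from the post; the sizes are assumptions for illustration) of why a single sparse categorical feature can dominate model size, and the common feature-hashing trick that bounds memory at the cost of collisions:

```python
import numpy as np

# Assumed sizes: one high-cardinality categorical feature (e.g., user id).
vocab_size = 200_000_000   # distinct ids seen in training (assumption)
embed_dim = 64             # embedding width (assumption)

# A full float32 embedding table: vocab_size * embed_dim * 4 bytes.
table_bytes = vocab_size * embed_dim * 4
print(f"one embedding table: {table_bytes / 1e9:.1f} GB")  # 51.2 GB

# Feature hashing maps raw ids into a fixed number of buckets,
# trading collisions for bounded memory.
num_buckets = 100_000  # kept tiny here so the sketch runs instantly
rng = np.random.default_rng(0)
table = rng.standard_normal((num_buckets, embed_dim)).astype(np.float32)

def lookup(raw_id: int) -> np.ndarray:
    # Hash the raw id into the fixed-size table.
    return table[hash(raw_id) % num_buckets]

vec = lookup(123456789)
print(vec.shape)  # (64,)
```

One 51 GB table per feature, across dozens of features, is why RecSys inference fleets care so much about memory, unlike typical NLP/CV serving.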
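On the compression tradeoff from the second point: a minimal sketch of symmetric post-training int8 quantization (a stand-in example, not the post's method) shows the 4x size win and the reconstruction error that, at RecSys scale, can cost accuracy and hence revenue:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy float32 embedding table standing in for real model weights.
weights = rng.standard_normal((1000, 64)).astype(np.float32)

# Symmetric per-tensor int8 quantization: one scale for the whole tensor.
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
deq = q.astype(np.float32) * scale  # what inference would actually see

ratio = weights.nbytes / q.nbytes          # 4x smaller in memory
err = np.abs(weights - deq).max()          # bounded by scale / 2
print(f"compression: {ratio:.0f}x, max abs error: {err:.5f}")
```

The error looks tiny per weight, but recommendation quality is sensitive enough that even small accuracy drops are measurable in online metrics, which is the revenue tradeoff the bullet refers to.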
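And on near-realtime refresh from the third point: one common pattern (sketched here under assumed names; not necessarily how any specific system does it) is to apply incremental updates only to the embedding rows touched by fresh engagement events, so refresh cost scales with traffic rather than with table size:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy item-embedding table; in production this holds millions of rows.
table = rng.standard_normal((1000, 8)).astype(np.float32) * 0.01

def apply_event(item_id: int, grad: np.ndarray, lr: float = 0.05) -> None:
    # Update only the row referenced by this engagement event,
    # so a refresh touches O(traffic) rows, not the whole table.
    table[item_id] -= lr * grad

before = table[3].copy()
# A small stream of (item_id, gradient) pairs from recent engagement.
for item_id, grad in [(3, np.ones(8, dtype=np.float32))] * 4:
    apply_event(item_id, grad)

delta = table[3] - before
print(delta)  # each component moved by -0.2 after four events
```

Pushing only the touched rows to serving replicas is what makes near-realtime refresh feasible, where shipping a full multi-gigabyte model each time would not be.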
Here are a few deep dives into various aspects of large-scale model inference for RecSys.
Topics:
Let me know if you want any more topics to be covered.