RecSys model inference
1 min read · Dec 18, 2022
Industrial recommendation models (RecSys) differ from models in domains like NLP and CV in a few ways:
- Massive amounts of user engagement data (e.g., ad clicks, search result clicks, content browsing) are readily available for training RecSys models. Most of this engagement data is sparse and categorical. Training on it at scale leads to very large models and unique challenges in inference.
- Model inference has to be low latency and high throughput, since realtime recommendations are served to users at web scale. Model compression reduces model size and improves serving performance, but the accompanying loss in model accuracy translates into lower revenue, so it is often not an option.
- User data is ever changing. To mitigate the concept drift arising from changing user behavior, models should be refreshed in near realtime.
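To make the first point concrete, here is a rough sketch (not from the post; the sizes are assumptions for illustration) of why a single sparse categorical feature can dominate model size, and the common feature-hashing trick that bounds memory at the cost of collisions:

```python
import numpy as np

# Assumed sizes: one high-cardinality categorical feature (e.g., user id).
vocab_size = 200_000_000   # distinct ids seen in training (assumption)
embed_dim = 64             # embedding width (assumption)

# A full float32 embedding table: vocab_size * embed_dim * 4 bytes.
table_bytes = vocab_size * embed_dim * 4
print(f"one embedding table: {table_bytes / 1e9:.1f} GB")  # 51.2 GB

# Feature hashing maps raw ids into a fixed number of buckets,
# trading collisions for bounded memory.
num_buckets = 100_000  # kept tiny here so the sketch runs instantly
rng = np.random.default_rng(0)
table = rng.standard_normal((num_buckets, embed_dim)).astype(np.float32)

def lookup(raw_id: int) -> np.ndarray:
    # Hash the raw id into the fixed-size table.
    return table[hash(raw_id) % num_buckets]

vec = lookup(123456789)
print(vec.shape)  # (64,)
```

One 51 GB table per feature, across dozens of features, is why RecSys inference fleets care so much about memory, unlike typical NLP/CV serving.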
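On the compression tradeoff from the second point: a minimal sketch of symmetric post-training int8 quantization (a stand-in example, not the post's method) shows the 4x size win and the reconstruction error that, at RecSys scale, can cost accuracy and hence revenue:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy float32 embedding table standing in for real model weights.
weights = rng.standard_normal((1000, 64)).astype(np.float32)

# Symmetric per-tensor int8 quantization: one scale for the whole tensor.
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
deq = q.astype(np.float32) * scale  # what inference would actually see

ratio = weights.nbytes / q.nbytes          # 4x smaller in memory
err = np.abs(weights - deq).max()          # bounded by scale / 2
print(f"compression: {ratio:.0f}x, max abs error: {err:.5f}")
```

The error looks tiny per weight, but recommendation quality is sensitive enough that even small accuracy drops are measurable in online metrics, which is the revenue tradeoff the bullet refers to.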
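And on near-realtime refresh from the third point: one common pattern (sketched here under assumed names; not necessarily how any specific system does it) is to apply incremental updates only to the embedding rows touched by fresh engagement events, so refresh cost scales with traffic rather than with table size:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy item-embedding table; in production this holds millions of rows.
table = rng.standard_normal((1000, 8)).astype(np.float32) * 0.01

def apply_event(item_id: int, grad: np.ndarray, lr: float = 0.05) -> None:
    # Update only the row referenced by this engagement event,
    # so a refresh touches O(traffic) rows, not the whole table.
    table[item_id] -= lr * grad

before = table[3].copy()
# A small stream of (item_id, gradient) pairs from recent engagement.
for item_id, grad in [(3, np.ones(8, dtype=np.float32))] * 4:
    apply_event(item_id, grad)

delta = table[3] - before
print(delta)  # each component moved by -0.2 after four events
```

Pushing only the touched rows to serving replicas is what makes near-realtime refresh feasible, where shipping a full multi-gigabyte model each time would not be.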
Here are a few deep dives into various aspects of large-scale model inference for RecSys.
Topics:
Let me know if you want any more topics to be covered.