Check your segment metrics !
2 min readJun 22, 2022
- ML models are increasingly used in applications impacting a diverse and large population. The models are evaluated based on their primary metric performance on the whole population.
- For example, a recommendation model evaluation is based on an offline metric such as normalized entropy, ndcg computed on human judgements and online metrics such as lift in engagement (views, clicks, time spent etc) in product.
- ML models often make systematic errors on important segments even though the overall metrics is great. These are blind spots in the model and can cause severe issues in production without proper mitigation.
Segment metrics :
- A segment is a subset of a population which share a defining attribute or characteristic. Examples of segment variables are country, region, language, device, income groups etc are used for defining such segment. These are also termed as stratification variables.
- If underperforming segments can be accurately identified and labeled, we can then improve model robustness by improving data or modeling. Even if we can’t improve model robustness, knowing these issues will help in preventing taking wrong decisions on predictions for these segments.
Detecting segments :
- Stratification variables for segmenting population is often intuitive. For example, demographic features such as country.
- You can use some of the top features in your model as segment. Partition the data based on such features and analyze your model performance on each segment.
- Stratification variables such as demographic features should be very carefully used (preferably not used) in models to avoid potential bias.
What to do with segment metrics ?
- In case you detect a segment with significantly different metrics from other segments, it is important to deep dive and see what’s happening !
- There are several ways to improve poorly performing segments. Data augmentation to improve data quality or trying out a separate model architecture for the segment are some methods practitioners often try out.