💡 Model Inference Server

What about it?

Jaideep Ray
2 min read · Jan 29, 2022
  1. Models are getting bigger (model sizes now range from MBs to TBs), innovation in deep learning is rapid, and more and more use cases have models in their execution path.
  2. Model execution has seen great benefits from heterogeneous hardware backends (accelerators): CPUs, GPUs, FPGAs.
  3. Models are used in critical services and at large scale (even millions of QPS): model deployment is a multi-tenant problem, and routing requests efficiently to servers is a necessity (a toy routing sketch follows this list).
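To make the routing point concrete, here is a toy, hash-based routing sketch. It is not from the article; the `ModelRouter` name, the replica map, and the hashing choice are all assumptions for illustration.

```python
import hashlib
from typing import Dict, List


class ModelRouter:
    """Routes requests for a model_id to one of its replica servers."""

    def __init__(self, replicas: Dict[str, List[str]]):
        # replicas: model_id -> list of server addresses hosting that model
        self.replicas = replicas

    def route(self, model_id: str, request_key: str) -> str:
        servers = self.replicas[model_id]
        # Hash the request key so the same key always lands on the same replica,
        # spreading load roughly evenly across replicas.
        h = int(hashlib.md5(request_key.encode()).hexdigest(), 16)
        return servers[h % len(servers)]


router = ModelRouter({"ads_ctr_v3": ["srv-1:9000", "srv-2:9000", "srv-3:9000"]})
print(router.route("ads_ctr_v3", "request-12345"))  # always the same replica for this key
```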

Keeping these in mind, at production scale there is a genuine need for a model inference server. It must support fast model validation, deployment, and proper version control.
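As a rough illustration of those requirements, here is a minimal sketch of a registry that tracks versions, validates them, and picks the version to deploy. The `ModelRegistry` / `ModelVersion` names and fields are assumptions, not the article's actual design.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelVersion:
    model_id: str
    version: int
    artifact_uri: str        # e.g. path to the serialized model
    validated: bool = False


@dataclass
class ModelRegistry:
    versions: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, mv: ModelVersion) -> None:
        self.versions.setdefault(mv.model_id, []).append(mv)

    def validate(self, model_id: str, version: int) -> None:
        # Placeholder for "fast model validation": load the artifact,
        # run a few golden requests, and check outputs before serving.
        mv = self._get(model_id, version)
        mv.validated = True

    def deployable(self, model_id: str) -> ModelVersion:
        # Serve the newest validated version; older ones stay around for rollback.
        candidates = [v for v in self.versions.get(model_id, []) if v.validated]
        return max(candidates, key=lambda v: v.version)

    def _get(self, model_id: str, version: int) -> ModelVersion:
        return next(v for v in self.versions[model_id] if v.version == version)
```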

Architecture & component responsibilities:

Model inference server architecture
  1. Model Master: Singleton orchestrator which deploys a model to one or many model servers by factoring in model requirements and hardware resources. For a large ads model, it might not be possible to deploy the model to a single model server; in that case the model is transformed (split) and deployed to multiple model servers (a placement sketch follows this list). The Model Master also performs fast model validation and version control.
  2. Model Server: Consists of a Model Loader, Model containers, and an Inference request executor.
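The placement decision the Model Master makes can be sketched roughly as below. This is a simplified, hypothetical heuristic (the `place_model` function and the greedy split are my assumptions), not the real orchestrator logic.

```python
from typing import List


def place_model(model_size_gb: float, server_capacity_gb: List[float]) -> List[int]:
    """Return indices of servers chosen to host the model (or its shards)."""
    # Case 1: the whole model fits on a single server.
    for i, cap in enumerate(server_capacity_gb):
        if cap >= model_size_gb:
            return [i]

    # Case 2: split the model greedily across the servers with the most headroom,
    # mirroring the "transform (split) and deploy to multiple servers" step.
    chosen, remaining = [], model_size_gb
    for i in sorted(range(len(server_capacity_gb)),
                    key=lambda i: server_capacity_gb[i], reverse=True):
        if remaining <= 0:
            break
        chosen.append(i)
        remaining -= server_capacity_gb[i]
    if remaining > 0:
        raise RuntimeError("not enough aggregate capacity for this model")
    return chosen


# A 1.2 TB model on servers with 512 GB free each -> split across 3 servers.
print(place_model(1200, [512, 512, 512, 256]))  # [0, 1, 2]
```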

By isolating model instances in containers, you ensure that they don’t interfere with each other.
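To illustrate that isolation idea, here is a toy example that uses process isolation as a stand-in for containers; a real model server would use containers/cgroups and a proper RPC layer, so treat the names and structure here as assumptions.

```python
from multiprocessing import Process, Queue


def model_worker(model_id: str, requests: Queue, responses: Queue) -> None:
    # In a real server this worker would load the model artifact into memory/GPU.
    # A crash or memory blow-up here cannot take down workers for other models.
    while True:
        req = requests.get()
        if req is None:
            break
        # Dummy "inference": return a fixed score for the request.
        responses.put((model_id, req, 0.5))


if __name__ == "__main__":
    reqs, resps = Queue(), Queue()
    worker = Process(target=model_worker, args=("ads_ctr_v3", reqs, resps))
    worker.start()

    reqs.put("request-12345")
    print(resps.get())        # ('ads_ctr_v3', 'request-12345', 0.5)

    reqs.put(None)            # shut the worker down cleanly
    worker.join()
```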

