Batch inference uses distributed data processing infrastructure to carry out inference asynchronously on a large number of instances at once; a minimal sketch of such a job follows the list of characteristics below.
- What to optimize: throughput; latency is not critical.
- End user: usually has no direct interaction with the model; they consume the predictions that the batch jobs write to a data store.
- Validation: offline
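
For concreteness, here is a minimal sketch of what such a batch scoring job can look like in plain Python, assuming a scikit-learn model serialized with joblib; the model path, input glob, output path, and column layout are hypothetical placeholders, not part of this workshop's code.

```python
# Minimal sketch of a batch scoring job.
# All paths and the model artifact are hypothetical placeholders.
import glob

import joblib
import pandas as pd

MODEL_PATH = "models/churn_model.pkl"            # hypothetical serialized model
INPUT_GLOB = "data/incoming/*.parquet"           # hypothetical batch of input files
OUTPUT_PATH = "data/predictions/scores.parquet"  # data store the end user reads later


def main() -> None:
    model = joblib.load(MODEL_PATH)

    # Score every file in the batch; throughput matters here, not per-row latency.
    frames = []
    for path in glob.glob(INPUT_GLOB):
        batch = pd.read_parquet(path)
        # Assumes all columns in the file are model features.
        batch["prediction"] = model.predict(batch)
        frames.append(batch)

    # Persist predictions; users interact with this output, never with the model.
    pd.concat(frames).to_parquet(OUTPUT_PATH, index=False)


if __name__ == "__main__":
    main()
```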
Learn general MLOps concepts first:
Next, learn how to build and run batch serving pipelines on the Azure cloud (a rough pipeline sketch follows this list):
- Orchestrate machine learning with pipelines on Azure
- Create Azure Machine Learning Pipeline
- Deploy batch inference pipelines with Azure Machine Learning
- Create a Batch Inferencing Service on Azure
or, for an overall learning path:
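
As a rough preview of what those modules build up to, the sketch below assembles a batch inference pipeline with the Azure ML SDK v1 and its ParallelRunStep. The workspace config, registered dataset, environment file, compute cluster, and entry script names are all hypothetical placeholders, not the workshop's actual code.

```python
# Sketch of an Azure ML (SDK v1) batch inference pipeline.
# Dataset, environment, compute, and script names are hypothetical.
from azureml.core import Dataset, Environment, Workspace
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep

ws = Workspace.from_config()                               # reads config.json
input_ds = Dataset.get_by_name(ws, name="scoring-input")   # registered FileDataset
env = Environment.from_conda_specification("batch-env", "environment.yml")

output = PipelineData("inferences", datastore=ws.get_default_datastore())

parallel_run_config = ParallelRunConfig(
    source_directory="scripts",        # folder containing the entry script
    entry_script="batch_score.py",     # defines init() and run(mini_batch)
    mini_batch_size="10",              # files handed to each run() call
    error_threshold=10,
    output_action="append_row",        # collect all results into one output file
    environment=env,
    compute_target="cpu-cluster",      # existing AmlCompute cluster
    node_count=2,
)

step = ParallelRunStep(
    name="batch-scoring",
    parallel_run_config=parallel_run_config,
    inputs=[input_ds.as_named_input("batch_data")],
    output=output,
    allow_reuse=True,
)

pipeline = Pipeline(workspace=ws, steps=[step])
```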
This workshop is a work in progress (WIP).
It will cover a real-life use case of building, publishing, scheduling, and troubleshooting batch serving pipelines on Azure with a Python runtime.
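
As a preview of the publishing and scheduling part, continuing the sketch above, it could look roughly like this with the SDK v1 Schedule API; the experiment name, pipeline name, and schedule are hypothetical examples.

```python
# Sketch of publishing and scheduling the pipeline defined above.
from azureml.core import Experiment
from azureml.pipeline.core import Schedule, ScheduleRecurrence

# Run the pipeline once to validate it before publishing.
run = Experiment(ws, "batch-scoring-exp").submit(pipeline)
run.wait_for_completion(show_output=True)

# Publish so the pipeline can be triggered via REST or on a schedule.
published = pipeline.publish(
    name="batch-scoring-pipeline",
    description="Scores new data in batches and stores predictions",
    version="1.0",
)

# Trigger the published pipeline daily at 03:00; predictions land in the output datastore.
recurrence = ScheduleRecurrence(frequency="Day", interval=1, time_of_day="03:00")
Schedule.create(
    ws,
    name="daily-batch-scoring",
    pipeline_id=published.id,
    experiment_name="batch-scoring-exp",
    recurrence=recurrence,
)
```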