Serve batched requests using Redis. Throughput can scale linearly by increasing the number of workers per device and the number of devices.
- Install Redis
pip3 install -r requriments.txt
- For linear scaling, start nvidia-cuda-mps-control. See Section 2.1.1, GPU utilization, for details.
nvidia-cuda-mps-control -d # Start the MPS daemon
# To exit MPS after stopping the server:
nvidia-cuda-mps-control # Enters the MPS command prompt
quit # Type this at the prompt to quit
- Start Redis
redis-server --save "" --appendonly no
- Start Batch-Serving
supervisord -c supervisor.conf # Start 3 workers on a single GPU
- Start the batch benchmark
python3 bench_batched.py
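
For readers new to the pattern, here is a minimal sketch of how a Redis-backed batch worker might operate. It is not this repository's implementation. The queue name `requests`, the `result:` key prefix, and the helpers `submit`, `collect_batch`, and `serve_once` are all hypothetical. `InMemoryQueue` is a stand-in that mimics the few redis-py calls used (`rpush`, `lpop`, `set`, `get`), so the sketch runs without a Redis server. A `redis.Redis()` client should be usable in its place.

```python
import json
import time
import uuid


class InMemoryQueue:
    """Hypothetical stand-in for the subset of redis-py calls used below."""

    def __init__(self):
        self.lists = {}
        self.values = {}

    def rpush(self, name, *values):
        self.lists.setdefault(name, []).extend(values)
        return len(self.lists[name])

    def lpop(self, name):
        items = self.lists.get(name)
        return items.pop(0) if items else None

    def set(self, name, value):
        self.values[name] = value
        return True

    def get(self, name):
        return self.values.get(name)


def submit(client, payload, queue="requests"):
    """Client side: enqueue one request and return its id."""
    request_id = str(uuid.uuid4())
    client.rpush(queue, json.dumps({"id": request_id, "payload": payload}))
    return request_id


def collect_batch(client, queue="requests", max_batch=8, max_wait_s=0.01):
    """Worker side: pop up to max_batch requests, waiting at most max_wait_s."""
    batch = []
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch:
        raw = client.lpop(queue)
        if raw is not None:
            batch.append(json.loads(raw))
        elif time.monotonic() >= deadline:
            break  # queue is empty and the wait budget is spent
        else:
            time.sleep(0.001)  # brief pause before polling again
    return batch


def serve_once(client, model_fn, **kwargs):
    """Run model_fn once over a collected batch and store each result by id."""
    batch = collect_batch(client, **kwargs)
    if not batch:
        return 0
    outputs = model_fn([item["payload"] for item in batch])
    for item, output in zip(batch, outputs):
        client.set("result:" + item["id"], json.dumps(output))
    return len(batch)
```

Several such workers can poll the same queue, which is the property that lets throughput grow with the worker count. Each popped request is removed from the list, so no two workers process the same item.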