This project provides a lightweight vLLM install using the civo-python-cuda-images.
Also included is an example Helm deployment of the vLLM server, compatible with the civo-gpu-operator-tf repository.
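
Once the server is running, it can be queried over HTTP, since vLLM's server exposes an OpenAI-compatible API. The following is a minimal sketch, not part of this repository. It assumes the service is reachable at `http://localhost:8000` (for example through a port-forward), and the model name is a placeholder for whatever model your deployment serves.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=32):
    """Build (but do not send) a POST request for the /v1/completions route."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, model, prompt, max_tokens=32, timeout=30):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(base_url, model, prompt, max_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style completion responses list results under "choices".
    return body["choices"][0]["text"]
```

For example, `complete("http://localhost:8000", "facebook/opt-125m", "Hello, my name is")` would return the generated continuation, where both the URL and the model name are assumed values to adjust for your cluster.
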

Name | Description |
---|---|
civo-vllm-docker | a slim vllm docker image |

- further examples
- flash_attention2 support
- improvements to the entrypoint