0.3.2
Worker vLLM 0.3.2 - What's Changed
- Upgraded vLLM from 0.3.2 to 0.3.3
- StarCoder2 support
- Performance optimization for Gemma
- 2/3/8-bit GPTQ support
- Marlin kernel integration for Int4 GPTQ inference
- Performance optimization for MoE kernel
- Updated and refactored base image, sampling parameters, etc.
- Various bug fixes