0.3.2

alpayariyak released this 12 Mar 23:11

Worker vLLM 0.3.2 - What's Changed

- vLLM version 0.3.2 -> 0.3.3
  - StarCoder2 support
  - Performance optimization for Gemma
  - 2/3/8-bit GPTQ support
  - Integrate Marlin Kernels for Int4 GPTQ inference
  - Performance optimization for MoE kernel
- Updated and refactored base image, sampling parameters, etc.
- Various bug fixes
