Releases: runpod-workers/worker-vllm
v1.6.0
v1.5.0
- vLLM version update: 0.6.1 -> 0.6.2
- Supports Llama 3.2 models.
v1.4.0: Merge pull request #109 from runpod-workers/0.5.5-update
- vLLM version update: 0.5.5 -> 0.6.1
v1.3.1
- vLLM version: 0.5.5
- Bug fix for OpenAI completion requests.
v1.3.0
- vLLM version upgrade: v0.5.4 -> v0.5.5
- Various improvements and bug fixes.
- [Known issue]: OpenAI completion requests error.
v1.2.0
- vLLM version upgrade: v0.5.3 -> v0.5.4
- Various improvements and bug fixes.
- [Known issue]: OpenAI completion requests error.
v1.1.0
- Major vLLM update: v0.4.2 -> v0.5.3
- Supports Llama 3.1 models.
- Various improvements and bug fixes.
- [Known issue]: OpenAI completion requests error.
1.0.1
Hotfix adding backwards compatibility for the deprecated `max_context_len_to_capture` engine argument.
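A backwards-compatibility shim like this can be sketched as a small argument-normalization step. This is a hypothetical illustration, not the worker's actual code: the helper name `normalize_engine_args` and the replacement argument `max_seq_len_to_capture` (the name vLLM introduced when it deprecated `max_context_len_to_capture`) are assumptions.

```python
def normalize_engine_args(args: dict) -> dict:
    """Map the deprecated max_context_len_to_capture engine argument
    onto its replacement so older configurations keep working.

    Hypothetical sketch: the helper and the new argument name
    (max_seq_len_to_capture) are assumptions, not worker-vllm code.
    """
    args = dict(args)  # copy so the caller's dict is not mutated
    deprecated = args.pop("max_context_len_to_capture", None)
    if deprecated is not None:
        # Only fall back when the new name is not set explicitly;
        # an explicit new-style value always wins.
        args.setdefault("max_seq_len_to_capture", deprecated)
    return args
```

With this shape, a config still passing the old key is silently translated, while configs already using the new key are left untouched.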
1.0.0
Worker vLLM 1.0.0 - What's Changed
- vLLM version 0.3.3 -> 0.4.2
- Various improvements and bug fixes
0.3.2
Worker vLLM 0.3.2 - What's Changed
- vLLM version 0.3.2 -> 0.3.3
- StarCoder2 support
- Performance optimization for Gemma
- 2/3/8-bit GPTQ support
- Integrated Marlin kernels for INT4 GPTQ inference
- Performance optimization for MoE kernel
- Updated and refactored base image, sampling parameters, etc.
- Various bug fixes