This PR contains the following updates:
vllm: ==0.6.1 -> ==0.6.2
Release Notes
vllm-project/vllm (vllm)
v0.6.2
Compare Source
Highlights
Model Support
Support Llama 3.2 models (#8811, #8822)
vllm serve meta-llama/Llama-3.2-11B-Vision-Instruct --enforce-eager --max-num-seqs 16
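Once the server above is running, it can be queried through vLLM's OpenAI-compatible API. A minimal sketch, assuming the default endpoint http://localhost:8000/v1, the official openai Python client, and a placeholder image URL:

```python
# Sketch: query the vLLM OpenAI-compatible server started with `vllm serve` above.
# Assumes the default endpoint http://localhost:8000/v1; the image URL is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # key is ignored by vLLM

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            # Placeholder URL; replace with a real, reachable image.
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    }],
    max_tokens=64,
)
print(response.choices[0].message.content)
```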
Beam search has been soft-deprecated. We are moving towards a more performant implementation of beam search that also simplifies vLLM's core. (#8684, #8763, #8713)
Support for the Solar model (#8386), MiniCPM3 (#8297), and LLaVA-OneVision (#8486)
Enhancements: pipeline parallelism for Qwen2-VL (#8696), multiple images for Qwen-VL (#8247), Mistral function calling (#8515), bitsandbytes support for Gemma2 (#8338), and tensor parallelism with bitsandbytes quantization (#8434)
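To illustrate the last two enhancements, here is a hedged offline-inference sketch. The model name is illustrative, and the keyword arguments (quantization, load_format, tensor_parallel_size) are assumed to match this release's engine arguments:

```python
# Sketch only: bitsandbytes quantization combined with tensor parallelism.
# Assumes quantization="bitsandbytes" with load_format="bitsandbytes" is accepted
# and that two GPUs are available; the model name is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="google/gemma-2-9b-it",   # illustrative model (bnb support for Gemma2: #8338)
    quantization="bitsandbytes",    # quantize weights on load
    load_format="bitsandbytes",
    tensor_parallel_size=2,         # shard across 2 GPUs (#8434)
)

outputs = llm.generate(
    ["Explain tensor parallelism in one sentence."],
    SamplingParams(temperature=0.7, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```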
Hardware Support
Production Engine
Performance
MQLLMEngine for the API server boosts throughput by 30% in single-step and 7% in multi-step mode (#8157, #8761, #8584)
Others
What's Changed
IQ1_M quantization implementation to GGUF kernel by @Isotr0py in https://github.com/vllm-project/vllm/pull/8357
MQLLMEngine to avoid asyncio OH by @alexm-neuralmagic in https://github.com/vllm-project/vllm/pull/8157
dead_error property to engine client by @joerunde in https://github.com/vllm-project/vllm/pull/8574
collect_env.py by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/8649
PromptInputs to PromptType, and inputs to prompt by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/8673
SequenceData and Sequence by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/8675
SequenceData.from_token_counts to create dummy data by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/8687
Revert "PromptInputs to PromptType, and inputs to prompt" by @simon-mo in https://github.com/vllm-project/vllm/pull/8750
replace_parameters by @LucasWilkinson in https://github.com/vllm-project/vllm/pull/8748
PromptInputs and inputs, with backwards compatibility by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/8760 (see the sketch after the changelog link)
New Contributors
Full Changelog: vllm-project/vllm@v0.6.1...v0.6.2
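Because the inputs-to-prompt rename above was reverted (#8750) and then re-applied with backwards compatibility (#8760), user code is safest passing the prompt positionally. A minimal sketch under that assumption about the public LLM.generate signature; the model name is illustrative:

```python
# Sketch: passing prompts positionally keeps code working across the
# inputs -> prompt keyword rename (#8673, reverted in #8750,
# re-applied with backwards compatibility in #8760).
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small illustrative model
params = SamplingParams(max_tokens=32)

# Positional form works on both sides of the rename.
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```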
v0.6.1.post2
Compare Source
Highlights
What's Changed
Full Changelog: vllm-project/vllm@v0.6.1.post1...v0.6.1.post2
v0.6.1.post1
Compare Source
Highlights
This release features important bug fixes and enhancements for:
--max_num_batched_tokens 16384 with --max-model-len 16384
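For the flag combination called out above, a hedged sketch of the equivalent configuration through vLLM's Python entry point; the keyword names are assumed to mirror the CLI flags, and the model is a placeholder:

```python
# Sketch: the highlighted configuration expressed as engine keyword arguments.
# Assumes max_num_batched_tokens / max_model_len are forwarded to the engine
# arguments mirroring the CLI flags; the model name is illustrative.
from vllm import LLM

llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # illustrative model
    max_model_len=16384,
    max_num_batched_tokens=16384,
)
```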
Also
engine_use_ray
(#8126)
What's Changed
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Enabled.
♻ Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.