
Update dependency vllm to v0.6.2 #7

Open

renovate[bot] wants to merge 1 commit into main from renovate/vllm-0.x
Conversation


@renovate renovate bot commented Oct 1, 2024

This PR contains the following updates:

Package: vllm
Change: ==0.6.1 -> ==0.6.2

Release Notes

vllm-project/vllm (vllm)

v0.6.2

Compare Source

Highlights

Model Support
  • Support Llama 3.2 models (#8811, #8822)
    vllm serve meta-llama/Llama-3.2-11B-Vision-Instruct --enforce-eager --max-num-seqs 16

  • Beam search has been soft-deprecated. We are moving towards a version of beam search that is more performant and also simplifies vLLM's core. (#8684, #8763, #8713) See the sketch after this list for a temporary opt-out.

    • ⚠️ You will now see the following error; this is a breaking change!

      Using beam search as a sampling parameter is deprecated, and will be removed in the future release. Please use the vllm.LLM.use_beam_search method for dedicated beam search instead, or set the environment variable VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 to suppress this error. For more details, see https://github.com/vllm-project/vllm/issues/8306

  • Support for the Solar model (#8386), MiniCPM3 (#8297), and LLaVA-OneVision (#8486)

  • Enhancements: pipeline parallelism for Qwen2-VL (#8696), multiple-image inputs for Qwen-VL (#8247), Mistral function calling (#8515), bitsandbytes support for Gemma2 (#8338), and tensor parallelism with bitsandbytes quantization (#8434)
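
A minimal sketch of the temporary opt-out offered by the error message above, for scripts that still pass beam search through SamplingParams. The environment variable name comes from the release note; the model name, prompt, and the use_beam_search/best_of fields are illustrative assumptions, not taken from these notes, and the dedicated vllm.LLM.use_beam_search path mentioned in the error remains the recommended long-term replacement.

    import os

    # Suppress the new deprecation error (breaking change above); set it here,
    # before vllm is imported, to be safe.
    os.environ["VLLM_ALLOW_DEPRECATED_BEAM_SEARCH"] = "1"

    from vllm import LLM, SamplingParams

    # Placeholder model and prompt; legacy-style beam search via SamplingParams.
    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(use_beam_search=True, best_of=4, temperature=0.0, max_tokens=32)
    outputs = llm.generate(["The capital of France is"], params)
    print(outputs[0].outputs[0].text)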

Hardware Support
  • TPU: implement multi-step scheduling (#8489), use Ray as the default distributed backend (#8389)
  • CPU: enable mrope and support Qwen2-VL on the CPU backend (#8770)
  • AMD: custom paged attention kernel for ROCm (#8310) and FP8 KV cache support (#8577)
Production Engine
  • Initial support for priority scheduling (#5958)
  • Support for LoRA lineage and base model metadata management (#6315)
  • Batch inference for the llm.chat() API (#8648); see the sketch below this list
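
A minimal sketch of the batched llm.chat() call referenced above (#8648), assuming it accepts a list of conversations, each a list of role/content messages, and returns one output per conversation. The model name and prompts are placeholders chosen for illustration.

    from vllm import LLM, SamplingParams

    # Placeholder chat model (any model with a chat template should behave the same way).
    llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct")
    params = SamplingParams(temperature=0.7, max_tokens=64)

    # Assumption: llm.chat() accepts a list of conversations for batch inference (#8648),
    # each conversation being a list of {"role": ..., "content": ...} messages.
    conversations = [
        [{"role": "user", "content": "Give one sentence about paged attention."}],
        [{"role": "user", "content": "Explain beam search in one sentence."}],
    ]
    outputs = llm.chat(conversations, sampling_params=params)
    for out in outputs:
        print(out.outputs[0].text)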
Performance
  • Introduce MQLLMEngine for the API server, boosting throughput by 30% in single-step and 7% in multi-step scheduling (#8157, #8761, #8584)
  • Multi-step scheduling enhancements
    • Prompt logprobs support in multi-step (#8199)
    • Output streaming support for multi-step + async (#8335)
    • FlashInfer backend support (#7928)
  • CUDA graph support during decoding for encoder-decoder models (#7631)
Others

What's Changed

New Contributors

Full Changelog: vllm-project/vllm@v0.6.1...v0.6.2

v0.6.1.post2

Compare Source

Highlights

  • This release contains an important bug fix related to token streaming combined with stop strings (#8468)

What's Changed

Full Changelog: vllm-project/vllm@v0.6.1.post1...v0.6.1.post2

v0.6.1.post1

Compare Source

Highlights

This release features important bug fixes and enhancements.

Also:

  • Support for multiple images in Qwen-VL (#8247)
  • Removal of engine_use_ray (#8126)
  • New engine option to return only deltas or final output (#7381)
  • bitsandbytes support for Gemma2 (#8338)

What's Changed


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Enabled.

Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate bot force-pushed the renovate/vllm-0.x branch 3 times, most recently from 208ae21 to 7da37ec on October 8, 2024 at 05:16