
v0.4.0

Released by @github-actions on 06 Jun 22:04 (commit 68cdb95) · 1983 commits to main since this release

What's Changed

  • Features integration without fp8 by @gshtras in #7
  • Layernorm optimizations by @mawong-amd in #8
  • Bringing in the latest commits from upstream by @mawong-amd in #9
  • Bump Docker to ROCm 6.1, add gradlib for tuned gemm, include RCCL fixes by @mawong-amd in #12
  • add mi300 fused_moe tuned configs by @divakar-amd in #13
  • Correctly calculating the same value for the required cache blocks num for all torchrun processes by @gshtras in #15
  • [ROCm] adding a missing triton autotune config by @hongxiayang in #17
  • make the vllm setup mode configurable and make install mode as defaul… by @hongxiayang in #18
  • enable fused topK_softmax kernel for hip by @divakar-amd in #14
  • Fix ambiguous fma call by @cjatin in #16
  • Rccl dockerfile updates by @mawong-amd in #19
  • Dockerfile improvements: multistage by @mawong-amd in #20
  • Integrate PagedAttention Optimization custom kernel into vLLM by @lcskrishna in #22
  • Updates to custom PagedAttention for supporting context len up to 32k by @lcskrishna in #25
  • Update max_context_len for custom paged attention. by @lcskrishna in #26
  • Update RCCL, hipBLASLt, base image in Dockerfile.rocm by @shajrawi in #24
  • Adding fp8 gemm computation by @charlifu in #29
  • fix the model loading fp8 by @charlifu in #30
  • Update linear.py by @gshtras in #32
  • Update base docker image with PyTorch 2.3 by @charlifu in #35
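Several of the entries above (#12, #19, #20, #24) rework `Dockerfile.rocm`, with #20 in particular moving it to a multistage build. The sketch below illustrates the general multistage pattern only; the base image tags, stage names, and build commands are assumptions for illustration, not the repository's actual `Dockerfile.rocm`.

```dockerfile
# Illustrative multistage sketch, not the actual Dockerfile.rocm.
# Build stage: compile vLLM's ROCm kernels in a full dev image.
FROM rocm/dev-ubuntu-22.04:6.1 AS build
WORKDIR /app
COPY . .
RUN pip install --upgrade pip \
 && python setup.py bdist_wheel

# Runtime stage: start fresh and copy in only the built wheel,
# so compiler toolchains and build caches stay out of the final image.
FROM rocm/dev-ubuntu-22.04:6.1 AS runtime
COPY --from=build /app/dist/*.whl /tmp/
RUN pip install /tmp/*.whl
```

The payoff of the pattern is image size: everything created in the `build` stage is discarded except what `COPY --from=build` explicitly carries over.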

New Contributors

Full Changelog: v0.3.3...v0.4.0