
Update ROCM libs and improvements #2358

Closed
wants to merge 24 commits

Conversation

@mht-sharma (Collaborator) commented on Aug 5, 2024

What does this PR do?

This PR introduces several library updates to address breaking changes, along with ROCm optimisations and custom kernels for low-batch-size GEMM and paged attention. Key improvements are as follows:

  • Update CK flash attention to use CK tile
  • Update vLLM to the latest rocm/vllm commit
  • Update torch
  • Fix TunableOp issue with TP=8
  • Fix BF16 inference
  • Add custom paged attention kernel (see the KV-cache layout sketch below)
  • Update documentation
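
For context, here is a minimal sketch of the paged KV-cache layout that a custom paged-attention kernel operates on. The tensor shapes mirror the diff discussed later in this thread; all concrete sizes and the vectorisation width `x` are illustrative assumptions, not values taken from this PR.

```python
import torch

# Minimal sketch of a paged KV-cache allocation. The shapes mirror the
# key/value cache diff in this PR; every concrete size below is an
# illustrative assumption, not a value from the PR.
num_blocks = 1024   # total cache blocks (assumption)
num_heads = 32      # attention heads (assumption)
head_size = 128     # per-head dimension (assumption)
BLOCK_SIZE = 16     # tokens stored per cache block (assumption)
x = 8               # key-cache vectorisation width (assumption)
dtype = torch.float16
device = "cuda" if torch.cuda.is_available() else "cpu"

# Keys split the head dimension into chunks of x for vectorised loads;
# values keep the full head dimension per block.
key_cache = torch.zeros(
    (num_blocks, num_heads, head_size // x, BLOCK_SIZE, x),
    dtype=dtype, device=device,
)
value_cache = torch.zeros(
    (num_blocks, num_heads, head_size, BLOCK_SIZE),
    dtype=dtype, device=device,
)
```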

@ErikKaum (Member)

Hi @mht-sharma 👋

Just checking in on this: are you still working on it, or should we consider it closed? To be clear, there is no hurry on our end 👍

@mht-sharma (Collaborator, Author)

Hi @ErikKaum, yes, I am still working on this; a few improvements and fixes are pending. I am working with AMD to finalize these updates soon.

@mht-sharma changed the title from "WIP: Update ROCM libs" to "WIP: Update ROCM libs and improvements" on Sep 4, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@mht-sharma marked this pull request as ready for review on September 13, 2024, 08:24
@mht-sharma changed the title from "WIP: Update ROCM libs and improvements" to "Update ROCM libs and improvements" on Sep 13, 2024
Comment on lines -1124 to 1132
```diff
-    torch.empty(
+    torch.zeros(
         (num_blocks, num_heads, head_size // x, BLOCK_SIZE, x),
         dtype=dtype,
         device=device,
     ),
-    torch.empty(
+    torch.zeros(
         (num_blocks, num_heads, head_size, BLOCK_SIZE),
         dtype=dtype,
         device=device,
```
@mht-sharma (Collaborator, Author)

This change is required for the custom paged-attention (PA) kernel on ROCm.
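
A tiny illustration of why zero-initialisation matters here, under the assumption that the custom kernel can read cache slots that have not been written yet (for example, padding positions in a partially filled block); this is a sketch, not code from the PR.

```python
import torch

# torch.empty returns an uninitialised buffer: its contents are whatever was
# already in memory, possibly including NaN or inf values.
uninitialised = torch.empty(4, 4)

# torch.zeros guarantees a clean buffer, so any read of a not-yet-written
# cache slot contributes exactly zero to downstream computations.
zeroed = torch.zeros(4, 4)

print(zeroed.sum())          # tensor(0.) -- deterministic
print(uninitialised.sum())   # arbitrary value, may even be NaN
```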

@mht-sharma (Collaborator, Author)

@OlivierDehaene @Narsil could you please review and merge this PR?

@danieldk (Member) left a comment

Really awesome to see ROCm get up to speed again.

Added a number of comments, most of them minor nitpicks.

Review comments (now outdated and resolved) were left on:

  • Dockerfile_amd
  • server/text_generation_server/layers/attention/rocm.py (3 comments)
  • server/text_generation_server/layers/linear.py
  • server/text_generation_server/layers/moe/fused_moe_rocm.py
  • server/text_generation_server/models/globals.py
@mht-sharma (Collaborator, Author)

Closing this in favour of #2579

@mht-sharma closed this on Sep 27, 2024