
ROCm support #295

Open · wants to merge 20 commits into base: main
Conversation

fxmarty
Contributor

@fxmarty fxmarty commented Jun 19, 2024

As per title: support AMD GPUs through the TEI Python backend.

For now, only embedding models with CLS/mean pooling are tested.

MI210/MI250/MI300 can dispatch to CK (Composable Kernel) flash attention 2, but other GPUs will default to the manual attention implementation (or SDPA). Only BERT appears to be supported in the Python backend.
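The dispatch described above could be sketched as follows. This is a hypothetical illustration, not the PR's actual code: the function and set names are made up, and the mapping of MI210/MI250 to `gfx90a` and MI300 to `gfx942` is an assumption about the GPU architecture strings.

```python
# Hypothetical sketch of the attention dispatch described above (not the PR's code).
# Architecture names are assumptions: MI210/MI250 -> gfx90a, MI300 -> gfx942.
FLASH_ATTN_ARCHS = {"gfx90a", "gfx942"}

def select_attention_impl(gcn_arch: str, sdpa_available: bool = True) -> str:
    """Pick an attention implementation for a given AMD GPU architecture."""
    if gcn_arch in FLASH_ATTN_ARCHS:
        return "ck_flash_attention_2"  # Composable Kernel flash attention 2
    if sdpa_available:
        return "sdpa"                  # torch scaled_dot_product_attention
    return "manual"                    # naive matmul + softmax fallback

print(select_attention_impl("gfx90a"))                          # ck_flash_attention_2
print(select_attention_impl("gfx1100"))                         # sdpa
print(select_attention_impl("gfx1100", sdpa_available=False))   # manual
```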

@fxmarty fxmarty mentioned this pull request Jun 19, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@OlivierDehaene OlivierDehaene left a comment


Thanks!

@@ -0,0 +1,34 @@
## Testing
Member


I think this is equivalent to the integration snapshotting logic in router/tests?

model = AutoModel.from_pretrained(model_path).to(dtype).to(device)
self.hidden_size = model.config.hidden_size
self.pooling_mode = pooling_mode
Member


Can you add mean pooling / CLS pooling forking L46?

Contributor Author


CLS pooling was already there, I added mean pooling.

What is L46?
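The two pooling modes discussed above can be sketched as follows. This is an illustrative NumPy version, not the PR's actual implementation (which operates on PyTorch tensors); the function names are made up.

```python
import numpy as np

def cls_pool(hidden_states: np.ndarray) -> np.ndarray:
    # CLS pooling: take the hidden state of the first token.
    return hidden_states[:, 0]

def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    # Mean pooling: average hidden states over non-padding tokens only.
    mask = attention_mask[:, :, None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

hidden = np.arange(12, dtype=np.float32).reshape(1, 3, 4)  # (batch, seq, hidden)
mask = np.array([[1, 1, 0]])                                # last token is padding
print(cls_pool(hidden))        # [[0. 1. 2. 3.]]
print(mean_pool(hidden, mask)) # [[2. 3. 4. 5.]]
```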


@OlivierDehaene, I created a PR to fix this

backends/src/lib.rs Outdated Show resolved Hide resolved
router/src/lib.rs Outdated Show resolved Hide resolved
@almersawi

Any chance this PR is merged soon?

@fxmarty
Contributor Author

fxmarty commented Jul 2, 2024

@almersawi It is in good shape, in my view.

cc @OlivierDehaene

@baddoub

baddoub commented Jul 19, 2024

Hey guys! Thanks for this PR. We have been waiting for it for some time now. Any idea when it will be merged?

@ocnimesh

Can anyone please provide build steps for a Docker image with this branch?
I am getting the error below. Are there any prerequisite packages?

/opt/conda/lib/python3.10/site-packages/torch/include/ATen/hip/HIPContextLight.h:20:10: fatal error: 'hipsolver/hipsolver.h' file not found
...
...
Dockerfile-rocm:112
112 | >>> RUN make -f Makefile-flash-att-v2 install-flash-attention-v2-rocm
...
...
ERROR: failed to solve: process "/bin/sh -c make -f Makefile-flash-att-v2 install-flash-attention-v2-rocm" did not complete successfully: exit code: 2

Command I executed: sudo docker build -f Dockerfile-rocm -t nims123/tei_amd .

@mht-sharma

Applied minor fixes to successfully build the Docker image. PR #403

@nbroad1881 @OlivierDehaene

@sauravsit

Any ETA on this merge?


9 participants