
chore(gpu): rework select to avoid using local streams #1867

Merged
merged 1 commit into from
Dec 16, 2024

Conversation

agnesLeroy
Contributor

closes: please link all relevant issues

PR content/description

Check-list:

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)
  • Relevant issues are marked as resolved/closed, related issues are linked in the description
  • Check for breaking changes (including serialization changes) and add them to commit message following the conventional commit specification
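The PR title describes reworking the GPU `select` (if_then_else) path so it no longer creates local streams. As a rough illustration of that pattern (a pure-Rust simulation with hypothetical names, not the actual tfhe-rs API), the change amounts to enqueuing work on a stream the caller already owns instead of creating and tearing down a local stream inside every call:

```rust
use std::cell::Cell;

// Hypothetical sketch: counts stream creations to contrast the two patterns.
// None of these names come from tfhe-rs; they only model the idea.
struct StreamPool {
    created: Cell<u32>,
}

impl StreamPool {
    fn create_stream(&self) -> u32 {
        self.created.set(self.created.get() + 1);
        self.created.get()
    }
}

// Before: each call spins up its own local stream (creation cost plus the
// extra synchronization needed before destroying it).
fn select_with_local_stream(pool: &StreamPool) {
    let _local = pool.create_stream();
    // ... launch kernels on `_local`, sync, destroy ...
}

// After: kernels go on the stream the caller passes in; no local stream,
// no extra sync point per call.
fn select_on_caller_stream(_stream: u32) {
    // ... launch kernels on `_stream` ...
}

fn main() {
    let pool = StreamPool { created: Cell::new(0) };
    let main_stream = pool.create_stream(); // the caller's long-lived stream
    for _ in 0..3 {
        select_with_local_stream(&pool); // old pattern: one stream per call
    }
    for _ in 0..3 {
        select_on_caller_stream(main_stream); // new pattern: zero extra streams
    }
    println!("streams created: {}", pool.created.get()); // prints "streams created: 4"
}
```

With the caller-stream pattern, the stream count stays constant no matter how many `select` calls run, which is consistent with the small latency improvement reported below.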

@cla-bot cla-bot bot added the cla-signed label Dec 13, 2024
@agnesLeroy agnesLeroy requested a review from pdroalves December 13, 2024 12:53
@agnesLeroy
Contributor Author

Latency on H100 is slightly improved. I need to check the effect on the whitepaper ERC20 transfer throughput.

@agnesLeroy agnesLeroy force-pushed the al/rework_if_then_else branch 3 times, most recently from 8e06cac to a4a5e0e Compare December 13, 2024 17:12
@agnesLeroy agnesLeroy requested review from guillermo-oyarzun and removed request for pdroalves December 16, 2024 09:01
@agnesLeroy agnesLeroy force-pushed the al/rework_if_then_else branch from a4a5e0e to e490498 Compare December 16, 2024 09:02
Member

@guillermo-oyarzun guillermo-oyarzun left a comment


Does this improve the multi-GPU issues?

@agnesLeroy
Contributor Author

Nope, it doesn't 😞

@agnesLeroy
Contributor Author

agnesLeroy commented Dec 16, 2024

I think the multi-GPU throughput issue is related to the use of cudaDeviceSynchronize in drop for CudaVec, I haven't tried to check though. Disabling that synchronization may lead to memory errors... I could try nevertheless to confirm it has an effect 🤔
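The hypothesis above is that a device-wide sync inside `Drop` stalls unrelated work: `cudaDeviceSynchronize` waits on every stream on the device, so freeing one buffer blocks until all in-flight kernels finish. A minimal pure-Rust simulation of that effect (all names hypothetical, not the `CudaVec` implementation):

```rust
use std::cell::Cell;

// Hypothetical model of a GPU device: `pending_ops` is work queued across
// all streams, `syncs` counts device-wide synchronizations.
struct DeviceSim {
    pending_ops: Cell<u32>,
    syncs: Cell<u32>,
}

impl DeviceSim {
    fn new() -> Self {
        DeviceSim { pending_ops: Cell::new(0), syncs: Cell::new(0) }
    }
    fn launch(&self, n: u32) {
        self.pending_ops.set(self.pending_ops.get() + n);
    }
    // Models cudaDeviceSynchronize: blocks until work queued by *every*
    // stream on the device has completed, not just this buffer's stream.
    fn synchronize(&self) {
        self.pending_ops.set(0);
        self.syncs.set(self.syncs.get() + 1);
    }
}

struct GpuVec<'a> {
    device: &'a DeviceSim,
}

impl Drop for GpuVec<'_> {
    fn drop(&mut self) {
        // The pattern under discussion: a blocking device-wide sync on every
        // deallocation. Memory-safe, but it serializes otherwise independent
        // work, which would show up as a multi-GPU throughput ceiling.
        self.device.synchronize();
    }
}

fn main() {
    let dev = DeviceSim::new();
    dev.launch(8); // unrelated work in flight on other streams
    {
        let _v = GpuVec { device: &dev };
    } // `_v` dropped here: full device sync waits on all 8 unrelated ops
    println!("syncs = {}, pending = {}", dev.syncs.get(), dev.pending_ops.get());
    // prints "syncs = 1, pending = 0"
}
```

In real CUDA code, a stream-scoped wait (or a stream-ordered free) would only block on the buffer's own stream, which is why removing the device-wide sync is the natural candidate fix, safety permitting.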

@guillermo-oyarzun
Member

I think the multi-GPU throughput issue is related to the use of cudaDeviceSynchronize in drop for CudaVec, I haven't tried to check though. Disabling that synchronization may lead to memory errors... I could try nevertheless to confirm it has an effect 🤔

Confirmation would be nice, even though changing that safely might involve a big refactor.

@agnesLeroy
Contributor Author

Hmm, looks like it's not that: https://github.com/zama-ai/tfhe-rs/actions/runs/12349715463/job/34461608896. I disabled cudaDeviceSynchronize in drop there, but we still see the same effect. We need to investigate further.

@agnesLeroy agnesLeroy merged commit e9c901b into main Dec 16, 2024
100 of 106 checks passed
@agnesLeroy agnesLeroy deleted the al/rework_if_then_else branch December 16, 2024 14:26