
CUSOLVER (dense): cache workspace in fat handle #2465

Merged (1 commit) on Sep 18, 2024

Conversation

bjarthur (Contributor) commented:
Riffing off of #2279 for getrf, getrs, sytrf, sytrs, and friends. Much cleaner API than #2464.
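The core idea can be sketched in plain Julia. This is a minimal, hypothetical illustration (the names `FatHandle` and `with_workspace` are assumptions, not the PR's actual code): the handle carries a cached buffer that grows on demand, so repeated factorizations reuse memory instead of reallocating on every call.

```julia
# Hypothetical sketch of a "fat" handle: the raw library handle plus a
# cached workspace that is grown lazily and reused across calls. A host
# Vector stands in for the device buffer the real code would hold.
mutable struct FatHandle
    handle::Ptr{Cvoid}        # the underlying cusolverDn handle (unused here)
    workspace::Vector{UInt8}  # cached workspace buffer
end

FatHandle() = FatHandle(C_NULL, UInt8[])

# Run `f` with a workspace of at least `sz` bytes, reallocating only
# when the cached buffer is too small.
function with_workspace(f, fh::FatHandle, sz::Integer)
    if length(fh.workspace) < sz
        fh.workspace = Vector{UInt8}(undef, sz)
    end
    f(fh.workspace)
end

fh = FatHandle()
with_workspace(ws -> fill!(ws, 0x00), fh, 1024)  # first call allocates
buf = fh.workspace
with_workspace(ws -> fill!(ws, 0x00), fh, 512)   # smaller request reuses it
```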

bjarthur (Contributor, Author) commented:

Tests pass locally, but I have not battle-tested this yet.

@bjarthur bjarthur force-pushed the bja/fathandle branch 2 times, most recently from 54d62a4 to 8865fb6 Compare August 16, 2024 15:36
bjarthur (Contributor, Author) commented:

Ready for review. Tests really pass locally now, and it works well in my application.

maleadt (Member) commented Aug 17, 2024:

Superficially LGTM, but I don't have the time for a thorough review right now.

This should probably also integrate with the reclaim_hooks, so that the caches are wiped when running out of memory (both here and in the other fat handles).
EDIT: let's move this to a separate issue.
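The reclaim-hook suggestion amounts to registering a callback that drops cached workspaces when the allocator hits an out-of-memory condition. A hedged sketch, with a plain `Vector` of functions standing in for CUDA.jl's internal hook list:

```julia
# Stand-in for CUDA.jl's internal list of reclaim hooks. On OOM, the
# allocator would invoke each hook to free cached resources before retrying.
reclaim_hooks = Function[]

# A cache of workspaces (hypothetical layout, keyed by routine name).
workspace_cache = Dict{Symbol,Vector{UInt8}}(:getrf => zeros(UInt8, 1024))

# Register a hook that empties the cache under memory pressure.
push!(reclaim_hooks, () -> empty!(workspace_cache))

# Simulate the allocator running all hooks on OOM.
foreach(hook -> hook(), reclaim_hooks)
```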

amontoison (Member) left a comment:

@bjarthur I really like the idea of a "fat" handle!
We discussed it with @maleadt in the context of CUSPARSE, aiming to avoid allocating a new buffer for each sparse matrix product, but we never got the chance to implement it.

@maleadt, I’m wondering if CUDA.jl consistently reuses the same handle.
I know we can have up to 32 handles, but for efficiency, we should reuse the one that stored the buffer from the previous factorization.
Otherwise, we'll end up storing a lot of unnecessary workspaces.

If that’s not the case, wouldn’t it be better to implement a cache of workspaces where the "fat" handle could pick a workspace from the pool based on a specific "id" related to the routine?
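The pool-of-workspaces alternative can be sketched as a dictionary keyed by a routine id, so that e.g. getrf and sytrf each keep their own buffer. All names here are illustrative assumptions, not CUDA.jl API:

```julia
# Hypothetical workspace pool, keyed by a routine id. Each routine keeps
# its own cached buffer, grown only when a larger size is requested.
workspace_pool = Dict{Symbol,Vector{UInt8}}()

function pooled_workspace(id::Symbol, sz::Integer)
    ws = get(workspace_pool, id, UInt8[])
    if length(ws) < sz
        ws = Vector{UInt8}(undef, sz)
        workspace_pool[id] = ws
    end
    return ws
end

a = pooled_workspace(:getrf, 256)  # allocates a 256-byte buffer for getrf
b = pooled_workspace(:sytrf, 128)  # separate buffer for sytrf
c = pooled_workspace(:getrf, 200)  # smaller request reuses getrf's buffer
```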

maleadt (Member) commented Aug 19, 2024:

> I'm wondering if CUDA.jl consistently reuses the same handle.
> I know we can have up to 32 handles, but for efficiency, we should reuse the one that stored the buffer from the previous factorization.
> Otherwise, we'll end up storing a lot of unnecessary workspaces.

We do as long as you're using the same task, which I assume you are. In that case, calling handle() will always return the same object, and only a single handle will be cached.
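The per-task caching behavior described here can be illustrated with Julia's `task_local_storage` (a sketch of the concept, not CUDA.jl's actual implementation — a `Ref` stands in for a real cusolverDn handle):

```julia
# Hedged sketch of per-task handle caching: within a single task, handle()
# always returns the same cached object; a fresh task gets its own.
function handle()
    get!(task_local_storage(), :cusolver_handle) do
        Ref(0)  # stand-in for creating a real library handle
    end
end

h1 = handle()
h2 = handle()          # same task => same cached handle

other = fetch(Threads.@spawn handle())  # new task => new handle
```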

@maleadt maleadt added enhancement New feature or request cuda libraries Stuff about CUDA library wrappers. labels Sep 18, 2024
maleadt (Member) commented Sep 18, 2024:

Forgot about this; rebased to give it another CI run.

@maleadt maleadt changed the title cached workspaces via a fat handle for dense CUSOLVER CUSOLVER (dense): cache workspace in fat handle Sep 18, 2024
@maleadt maleadt merged commit 9231b97 into JuliaGPU:master Sep 18, 2024
1 check passed