
CUSOLVER (dense): cache workspace in fat handle #2465

Merged (1 commit) on Sep 18, 2024

Conversation

bjarthur (Contributor) commented:
Riffing off of #2279 for getrf, getrs, sytrf, sytrs, and friends. Much cleaner API than #2464.
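The core idea can be sketched in plain Julia. This is a minimal, hypothetical illustration (the names `FatHandle` and `with_workspace` are assumptions, not the PR's actual code): the handle carries a cached buffer that grows on demand, so repeated factorizations reuse memory instead of reallocating on every call.

```julia
# Hypothetical sketch of a "fat" handle: the raw library handle plus a
# cached workspace that is grown lazily and reused across calls. A host
# Vector stands in for the device buffer the real code would hold.
mutable struct FatHandle
    handle::Ptr{Cvoid}        # the underlying cusolverDn handle (unused here)
    workspace::Vector{UInt8}  # cached workspace buffer
end

FatHandle() = FatHandle(C_NULL, UInt8[])

# Run `f` with a workspace of at least `sz` bytes, reallocating only
# when the cached buffer is too small.
function with_workspace(f, fh::FatHandle, sz::Integer)
    if length(fh.workspace) < sz
        fh.workspace = Vector{UInt8}(undef, sz)
    end
    f(fh.workspace)
end

fh = FatHandle()
with_workspace(ws -> fill!(ws, 0x00), fh, 1024)  # first call allocates
buf = fh.workspace
with_workspace(ws -> fill!(ws, 0x00), fh, 512)   # smaller request reuses it
```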

bjarthur (Contributor, Author) commented:

Tests pass locally, but I have not battle-tested this yet.

@bjarthur bjarthur force-pushed the bja/fathandle branch 2 times, most recently from 54d62a4 to 8865fb6 Compare August 16, 2024 15:36
bjarthur (Contributor, Author) commented:

Ready for review. Tests really pass locally now, and it works well in my application.

maleadt (Member) commented Aug 17, 2024:

Superficially LGTM, but I don't have the time for a thorough review right now.

This should probably also integrate with the reclaim_hooks, so that the caches are wiped when running out of memory (both here and in the other fat handles).
EDIT: let's move this to a separate issue.
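The reclaim-hook suggestion amounts to registering a callback that drops cached workspaces when the allocator hits an out-of-memory condition. A hedged sketch, with a plain `Vector` of functions standing in for CUDA.jl's internal hook list:

```julia
# Stand-in for CUDA.jl's internal list of reclaim hooks. On OOM, the
# allocator would invoke each hook to free cached resources before retrying.
reclaim_hooks = Function[]

# A cache of workspaces (hypothetical layout, keyed by routine name).
workspace_cache = Dict{Symbol,Vector{UInt8}}(:getrf => zeros(UInt8, 1024))

# Register a hook that empties the cache under memory pressure.
push!(reclaim_hooks, () -> empty!(workspace_cache))

# Simulate the allocator running all hooks on OOM.
foreach(hook -> hook(), reclaim_hooks)
```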

amontoison (Member) left a comment:

@bjarthur I really like the idea of a "fat" handle!
We discussed it with @maleadt in the context of CUSPARSE, aiming to avoid allocating a new buffer for each sparse matrix product, but we never got the chance to implement it.

@maleadt, I’m wondering if CUDA.jl consistently reuses the same handle.
I know we can have up to 32 handles, but for efficiency, we should reuse the one that stored the buffer from the previous factorization.
Otherwise, we'll end up storing a lot of unnecessary workspaces.

If that’s not the case, wouldn’t it be better to implement a cache of workspaces where the "fat" handle could pick a workspace from the pool based on a specific "id" related to the routine?
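The pool-of-workspaces alternative can be sketched as a dictionary keyed by a routine id, so that e.g. getrf and sytrf each keep their own buffer. All names here are illustrative assumptions, not CUDA.jl API:

```julia
# Hypothetical workspace pool, keyed by a routine id. Each routine keeps
# its own cached buffer, grown only when a larger size is requested.
workspace_pool = Dict{Symbol,Vector{UInt8}}()

function pooled_workspace(id::Symbol, sz::Integer)
    ws = get(workspace_pool, id, UInt8[])
    if length(ws) < sz
        ws = Vector{UInt8}(undef, sz)
        workspace_pool[id] = ws
    end
    return ws
end

a = pooled_workspace(:getrf, 256)  # allocates a 256-byte buffer for getrf
b = pooled_workspace(:sytrf, 128)  # separate buffer for sytrf
c = pooled_workspace(:getrf, 200)  # smaller request reuses getrf's buffer
```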

maleadt (Member) commented Aug 19, 2024:

> I'm wondering if CUDA.jl consistently reuses the same handle.
> I know we can have up to 32 handles, but for efficiency, we should reuse the one that stored the buffer from the previous factorization.
> Otherwise, we'll end up storing a lot of unnecessary workspaces.

We do as long as you're using the same task, which I assume you are. In that case, calling handle() will always return the same object, and only a single handle will be cached.
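The per-task caching behavior described here can be illustrated with Julia's `task_local_storage` (a sketch of the concept, not CUDA.jl's actual implementation — a `Ref` stands in for a real cusolverDn handle):

```julia
# Hedged sketch of per-task handle caching: within a single task, handle()
# always returns the same cached object; a fresh task gets its own.
function handle()
    get!(task_local_storage(), :cusolver_handle) do
        Ref(0)  # stand-in for creating a real library handle
    end
end

h1 = handle()
h2 = handle()          # same task => same cached handle

other = fetch(Threads.@spawn handle())  # new task => new handle
```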

@maleadt maleadt added enhancement New feature or request cuda libraries Stuff about CUDA library wrappers. labels Sep 18, 2024
maleadt (Member) commented Sep 18, 2024:

Forgot about this; rebased to give it another CI run.

@maleadt maleadt changed the title cached workspaces via a fat handle for dense CUSOLVER CUSOLVER (dense): cache workspace in fat handle Sep 18, 2024
@maleadt maleadt merged commit 9231b97 into JuliaGPU:master Sep 18, 2024
1 check passed