
Add parsec_advise_data_on_device for zpotrf_L #118

Merged — 3 commits from qinglei/potrf_gpu_advice into ICLDisco:master, Oct 25, 2024

Conversation

@QingleiCao commented Jun 7, 2024

These are the performance comparisons.

[two performance-comparison plots attached]

I changed the Cholesky test to Lower and will create an issue to track why Upper behaves differently.
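
For context, a minimal sketch of what the API named in the title does, assuming PaRSEC's parsec_advise_data_on_device and its PARSEC_DEV_DATA_ADVICE_PREFERRED_DEVICE flag (parsec/data.h); the round-robin mapping and the gpu_device/nb_gpus names are illustrative, not the exact policy of this PR:

/* Advise each locally owned tile of the Lower triangle toward a
 * preferred accelerator, round-robin over the available GPUs. */
static void advise_tiles_lower(parsec_tiled_matrix_t *A,
                               const int *gpu_device, int nb_gpus)
{
    parsec_data_collection_t *dc = &A->super;
    for( int m = 0; m < A->mt; m++ ) {
        for( int n = 0; n <= m; n++ ) {      /* Lower: tiles with n <= m */
            if( (int)dc->rank_of(dc, m, n) != (int)dc->myrank )
                continue;                    /* only advise tiles this rank owns */
            parsec_advise_data_on_device(dc->data_of(dc, m, n),
                                         gpu_device[m % nb_gpus],
                                         PARSEC_DEV_DATA_ADVICE_PREFERRED_DEVICE);
        }
    }
}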

@QingleiCao requested a review from a team as a code owner — June 7, 2024 13:39
Review threads (resolved): src/zpotrf_L.jdf, src/zpotrf_wrapper.c, tests/testing_zpotrf.c
@QingleiCao (Author)
I uploaded the performance results. BTW, I updated to the latest PaRSEC.

@bosilca (Contributor) commented Jun 18, 2024

I understand the performance benefits, but I have many concerns about the approach. We (or at least I) have long claimed that JDF algorithms express the dataflow and are independent of other platform-related constraints. This PR is clearly at odds with that statement, and I'm not sure we want to go that route.

@QingleiCao (Author)
As discussed, these changes will be moved to the wrapper to keep the JDF clean.

@QingleiCao force-pushed the qinglei/potrf_gpu_advice branch 2 times, most recently from c67b4b5 to 6743a5a — September 5, 2024 02:03
@QingleiCao (Author)
How about adding this feature in parsec instead of in dplasma?

@bosilca (Contributor) commented Sep 5, 2024

Can you give us a hint on how that would look in parsec?

@QingleiCao (Author)
> Can you give us a hint on how that would look in parsec?

I'm thinking of providing this feature in PaRSEC, similar to apply, with several predefined functions (like the operators in apply). Users could then call it in the wrapper or elsewhere, since this device advice needs to be set only once (if my understanding is correct), and no change would be needed in the JDF.
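
A hedged sketch of that idea, with the operator signature modeled on the tiled-matrix unary op that parsec's apply machinery uses (advise_op and advise_args_t are hypothetical names; the exact prototype may differ):

typedef struct { const int *gpu_device; int nb_gpus; } advise_args_t;

/* Hypothetical predefined operator: apply visits every local tile once,
 * so the advice is set exactly once per tile and the JDF stays untouched. */
static int advise_op( parsec_execution_stream_t *es,
                      const parsec_tiled_matrix_t *descA, void *_A,
                      parsec_matrix_uplo_t uplo, int m, int n, void *op_args )
{
    advise_args_t *args = (advise_args_t *)op_args;
    parsec_data_collection_t *dc = (parsec_data_collection_t *)&descA->super;
    parsec_advise_data_on_device( dc->data_of(dc, m, n),
                                  args->gpu_device[m % args->nb_gpus],
                                  PARSEC_DEV_DATA_ADVICE_PREFERRED_DEVICE );
    (void)es; (void)_A; (void)uplo;
    return 0;
}

The wrapper would then invoke it once before the factorization, e.g. something like parsec_apply(parsec, PARSEC_MATRIX_LOWER, A, (parsec_tiled_matrix_unary_op_t)advise_op, &args).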

@QingleiCao force-pushed the qinglei/potrf_gpu_advice branch 6 times, most recently from 252d978 to 726d4ab — September 27, 2024 17:24
@bosilca (Contributor) left a review comment

Other programming paradigms call this a mapper. It would be nice if we reused a concept that is already familiar to [at least some] users.

src/dplasmaaux.c (outdated):
#if defined(DPLASMA_HAVE_CUDA) || defined(DPLASMA_HAVE_HIP)

/* Find all devices */
void dplasma_find_nb_devices(int **dev_index, int *nb) {
Reviewer comment:

This function has a generic name suggesting it gets all devices, when internally it only selects the accelerators. Please change the name, and please also add support for ZE.
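
A minimal sketch of the requested rename plus ZE support (dplasma_device_is_accelerator and the DPLASMA_HAVE_LEVEL_ZERO guard are hypothetical; PARSEC_DEV_LEVEL_ZERO is PaRSEC's Level Zero device type):

#if defined(DPLASMA_HAVE_CUDA) || defined(DPLASMA_HAVE_HIP) || defined(DPLASMA_HAVE_LEVEL_ZERO)
/* Hypothetical helper: the name makes explicit that only accelerators qualify. */
static int dplasma_device_is_accelerator(const parsec_device_module_t *dev)
{
    return (PARSEC_DEV_CUDA == dev->type) ||
           (PARSEC_DEV_HIP == dev->type) ||
           (PARSEC_DEV_LEVEL_ZERO == dev->type);
}
#endif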

src/dplasmaaux.c (outdated):
if((*nb) == 0) {
    char hostname[256];
    gethostname(hostname, 256);
    fprintf(stderr, "No CUDA device found on rank %d on %s\n",
Reviewer comment:

We have proper methods in parsec to output warnings. Please use parsec_warning instead.
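
A sketch of the suggested change (rank stands in for whatever rank variable the truncated excerpt above prints; parsec_warning is declared in parsec/utils/debug.h):

if( 0 == (*nb) ) {
    char hostname[256];
    gethostname(hostname, sizeof(hostname));
    /* route through PaRSEC's output system instead of raw stderr */
    parsec_warning("No accelerator device found on rank %d on %s", rank, hostname);
}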

Reviewer comment:

In general I don't think it is a good idea to return low-level memory upstream; the ownership is unclear. Since we are talking about a very small amount of memory here, I would suggest requiring the caller to provide an array of size parsec_nb_devices to this call, and having this function do a single loop to fill the array. All memory ownership then stays on the caller side.
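
A hedged sketch of that caller-owned variant (dplasma_get_accelerator_devices is a hypothetical name; parsec_nb_devices and parsec_mca_device_get come from parsec/mca/device/device.h, and dplasma_device_is_accelerator is the helper sketched earlier):

/* The caller provides an array of at least parsec_nb_devices entries;
 * the helper fills it in a single loop and returns the count written.
 * No allocation happens here, so ownership stays with the caller. */
static int dplasma_get_accelerator_devices(int *dev_index)
{
    int nb = 0;
    for( int i = 0; i < (int)parsec_nb_devices; i++ ) {
        parsec_device_module_t *dev = parsec_mca_device_get(i);
        if( dplasma_device_is_accelerator(dev) )
            dev_index[nb++] = i;
    }
    return nb;
}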

Review threads (resolved): src/dplasmaaux.c, tests/testing_zpotrf.c
@QingleiCao force-pushed the qinglei/potrf_gpu_advice branch 2 times, most recently from 6813296 to 40a87d4 — October 5, 2024 12:11
@QingleiCao (Author) commented Oct 5, 2024

@bosilca (Contributor) commented Oct 7, 2024

There will be some memory leak, as apply will free the arg by itself: https://github.com/ICLDisco/parsec/blob/00470145f3ce392853871c76d3200b7cb8aa6f05/parsec/data_dist/matrix/apply_wrapper.c#L115-L116

Please fix this in parsec; there is absolutely no reason for parsec to free something it didn't allocate. It is a bug, not a feature.

@QingleiCao (Author)
> There will be some memory leak, as apply will free the arg by itself: https://github.com/ICLDisco/parsec/blob/00470145f3ce392853871c76d3200b7cb8aa6f05/parsec/data_dist/matrix/apply_wrapper.c#L115-L116
>
> Please fix this in parsec; there is absolutely no reason for parsec to free something it didn't allocate. It is a bug, not a feature.

Sure. I will create an issue and fix that in PaRSEC.

@abouteiller (Contributor)
ICLDisco/parsec#676 has been merged

@QingleiCao force-pushed the qinglei/potrf_gpu_advice branch 4 times, most recently from 1314a0e to 25acbe2 — October 14, 2024 21:03
{
    zplghe_args_t *params = (zplghe_args_t*)malloc(sizeof(zplghe_args_t));

    params->bump = bump;
Reviewer comment:

Suggested change: keep the same dplasma_zplghe_New prototype and the malloc here too, and do the free in dplasma_zplghe_Destruct; you can get it from tp->_g_op_args if you cast tp to a parsec_apply_taskpool_t *.
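
A sketch of that suggestion (the _g_op_args field name comes from the comment above; parsec_taskpool_free is PaRSEC's standard taskpool release, and the exact destructor sequence may differ):

void dplasma_zplghe_Destruct( parsec_taskpool_t *tp )
{
    /* Recover the args malloc'ed in dplasma_zplghe_New and free them here,
     * so the _New prototype stays unchanged and nothing leaks. */
    parsec_apply_taskpool_t *apply_tp = (parsec_apply_taskpool_t *)tp;
    free( apply_tp->_g_op_args );
    parsec_taskpool_free( tp );
}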

@QingleiCao force-pushed the qinglei/potrf_gpu_advice branch 3 times, most recently from bd37e51 to a5e1f57 — October 18, 2024 15:50
@abouteiller merged commit 42f4773 into ICLDisco:master — Oct 25, 2024 (2 of 3 checks passed)