
Remove local copies from Schedule. #11

Draft: drwells wants to merge 1 commit into master
Conversation

drwells (Member) commented Jul 17, 2023

This causes a few tests to fail for interesting reasons, notably:

    IBTK/marker_points_02_2d.timestep.h5part.mpirun=2.input
    IBTK/marker_points_02_2d.timestep.h5part.input
    navier_stokes/navier_stokes_01_2d.input

The Navier-Stokes test is almost passing (the errors are near the numdiff tolerance), but the first two are way off; I'm not sure why. Perhaps we are now copying things in a bad order.

boyceg (Contributor) commented Jul 17, 2023

Is there an accumulate version of the receive operation? This is what I forgot to handle in my initial version of #10.
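
For illustration, the distinction being asked about, as a sketch (these helper names are hypothetical, not SAMRAI API): does a receive overwrite the destination values, or sum into them?

    // Hypothetical sketch, not SAMRAI API: the two possible receive semantics.
    #include <cstddef>

    // Copy semantics: incoming values replace whatever is already in dst.
    void unpack_copy(double* dst, const double* recv_buf, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = recv_buf[i];
    }

    // Accumulate semantics: incoming values are summed into dst, which is
    // what an accumulating fill (e.g., for spreading forces across patch
    // boundaries) would need.
    void unpack_accumulate(double* dst, const double* recv_buf, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i)
            dst[i] += recv_buf[i];
    }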

drwells (Member Author) commented Jul 17, 2023

performLocalCopies() should be equivalent to doing packStream() followed by unpackStream(), but it isn't; perhaps that's what is missing. I'll take a look.
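
A minimal sketch of the invariant being relied on here (the struct is a stand-in for patch data, not SAMRAI code): a direct local copy and a pack/unpack round trip through a byte stream should produce identical data.

    #include <cassert>
    #include <cstring>
    #include <vector>

    // Hypothetical stand-in for SAMRAI patch data; only the invariant matters.
    struct FakePatchData
    {
        std::vector<double> values;

        // Analogue of performLocalCopies(): a direct copy.
        void copyFrom(const FakePatchData& src) { values = src.values; }

        // Analogues of packStream()/unpackStream(): a byte round trip.
        std::vector<char> pack() const
        {
            std::vector<char> stream(values.size() * sizeof(double));
            std::memcpy(stream.data(), values.data(), stream.size());
            return stream;
        }
        void unpack(const std::vector<char>& stream)
        {
            values.resize(stream.size() / sizeof(double));
            std::memcpy(values.data(), stream.data(), stream.size());
        }
    };

    int main()
    {
        FakePatchData src{{1.0, 2.0, 3.0}}, via_copy, via_stream;
        via_copy.copyFrom(src);
        via_stream.unpack(src.pack());
        // The invariant this patch relies on: both paths give the same data.
        assert(via_copy.values == via_stream.values);
    }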

The marker point tests (with this patch) get completely wrong values:

< X = 0.0147398, 0.081608 V = 0.542664, 0.220586
< X = 0.0147398, 0.193867 V = 0.542664, 0.180123
< X = 0.0147398, 0.306125 V = 0.542664, 0.139661
< X = 0.0147398, 0.418384 V = 0.542664, 0.0991993
< X = 0.0288938, 0.515868 V = 0.194435, -0.00646765
< X = 0.0147398, 0.642901 V = 0.542664, 0.0182751
< X = 0.0147398, 0.75516 V = 0.542664, -0.022187
< X = 0.0147398, 0.867418 V = 0.542664, -0.0626491
< X = 0.019355, 0.974035 V = 0.396029, -0.182672
---
> X = 0.0325043, 0.0784706 V = 0.138241, 0.210765
> X = 0.0325043, 0.185667 V = 0.138241, 0.157167
> X = 0.0325043, 0.292863 V = 0.138241, 0.103569
> X = 0.0325043, 0.400059 V = 0.138241, 0.0499707
> X = 0.0325043, 0.507255 V = 0.138241, -0.00362728
> X = 0.0325043, 0.614451 V = 0.138241, -0.0572253
> X = 0.0325043, 0.721647 V = 0.138241, -0.110823
> X = 0.0325043, 0.828842 V = 0.138241, -0.164421
> X = 0.0325043, 0.936038 V = 0.138241, -0.218019

The x-component of the velocity is constant, so each column of points should stay aligned.

drwells (Member Author) commented Aug 14, 2023

I'm out of time to investigate this today, but I think the bug might be the other way around: either the local copy code is wrong or there is a bug in the marker point code.

If I run that test executable with this patch then the output is the same regardless of the number of processors. However:

[drwells@mitral IBTK]$ diff marker_points_02_2d.timestep.h5part.output  marker_points_02_2d.timestep.h5part.mpirun\=2.output                                                                                                      
79,86c79,86
< X = 0.956775, 0.0845583 V = 0.956775, 0.292279
< X = 0.956775, 0.20147 V = 0.956775, 0.350735
< X = 0.956775, 0.318382 V = 0.956775, 0.409191
< X = 0.956775, 0.435293 V = 0.956775, 0.467647
< X = 0.956775, 0.552205 V = 0.956775, 0.526102
< X = 0.956775, 0.669117 V = 0.956775, 0.584558
< X = 0.956775, 0.786028 V = 0.956775, 0.643014
< X = 0.956775, 0.90294 V = 0.956775, 0.70147
---
> X = 0.956775, 0.0845583 V = 0.949955, 0.292279
> X = 0.956775, 0.20147 V = 0.949955, 0.350735
> X = 0.956775, 0.318382 V = 0.949955, 0.409191
> X = 0.956775, 0.435293 V = 0.949955, 0.467647
> X = 0.956775, 0.552205 V = 0.949955, 0.526102
> X = 0.956775, 0.669117 V = 0.949955, 0.584558
> X = 0.956775, 0.786028 V = 0.949955, 0.643014
> X = 0.956775, 0.90294 V = 0.949955, 0.70147
89,97c89,97
< X = 0.0325043, 0.0784706 V = 0.138241, 0.210765
< X = 0.0325043, 0.185667 V = 0.138241, 0.157167
< X = 0.0325043, 0.292863 V = 0.138241, 0.103569
< X = 0.0325043, 0.400059 V = 0.138241, 0.0499707
< X = 0.0325043, 0.507255 V = 0.138241, -0.00362728
< X = 0.0325043, 0.614451 V = 0.138241, -0.0572253
< X = 0.0325043, 0.721647 V = 0.138241, -0.110823
< X = 0.0325043, 0.828842 V = 0.138241, -0.164421
< X = 0.0325043, 0.936038 V = 0.138241, -0.218019
---
> X = 0.0147398, 0.081608 V = 0.542664, 0.220586
> X = 0.0147398, 0.193867 V = 0.542664, 0.180123
> X = 0.0147398, 0.306125 V = 0.542664, 0.139661
> X = 0.0147398, 0.418384 V = 0.542664, 0.0991993
> X = 0.0147398, 0.530642 V = 0.542664, 0.0587372
> X = 0.0147398, 0.642901 V = 0.542664, 0.0182751
> X = 0.0147398, 0.75516 V = 0.542664, -0.022187
> X = 0.0147398, 0.867418 V = 0.542664, -0.0626491
> X = 0.019355, 0.974035 V = 0.396029, -0.182672
99,108c99,108
< X = 0.0420723, 0.0736804 V = 0.053882, 0.21316
< X = 0.0420729, 0.168974 V = 0.0538795, 0.165513
< X = 0.0420729, 0.264267 V = 0.0538795, 0.117866
< X = 0.0420729, 0.359561 V = 0.0538795, 0.0702196
< X = 0.0420729, 0.454854 V = 0.0538795, 0.0225729
< X = 0.0420729, 0.550147 V = 0.0538795, -0.0250737
< X = 0.0420729, 0.645441 V = 0.0538795, -0.0727204
< X = 0.0420729, 0.740734 V = 0.0538795, -0.120367
< X = 0.0420729, 0.836028 V = 0.0538795, -0.168014
< X = 0.04203, 0.93117 V = 0.0540485, -0.215585
---
> X = 0.0381908, 0.0751988 V = 0.0768031, 0.212401
> X = 0.0381723, 0.1749 V = 0.0769495, 0.16255
> X = 0.0381723, 0.274575 V = 0.0769495, 0.112713
> X = 0.0381723, 0.37425 V = 0.0769495, 0.0628752
> X = 0.0381723, 0.473924 V = 0.0769495, 0.0130378
> X = 0.0381723, 0.573599 V = 0.0769495, -0.0367996
> X = 0.0381723, 0.673274 V = 0.0769495, -0.086637
> X = 0.0381723, 0.772949 V = 0.0769495, -0.136474
> X = 0.0381723, 0.872623 V = 0.0769495, -0.186312
> X = 0.0380256, 0.957052 V = 0.0781211, -0.228526

so on master the serial and mpirun=2 baseline results are way off from each other, which seems wrong. The master mpirun=2 output nearly agrees with this patch (one point is off), and at 4 MPI processes they agree exactly. With this patch, using 1, 2, or 4 MPI processes produces the same output.

drwells (Member Author) commented Aug 16, 2023

Those tests fail because we are inconsistently filling things across periodic boundaries. Using a solution that is actually continuous and periodic fixes the problem.
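
For example, any field built from sines and cosines of 2πx and 2πy is smooth and 1-periodic on the unit square; one possible choice (not necessarily the field the test now uses) is:

    % A smooth, divergence-free, 1-periodic velocity field on [0,1]^2:
    \[
      \mathbf{u}(x,y) =
      \begin{pmatrix}
        \sin(2\pi x)\cos(2\pi y) \\
        -\cos(2\pi x)\sin(2\pi y)
      \end{pmatrix},
      \qquad
      \nabla \cdot \mathbf{u}
        = 2\pi\cos(2\pi x)\cos(2\pi y)
        - 2\pi\cos(2\pi x)\cos(2\pi y)
        = 0.
    \]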

drwells (Member Author) commented Aug 16, 2023

The next set of failing tests is:

    IBTK/nodal_interpolation_01_3d.cf.z.input
    IBTK/nodal_interpolation_01_3d.cf.input
    IBTK/nodal_interpolation_01_3d.cf.x.input
    navier_stokes/navier_stokes_01_2d.input

boyceg (Contributor) commented Aug 16, 2023

It should be OK to have an actual solution discontinuity at the periodic boundary... the result should still be well defined.

boyceg (Contributor) commented Aug 16, 2023

With side-centered data, there is the question of "which solution wins" at periodic boundaries; maybe that tie-breaking is not being applied consistently?
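
As an illustration of the ambiguity (with made-up indexing and a made-up tie-break rule, not SAMRAI code):

    #include <cstdio>
    #include <vector>

    // On an N-cell interval that is periodic in x, side-centered x-data lives
    // on faces i = 0..N, but faces 0 and N are the same physical face. If two
    // patches each hold a (different) value there, some deterministic rule
    // has to decide which one wins.
    int main()
    {
        const int N = 8;
        std::vector<double> u(N + 1);
        for (int i = 0; i <= N; ++i)
            u[i] = 1.0 * i;  // deliberately discontinuous: u[0] != u[N]

        // A made-up tie-break: the lower-index copy always wins. The bug
        // hypothesized above is that no such rule is applied consistently.
        u[N] = u[0];
        std::printf("u[0] = %g, u[N] = %g\n", u[0], u[N]);
    }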

drwells (Member Author) commented Aug 16, 2023

I suspect this is a bug somewhere in SAMRAI, then, because we get consistent results (across different numbers of processors) when I make the solution continuous and inconsistent results when the solution is discontinuous. That's independent of this patch.

boyceg (Contributor) commented Aug 16, 2023 via email

drwells (Member Author) commented Aug 16, 2023

I'll have to do some more experiments to know for sure, but I believe the answer is yes. If I fix all patches to be 4x4 then things are always consistent in parallel (i.e., this may be a literal corner case).

boyceg (Contributor) commented Aug 17, 2023

Requiring at least 2 patches in each periodic direction seems like it might be the easiest thing to do.
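
Something along these lines in the SAMRAI input database would enforce that; largest_patch_size is a standard GriddingAlgorithm input, but the sizes here are made up and would depend on the test input:

    GriddingAlgorithm {
       // Hypothetical values: capping patches at half the domain width in a
       // periodic direction forces at least 2 patches in that direction.
       largest_patch_size {
          level_0 = 16, 16   // assuming a 32x32 coarsest level
       }
    }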

boyceg (Contributor) commented Aug 17, 2023

I bet that SAMRAI is not copying the periodically shifted copy of the patch onto itself.
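
A sketch of that hypothesis (the types and overlap logic are stand-ins, not SAMRAI code): when collecting copy sources for a destination patch on a periodic level, every patch must be considered under every periodic shift, including the destination patch's own shifted image.

    #include <vector>

    // Hypothetical stand-ins; not SAMRAI types.
    struct Box { int lo, hi; };
    struct Patch { int id; Box box; };

    bool overlaps(const Box& a, const Box& b) { return a.lo <= b.hi && b.lo <= a.hi; }
    Box shifted(Box b, int s) { return {b.lo + s, b.hi + s}; }

    // Skipping all src.id == dst.id pairs (instead of only the zero-shift
    // pair) would silently drop the self-copy that is needed when a single
    // patch spans the whole periodic direction.
    std::vector<int> collect_copy_shifts(const std::vector<Patch>& patches,
                                         const Patch& dst, const Box& dst_ghosts,
                                         const std::vector<int>& periodic_shifts)
    {
        std::vector<int> copies;
        for (const Patch& src : patches)
            for (const int s : periodic_shifts)  // includes s == 0
            {
                if (src.id == dst.id && s == 0)
                    continue;  // skip only the unshifted self-overlap
                if (overlaps(shifted(src.box, s), dst_ghosts))
                    copies.push_back(s);
            }
        return copies;
    }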

drwells (Member Author) commented Aug 17, 2023

Probably; that might explain why using more patches (which happens when I use more MPI processes) produces different results.

To be fair, we're working on banning AMR across periodic boundaries in deal.II because periodic BCs are a constant source of bugs.

drwells (Member Author) commented Aug 17, 2023

The nodal interpolation tests don't use periodic boundaries. I'd bet there are more nodal bugs there, but I'll have to do some more investigating to know for sure.

This patch is revealing a bunch of interesting SAMRAI bugs; that's a second benefit of removing this subsystem.
