Non-serial versions of tests using 5x5_amazon failing RUN #2423
Comments
@ekluzek given the feedback from NCAR/mpibind#5 (comment), should I make an issue in the ccs_config_cesm repo?
@glemieux yes, go ahead and do that.
During the ctsm stand-up meeting today we came up with the following actions for the time being:
It was also noted that this doesn't seem to be an issue for
@glemieux note this also relates to another problem I ran into, where the new use of mpibind needed me to do something different for mksurfdata_esmf.
The ccs_config issue is here:
This will be reverted once issue ESCOMP#2423 has been addressed
Completed these action items per #2436.
It seems like the non-serial
Actually, it would probably be worth checking whether the original test you noticed this with, the
@samsrabin good question on the removal of the MPI version of this test. The utility of the MPI test is to check that MPI works for a simple regional grid, as a way to make sure small regional cases work with MPI in general. It also makes sure you can use MPI for a grid that's only a fraction of a node. At this point we also test the nldas2 grid, which is a larger regional grid, so we could call that sufficient. The advantage here, though, is that 5x5_amazon is a simple, fast, small grid for testing. So I like the idea of keeping it for at least some of our testing, if not for this specific FATES seed dispersal test.
@glemieux will check on this to see if it's still an issue.
I can confirm that the original issue is resolved. I reinstated the Test case:
Brief summary of bug
mpibind seems to have an issue with 5x5_amazon resolutions when run with full MPI (i.e. no MPI-serial) since ctsm5.1.dev173. Originally posted at NCAR/mpibind#5.
General bug information
CTSM version you are using: ctsm5.1.dev173
Does this bug cause significantly incorrect results in the model's science? [Yes / No] Run fails so no assessment possible
Details of bug
This was discovered when running the FatesColdSeedDispersal test while generating new fates baselines for the dev173 update. I was also able to replicate this failure using a non-serial MPI version of the hillslope clm-only test. The run immediately fails, producing a cesm.log entry with a note about one of the core selections being invalid (see below). It also produced an mpibind.log that I hadn't noticed before.
This prompted me to compare dev172 and dev173 runs for non-serial MPI versions of the hillslope test that use 5x5_amazon. The dev172 version passes, but I noticed that the preview_run output is different:
dev172:
dev173:
What is odd to me is that mpibind was brought in with dev172 via ccs_config_cesm0.0.92, so why is the call not activated for that tag? Why is it only being invoked with dev173?
Important details of your setup / configuration so we can reproduce the bug
You can view the SRCROOT_GIT_STATUS files for both the dev173 and dev172 hillslope runs here, respectively:
/glade/u/home/glemieux/scratch/ctsm-tests/tests_mpi-nonserial-check-clm_hillslope-dev173
/glade/u/home/glemieux/scratch/ctsm-tests/tests_mpi-nonserial-check-clm_hillslope
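For anyone repeating the dev172 vs. dev173 comparison of preview_run output, a minimal sketch of diffing two saved outputs is given here. It assumes the output from each case has already been captured as text; the function name diff_preview_run and the labels are illustrative only, and the sample strings are placeholders, not the real output from either tag.

```python
import difflib


def diff_preview_run(old_text, new_text, old_label="dev172", new_label="dev173"):
    """Return unified-diff lines between two captured preview_run outputs.

    Both arguments are plain strings, e.g. the saved stdout of preview_run
    from each case directory (capturing that output is left to the user).
    """
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",  # no trailing newlines, so lines print cleanly
        )
    )


if __name__ == "__main__":
    # Placeholder text only; substitute the saved output from each run.
    old = "first line\nlaunch command (placeholder)"
    new = "first line\nlaunch command with wrapper (placeholder)"
    for line in diff_preview_run(old, new):
        print(line)
```

Lines prefixed with "-" appear only in the first output and lines prefixed with "+" only in the second, which should make a changed launch command easy to spot.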
Important output or errors that show the problem
cesm.log
mpibind.log