Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove SMP_PRESENT and replace with BUILD_THREADED #4546

Merged
merged 14 commits into from
Dec 21, 2023

Conversation

jedwards4b
Copy link
Contributor

SMP_PRESENT and BUILD_THREADED were both used in the same way and having two variables to do the same thing is confusing at best. This replaces all instances of SMP_PRESENT with BUILD_THREADED. Note that this may require changes in E3SM outside of CIME.

Test suite: scripts_regression_tests.py
Test baseline:
Test namelist changes:
Test status: bit for bit
Fixes #4544
User interface changes?: xml variable SMP_PRESENT is removed.

Update gh-pages html (Y/N)?:

@jedwards4b
Copy link
Contributor Author

system-testing expected to fail

@rljacob rljacob requested a review from amametjanov December 15, 2023 01:50
Copy link
Contributor

@jgfouca jgfouca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think this is a good change.

@amametjanov
Copy link
Member

@jgfouca, will there be a follow-up PR to fix things in E3SM?
Just wanted to check that BUILD_THREADED was set F->T->F with NTHREADS changing 1->2->1: noted in #2466 (comment) .

azamat@perlmutter:login09:~/saul/E3SM
> cd cime && git branch && cd -
  master
* replace_smp_present
azamat@perlmutter:login09:~/saul/E3SM
> ./cime/scripts/create_test SMS.f19_g16.X --no-build -t 20231218-chk-cime
Using project from .cesm_proj: e3sm
create_test will do up to 1 tasks simultaneously
create_test will use up to 320 cores simultaneously
Creating test directory /pscratch/sd/a/azamat/e3sm_scratch/pm-cpu/SMS.f19_g16.X.pm-cpu_intel.20231218-chk-cime
RUNNING TESTS:
  SMS.f19_g16.X.pm-cpu_intel
Starting CREATE_NEWCASE for test SMS.f19_g16.X.pm-cpu_intel with 1 procs
Finished CREATE_NEWCASE for test SMS.f19_g16.X.pm-cpu_intel in 5.232196 seconds (PASS)
Starting XML for test SMS.f19_g16.X.pm-cpu_intel with 1 procs
Finished XML for test SMS.f19_g16.X.pm-cpu_intel in 0.363084 seconds (PASS)
Starting SETUP for test SMS.f19_g16.X.pm-cpu_intel with 1 procs
Finished SETUP for test SMS.f19_g16.X.pm-cpu_intel in 1.173132 seconds (FAIL). [COMPLETED 1 of 1]
    Case dir: /pscratch/sd/a/azamat/e3sm_scratch/pm-cpu/SMS.f19_g16.X.pm-cpu_intel.20231218-chk-cime
    Errors were:
        ERROR: No variable BUILD_THREADED found in case

@jedwards4b
Copy link
Contributor Author

@amametjanov you need to update mct/cime_config/config_component.xml

Copy link
Member

@amametjanov amametjanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked in E3SM with SMS.f19_g16.X.pm-cpu_intel and

azamat@perlmutter:login09:~/saul/E3SM
> git diff driver-mct/cime_config/config_component.xml
diff --git a/driver-mct/cime_config/config_component.xml b/driver-mct/cime_config/config_component.xml
index 7c8e83fa70..ea1de9ee8d 100644
--- a/driver-mct/cime_config/config_component.xml
+++ b/driver-mct/cime_config/config_component.xml
@@ -717,7 +717,7 @@
       If TRUE, the component libraries are always built with OpenMP capability.</desc>
   </entry>

-  <entry id="SMP_PRESENT">
+  <entry id="BUILD_THREADED">
     <type>logical</type>
     <valid_values>TRUE,FALSE</valid_values>
     <default_value>FALSE</default_value>
  • Changing NTHRDS_ATM 1->2->1 or FORCE_BUILD_SMP F->T->F switches BUILD_THREADED F->T->F.
  • Need to do the rename during the merge to E3SM.

@jedwards4b
Copy link
Contributor Author

@amametjanov thank you for confirming.

@jedwards4b
Copy link
Contributor Author

@jasonb5 do i need to fix something here or is the ball in your court?

@jasonb5
Copy link
Collaborator

jasonb5 commented Dec 19, 2023

@jedwards4b It's in mine, I'll restart the tests shortly.

@jasonb5
Copy link
Collaborator

jasonb5 commented Dec 19, 2023

@jedwards4b ccs_config needs to be updated in the CESM repo.

@jedwards4b
Copy link
Contributor Author

@jasonb5 The file Externals.cfg that is used by docker needs to be updated.

@jasonb5
Copy link
Collaborator

jasonb5 commented Dec 20, 2023

@jedwards4b I updated the container to use Externals.cfg from the CIME repo for both sys and unit tests. The remaining errors look related to CDEPS.

SMS.f19_g16_rx1.A.docker_gnu.fake_testing_only_20231220_171621/bld/gnu/openmpi/nodebug/nothreads/nuopc/CDEPS', error=/__w/cime/cime/cime/components/cdeps/streams/dshr_stream_mod.F90:1714:28:

 1714 |     use shr_file_mod, only : shr_file_get_real_path
      |                            1
Error: Symbol 'shr_file_get_real_path' referenced at (1) not found in module 'shr_file_mod'
make[2]: *** [streams/CMakeFiles/streams.dir/build.make:101: streams/CMakeFiles/streams.dir/dshr_stream_mod.F90.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:375: streams/CMakeFiles/streams.dir/all] Error 2
make: *** [Makefile:124: all] Error 2

@jedwards4b
Copy link
Contributor Author

I've been trying to figure that out.

 git blame shr_file_mod.F90 | grep shr_file_get_ 
961411df src/shr_file_mod.F90  (Jim Edwards    2023-10-27 13:23:12 -0600   64)   public :: shr_file_get_real_path ! Get a fully resolved path
961411df src/shr_file_mod.F90  (Jim Edwards    2023-10-27 13:23:12 -0600 1010)   subroutine shr_file_get_real_path(path, resolved_path)
961411df src/shr_file_mod.F90  (Jim Edwards    2023-10-27 13:23:12 -0600 1043)   end subroutine shr_file_get_real_path

@jedwards4b
Copy link
Contributor Author

Ah - the share should be share1.0.18 - we had it set to 17

@jedwards4b
Copy link
Contributor Author

The e3sm test is expected to fail because the mct driver config_compset.xml file needs to be changed to use BUILD_THREADED instead of SMP_PRESENT.

@jedwards4b jedwards4b merged commit 7e9df5f into ESMCI:master Dec 21, 2023
5 of 6 checks passed
@jedwards4b jedwards4b deleted the replace_smp_present branch December 21, 2023 14:08
jgfouca added a commit that referenced this pull request Jan 3, 2024
It's removal in #4546 seems to have broken a number of
less-common test types (ERR, MCC, IRT, PRE).
@@ -219,8 +219,6 @@ def create_clone(
)
)

newcase.case_setup()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this have anything to do with the SMP_PRESENT to BUILD_THREADED change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was something I caught as being out of place during the development of that change, so it's not directly related to that but it was probably included in that PR. If you want to put it back and then open an issue to fix the problem by fixing the tests then I guess we can do it that way...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I'd like to do it that way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we have a solution now that fixes the tests and leaves this out - are you okay with that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

jgfouca added a commit that referenced this pull request Jan 3, 2024
It's removal in #4546 seems to have broken a number of less-common test types (ERR, MCC, IRT, PRE).

MCC_P1.f19_g16_rx1.A
ERROR: ERROR: must invoke case.setup script before calling build script
ERR_Ln9.f45_g37_rx1.A
Exception from case_test: [Errno 2] No such file or directory: '/scratch/jenkins-slave/workspace/CIME_DEVELOPER_TESTS_MASTER/archive/ERR_Ln9.f45_g37_rx1.A.anlgce_gnu.20240102_114925_1cza5r/rest'
Exception from case_test: [Errno 20] Not a directory: '/scratch/jenkins-slave/workspace/CIME_DEVELOPER_TESTS_MASTER/ERR_Ln9.f45_g37_rx1.A.anlgce_gnu.20240102_114925_1cza5r/run/case2run/timing'
IRT_N2_Vmct_Ln9.f19_g16_rx1.A
Exception from case_test: [Errno 20] Not a directory: '/scratch/jenkins-slave/workspace/CIME_DEVELOPER_TESTS_MASTER/IRT_N2_Vmct_Ln9.f19_g16_rx1.A.anlgce_gnu.20240102_114925_1cza5r/run/case2run/timing'
PRE.f19_f19.ADESP
Exception from case_test: ERROR: Do not know about batch job case.test
Test suite: the test cases above, by hand
Test baseline:
Test namelist changes:
Test status: bit for bit

Fixes [CIME Github issue #]

User interface changes?:

Update gh-pages html (Y/N)?:
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUILD_THREADED or SMP_PRESENT
5 participants