Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add non-repro, simple CI checks all relevant config PRs #25

Closed
CodeGat opened this issue Mar 5, 2024 · 19 comments · Fixed by #32
Closed

Add non-repro, simple CI checks all relevant config PRs #25

CodeGat opened this issue Mar 5, 2024 · 19 comments · Fixed by #32

Comments

@CodeGat
Copy link
Member

CodeGat commented Mar 5, 2024

References this comment: #23 (comment)

We don't need to go through the rigamarole of testing reproducibility for all PRs (only test that for dev-* into release-*) but we should still do some checks. See the referenced comment for the types of checks we want to do. These would run on all config-related PRs (<some feature> into dev-*, and dev-* into release-*).

@aidanheerdegen
Copy link
Member

aidanheerdegen commented Mar 19, 2024

Some checks

  • check project is not defined
  • check shortpath is not defined
  • absolute input paths
  • all input paths in vk83 (allow exceptions for jra55 data in IAF configs)
  • all input paths point to files (allow exceptions for IAF forcing data)
  • exe paths absolute. test for consistency of spack hash?
  • mppnccombine-fast collate exe for high res configs
  • test no storage flags in qsub flags
  • check no scripts in top level directory (should be in tools)
  • test runlog is on
  • restart_freq is date based. Maybe define some acceptable values?
  • sync not enabled, sync path not set
  • metadata.yaml json schema validation
  • metadata.yaml fields present and correct (description, notes, keywords, nominal resolution, version, reference, license, url, model, realm = ['ocean', 'seaIce'] or realm = ['ocean', 'seaIce', 'ocnBgchem']?)
  • manifests updated and consistent with executables and inputs

Many of these are available in an existing CI checking "library"

https://github.com/COSIMA/cleanconfig/blob/master/test.py

@aidanheerdegen
Copy link
Member

Could use pytest markers to mark tests to assist in running only when required. e.g. highres marker could be used to switch on/off the mppnccombine-fast check

https://pytest-with-eric.com/pytest-best-practices/pytest-markers/

@aidanheerdegen
Copy link
Member

@dougiesquire do you have any comments about the checks suggested above? Particularly the metadata.yaml fields? Should access-om2-bgc have an ocean and ocnBgchem realm?

And off topic, but hooo boy, there should be a special place in controlled vocabulary hell for the committee that thought ocnBgchem was better than oceanbgc or oceanbiogeochem or pretty much anything. Eeeuuwww.

@dougiesquire
Copy link
Collaborator

Should access-om2-bgc have an ocean and ocnBgchem realm?

Seems sensible to me. FYI, this choice doesn't impact the catalog, which gets the realm per file. Currently there's no easy way to determine whether an ACCESS-OM2 MOM output file is ocean or ocnBgchem so everything is classified as ocean.

@aidanheerdegen
Copy link
Member

Thanks.

I should have been more specific (as usual), I wasn't sure if ocnBgchem (feels awful just typing it) implied ocean.

If we assume not, that both are required, we'll have the situation that the experiment specifies both realms, but none of the "file assets" (data files) have the ocnBgchem realm. I think that's fine, but we'll probably be first up against the wall if we stumble into an angry mob of ontologists (is there any other kind?).

@dougiesquire
Copy link
Collaborator

I wasn't sure if ocnBgchem (feels awful just typing it) implied ocean

I see - I don't think we should only include ocnBgchem. I think what's in the realm property of the metadata.yaml file should be the union of the realms for all the files in the experiment. access-om2-bgc has seaIce (CICE output) and ocean (most of MOM output) and ocnBgchem (MOM WOMBAT output).

we'll have the situation that the experiment specifies both realms, but none of the "file assets" (data files) have the ocnBgchem realm.

I can't see this being a huge problem anytime soon. The only issue I can think of off the top of my head is if you are doing/planning something like writing code snippets to load data from the catalog based on the metadata.yaml file. Then its probably safest to just have realm = ['ocean', 'seaIce'].

@CodeGat
Copy link
Member Author

CodeGat commented Mar 25, 2024

Could use pytest markers to mark tests to assist in running only when required. e.g. highres marker could be used to switch on/off the mppnccombine-fast check

https://pytest-with-eric.com/pytest-best-practices/pytest-markers/

@aidanheerdegen if we want to run different checks on different branches, we could add another .json file to the config folder on main that informs the CI (I seem to love those) regarding what markers to use. For example:

{
  "markers": {
    "branches": {
      "release-1deg_jra55_ryf": "highres and config and metadata",
      "release-some_other_config": "config and metadata"
    }
  }
}

What do you think? (thanks for bringing it to my attention @jo-basevi

@aidanheerdegen
Copy link
Member

if we want to run different checks on different branches, we could add another .json file to the config folder on main that informs the CI (I seem to love those) regarding what markers to use

We could .. but equally we could also just interrogate the branch name

#32 (comment)

If the latter makes the CI too access-om2 specific, if otherwise it could be used more generically, then I'd be ok with a config file.

@CodeGat
Copy link
Member Author

CodeGat commented Mar 26, 2024

We could interrogate the branch, and do it all in a step, but that assumes that we will not really have many branches (if it's this branch or this one or ... then use these markers), or there is a certain pattern to branches that makes them all definitely config or metadata or highres.
I've done up the config file in this PR, anyway - what do you think?

@aidanheerdegen
Copy link
Member

aidanheerdegen commented Mar 27, 2024

I forgot a check! We also want this set in config.yaml

manifest:
   reproduce:
      exe: True

Arguably we probably want input: True as well, but still chewing over the implications for users.

@aidanheerdegen
Copy link
Member

Another thought popped into my head because I had already made this mistake

https://github.com/ACCESS-NRI/access-om2-configs/blob/release-1deg_jra55_ryf/accessom2.nml#L19

It is common practice to reduce the restart_period (model run time) when testing a configuration, so we should check it is a "reasonable" value.

Now .. what is reasonable?

The one degree has a 5 year run time for both physics only and BGC:

https://github.com/COSIMA/1deg_jra55_ryf/blob/master/accessom2.nml#L19
https://github.com/COSIMA/1deg_jra55_ryf/blob/master%2Bbgc/accessom2.nml#L19

The quarter degree typically runs for 2 years at a time

https://github.com/COSIMA/025deg_jra55_ryf/blob/master/accessom2.nml#L20

but the BGC version is 1 year

https://github.com/COSIMA/025deg_jra55_ryf/blob/master%2Bbgc/accessom2.nml#L20

The tenth degree only runs 3 months at a time

https://github.com/COSIMA/01deg_jra55_ryf/blob/master/accessom2.nml#L22

and the BGC version is 1 month

https://github.com/COSIMA/01deg_jra55_ryf/blob/master%2Bbgc/accessom2.nml#L22

@aidanheerdegen
Copy link
Member

aidanheerdegen commented Mar 27, 2024

It would be good to control (limit) the acceptable keywords in the configurations, partly to provide guidance on what should be used because everyone will come up with their own.

Keywords are a bit of a wild-west and it doesn't seem there are many definitive vocabularies of keywords available, but the Global Change Master Directory (GCMD) Keywords might be one. It might be overkill but it is a great resource, e.g.

https://gcmd.earthdata.nasa.gov/KeywordViewer/scheme/all/10a128a6-12d4-4bce-b25d-2ffc464182f4?gtm_keyword=POLYNYAS&gtm_scheme=Earth%20Science

So .. a list of acceptable keywords (please others add your opinions):

Topic Keywords (mutually exclusive)
Spatial extent global, regional
Forcing product JRA55, ERA5
Forcing mode repeat-year, ryf, repeat-decade, rdf, interannual, iaf
Model access-om2, access-om2-025, access-om2-01

I've intentionally left off organisational keywords because that should be covered in the metadata right? Also it might not be mutually exclusive. We should definitely credit COSIMA because they developed these configurations, but also ACCESS-NRI is supporting them now.

@aidanheerdegen
Copy link
Member

Speaking of controlled vocabularies, we need to populate the nominal resolution field in the metadata.yaml with something useful.

From this CMIP6 example it seems like a distance measure is the normal approach and the respective values should be

Config resolution Nominal Resolution
1deg 100 km
025deg 25 km
01deg 10 km

Does that look right to you @aekiss ?

@dougiesquire
Copy link
Collaborator

dougiesquire commented Mar 28, 2024

Note the CMIP6 controlled vocab is here. Most are distance measures as you said.

@aekiss
Copy link
Contributor

aekiss commented Mar 28, 2024

I think that's close enough and would be familiar to people. Of course it depends on latitude etc. eg see table 6 and fig 5 of the tech report
Screenshot 2024-03-28 at 2 48 59 pm
Screenshot 2024-03-28 at 2 50 05 pm

@aidanheerdegen
Copy link
Member

Note the CMIP6 controlled vocab is here. Most are distance measures as you said.

Thanks. WCRP "Controlled" vocabulary I have a 1x1 degree bone to pick with you.

https://github.com/WCRP-CMIP/CMIP6_CVs/blob/main/CMIP6_nominal_resolution.json#L9

@aidanheerdegen
Copy link
Member

aidanheerdegen commented Mar 28, 2024

I think that's close enough and would be familiar to people. Of course it depends on latitude etc. eg see table 6 and fig 5 of the tech report

Absolute distance is a terrible measure eh? "ballpark resolution" might be a better name.

Should we use 1x1 degree, 0.25x0.25 degree and 0.1x0.1 degree?

We don't have to follow CMIP6.

Ignore me. I have already changed the CI goalposts too many times today. So sorry @jo-basevi!

@aekiss
Copy link
Contributor

aekiss commented Mar 28, 2024

Absolute distance is worse than degrees, but degrees are also not great (meridional resolution in degrees is also latitude-dependent due to Mercator grid, and then there's the tripolar region). I guess that's why they hedge it with "nominal".

@aidanheerdegen
Copy link
Member

And at least distance has some vague meaning for normal people I guess.

aidanheerdegen added a commit that referenced this issue Apr 8, 2024
* call-pr-1-ci.yml: Removed 'metadata.yaml' from paths-ignore as part of #25
* Removed a couple of non-standard keywords
* Updates based on QA failures
* Updated to access-om2 2024.03.0
* Update exe manifest
* Added CC BY 4.0 License file
* Updated metadata notes to be consistent with #64

---------

Co-authored-by: Tommy Gatti <tommy.gatti@anu.edu.au>
aidanheerdegen added a commit that referenced this issue Apr 8, 2024
* call-pr-1-ci.yml: Removed 'metadata.yaml' from paths-ignore as part of #25
* - Changed to access-om2 to 2024.03.0
* Turn on exe manifest repro check
* Added CC BY 4.0 License file
* Fix jobname in config.yaml
* Update notes metadata to be consistent with #64
* Removed extraneous metadata fields
* Set version to 1.0

---------

Co-authored-by: Tommy Gatti <tommy.gatti@anu.edu.au>
aidanheerdegen added a commit that referenced this issue Apr 9, 2024
* call-pr-1-ci.yml: Removed 'metadata.yaml' from paths-ignore as part of #25
* Removed a couple of non-standard keywords
* Updates based on QA failures
* Updated to access-om2 2024.03.0
* Update exe manifest
* Added CC BY 4.0 License file
* Updated metadata notes to be consistent with #64

---------

Co-authored-by: Tommy Gatti <tommy.gatti@anu.edu.au>
CodeGat added a commit that referenced this issue Apr 9, 2024
* Fix 1deg_jra55_ryf configuration (#54)

* call-pr-1-ci.yml: Removed 'metadata.yaml' from paths-ignore as part of #25
* Removed a couple of non-standard keywords
* Updates based on QA failures
* Updated to access-om2 2024.03.0
* Update exe manifest
* Added CC BY 4.0 License file
* Updated metadata notes to be consistent with #64

---------

Co-authored-by: Tommy Gatti <tommy.gatti@anu.edu.au>

* Set version to 1.0

* Updated checksums and bumped version to 2.0 as part of https://github.com/ACCESS-NRI/access-om2-configs/actions/runs/8611230342

---------

Co-authored-by: Tommy Gatti <tommy.gatti@anu.edu.au>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
aidanheerdegen added a commit that referenced this issue Apr 10, 2024
* Fix 1deg_jra55_iaf configuration (#59)
* Changed to access-om2 to 2024.03.0
* Added CC BY 4.0 License file
* Update notes metadata to be consistent with #64
* Updated checksums and bumped version to 2.0 as part of https://github.com/ACCESS-NRI/access-om2-configs/actions/runs/8625604000
* call-pr-1-ci.yml: Removed 'metadata.yaml' from paths-ignore as part of #25

---------
Co-authored-by: Tommy Gatti <tommy.gatti@anu.edu.au>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
aidanheerdegen pushed a commit that referenced this issue Apr 10, 2024
* Update update-01deg_jra55_iaf_bgc. Changed to access-om2 to 2024.03.0
* update config.yaml
* update metadata.yaml
* Added CC BY 4.0 License file
* call-pr-1-ci.yml: Removed 'metadata.yaml' from paths-ignore as part of #25

---------

Co-authored-by: minghangli <minghang.li1@anu.edu.au>
CodeGat pushed a commit that referenced this issue May 13, 2024
* Fix 1deg_jra55_iaf configuration (#59)
* Changed to access-om2 to 2024.03.0
* Added CC BY 4.0 License file
* Update notes metadata to be consistent with #64
* Updated checksums and bumped version to 2.0 as part of https://github.com/ACCESS-NRI/access-om2-configs/actions/runs/8625604000
* call-pr-1-ci.yml: Removed 'metadata.yaml' from paths-ignore as part of #25

---------
Co-authored-by: Tommy Gatti <tommy.gatti@anu.edu.au>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants