-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Staged cable improve resubmission and restart handling #537
base: master
Are you sure you want to change the base?
Staged cable improve resubmission and restart handling #537
Conversation
Now only build the config log during setup, as I learned that expt.n_runs can be set to control resubmission rather than setting the environment variable directly. During archiving, any restarts from the previous stage not updated by the current stage are copied to the next set of restarts, so all that is required for portable restarting is a single restart directory.
Original implementation searched for older restarts on setup, now we copy any non-duplicate restarts into the next stage restarts. This allows a single restart to begin any configuration.
The tests required some changes to the configuration and stage config paths. Some minor changes to the tests to account for slightly changed behaviours.
I've made a team for ACCESS-NRI and this is rectified |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me, but I've not looked at the logic deeply, just had a skim and commented on things that stood out to me.
shutil.copy(os.path.join(prior_restart_path, f), | ||
self.work_restart_path) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be possible to just make links, which will resolve to their original location IIRC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the scenario where we want to share a restart directory with someone else, or perhaps even publicly somewhere, so that they can use it as a start point for simulations? I think it's likely enough that the original run would be somewhere that others won't have read permissions to, or even potentially on other machines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked into the code changes but I don't see how making a copy solves problems of read permissions? You need read access to be able to copy files. So links would just be as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking of a situation where someone (e.g. ACCESS-NRI) has run an expensive spin-up configuration and want to provide the restarts as a starting point for others in the community. The configuration may have 20+ stages, but the full suite of restarts may contain a restart from the 1st stage, one from the 5th stage and the rest from the most recent stage. A supported payu
workflow is to take a restart from elsewhere and effectively plug it in, so payu
transparently restarts from this restart (I think this is true- I have memory of this being demonstrated on the training day).
A scenario where I run a configuration to create a starting point for community science. I'll likely save the data to my non-permanent scratch space. I want to take the most recent payu
set of restarts and store it someone accessible on gdata
. If the necessary older restarts are only known by symlink, then they'll disappear relatively soon and whichever user takes this payu
restart may not have access to my scratch space where the symlinked files live.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use cp
with the -L
option to "dereference" links, so the destination will contain all the files referenced by links
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks ok to me.
I had to go read the whole file to remind myself of what was going on here, so I have a comment on something you haven't changed. Feel free to ignore.
In this block of code:
payu/payu/models/staged_cable.py
Lines 178 to 188 in b69b26d
for namelist in namelists: | |
write_target = os.path.join(self.work_input_path, namelist) | |
stage_nml = os.path.join(self.control_path, stage_name, namelist) | |
if os.path.isfile(os.path.join(self.control_path, namelist)): | |
# Instance where there is a master and stage namelist | |
with open(stage_nml) as stage_nml_f: | |
stage_namelist = f90nml.read(stage_nml_f) | |
master_nml = os.path.join(self.control_path, namelist) | |
f90nml.patch(master_nml, stage_namelist, write_target) |
I think it would be more readable if you define master_nml
at the same place as stage_nml
and use it in the IF condition on file existence (l.182):
if os.path.isfile(master_nml):
Improvements to
staged_cable
driverI made some improvements to the
staged_cable
driver to hopefully make it more easily understandable, and more easily shareable. Summary of changes:climate_rst.nc
andcable_rst.nc
, which live in{archive}/restart000
. We run stage 2, which created restartscasa_rst.nc
andcable_nc.nc
. We look back at{archive}/restart000
, see that there's no new version ofclimate_rst.nc
, so copy that to the work restart directory. The contents of the work restart directory are then moved to{archive}/restart001
. This makes restarts much more easily shareable.expt.n_runs
instead of the environment variable, which is persistent. Now the configuration log work is only done where necessary insetup()
and archive```, rather than every initialisation.Coming with this is the removal of a fair bit of now-dead code.
Edit: One thing I forgot to mention is the handling of potentially new namelists. I would like it to be possible for people adding new science (or perhaps
cable
configurations I don't know about yet) to be able to use the tool. Unfortunately, I can't know a priori which namelists will be required in these instances. To get around this, I could simply setrather than the predefined list of namelists, but it feels like a bit of an abuse of process. Thoughts?