Issue with restarting AWI-CM-1 runs with a different leg length #1126
Replies: 14 comments
-
Hi @mathanase, would it be possible for you to quickly test whether branching off and changing from daily to monthly restarts works if you start on the 1st of a month? Only if this is easily doable for you. If this causes no error, we could then maybe narrow the problem down to the start date. I will have a look at multi-day restarts. Thank you.
-
Just to add: the multi-day issue is the same as #807.
-
@nwieters, thanks for looking into this. It is actually not that straightforward for me to restart on the 1st of the month. If it is important, I can try taking some intermediate steps to make that possible, but that would require some additional short runs.
-
Hi @mathanase, one thing I have tried so far is to run a simulation with 5-day restarts with AWICM3 and the latest version of esm_tools. That works well. It also has no problems with runs that cross a month break (run_20000131-20000204). I will also try this with awicm-CMIP6. Which esm_tools version are you using (esm_tools --version)? I won't be here tomorrow, but I will have another look on Wednesday.
-
I am using esm_tools version 6.21.5.
-
Hi @mathanase, judging from the log file (/work/ba1264/a270216/NRT/runtime_ens/awicm1_T127L95_PD_T20_24H_20231208_20240108/awicm1_T127L95_PD_T20_24H_20231208_20240108_E1/log/awicm1_T127L95_PD_T20_24H_20231208_20240108_E1_awicm_compute_20231208-20240107_8403867.log), it seems that the run actually starts correctly on 8.12.2023 but stops at the year break. In the ECHAM stderr file (/work/ba1264/a270216/NRT/runtime_ens/awicm1_T127L95_PD_T20_24H_20231208_20240108/awicm1_T127L95_PD_T20_24H_20231208_20240108_E1/run_20231208-20240107/work/echam.stderr) there is a line saying: Could the time out be related to some files for year 2024 that are not found, e.g. nudging files? Do you know if this could be the case?
-
Hi @mathanase, I am currently doing a run that starts on 8.12.2023 with a monthly restart but without nudging, and it looks as if it is hanging at the same step as yours. So my guess is now that ECHAM does not like restarts that include or cross a year break. I will do some more tests just to be sure.
-
@nwieters I did find out yesterday that some nudging files were missing, but that did not resolve my own issues with monthly restarting from the middle of a month.
-
Hi @nwieters. Using the same runscript from Marylou, I tried a similar run from 02 October 2023 to 01 November 2023 (/work/ba1264/a270148/Test1MonthF0210To0111) and the issue is the same. (It got stuck at 00UTC 01 November, so 1 day before the end.)
-
Hi @antoniofis10000, @mathanase, @mandresm, yes, I also get this behaviour without nudging. I tried monthly restarts and also 14-day restarts, but the run still hangs at the month break. To me it seems that, for some reason and in some circumstances, ECHAM interrupts at a month break, resulting in a hanging run. I will do some tests with ECHAM standalone to find out whether a namelist entry is still not set correctly.
-
Hi @nwieters
-
Hi @mathanase, @antoniofis10000, as we already discussed, I did a few test runs with ECHAM standalone (version echam-6.3.04p1, as in awicm-CMIP6).
To 1.: I am not really sure why, but it seems to me that you always need to initialize or restart at the beginning of a restart unit (i.e. at the beginning of the day, month, or year); otherwise ECHAM will interrupt at the end of the chosen unit. To 2.: It is possible to change the restart rate and unit, but there are some constraints to be considered:
In a coupled setup like awicm-CMIP6, the interruption of ECHAM results in FESOM hanging while it waits for coupling data, and eventually in a time out. I hope my explanation was clear. If not, please let me know or ask me directly. To get more detailed information on ECHAM rerun events, you can turn on a debugging flag for this:
You will then find more information about ECHAM events in the echam6.log or echam.stderr file in your experiment (.../run_from-to/work/echam6.log, search for 'State of event <rerun interval>').
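The alignment rule described under "To 1." can be sketched as a small pre-flight check on the start date. This is only an illustration of the rule as stated in this comment, not ECHAM's or esm_tools' actual logic: the function name `is_aligned` and the unit labels `"days"`, `"months"`, and `"years"` are hypothetical, and the check does not cover the additional constraints mentioned under "To 2.".

```python
from datetime import datetime


def is_aligned(start: datetime, unit: str) -> bool:
    """Return True if `start` is at the beginning of the given restart unit.

    Hypothetical helper: it encodes only the rule of thumb from this
    thread (start at the beginning of the day, month, or year).
    """
    at_midnight = (start.hour, start.minute, start.second, start.microsecond) == (0, 0, 0, 0)
    if unit == "days":
        return at_midnight
    if unit == "months":
        return at_midnight and start.day == 1
    if unit == "years":
        return at_midnight and start.day == 1 and start.month == 1
    raise ValueError(f"unknown restart unit: {unit!r}")


# Start dates taken from runs reported in this thread.
print(is_aligned(datetime(2023, 12, 8), "months"))  # monthly leg starting 8.12.2023
print(is_aligned(datetime(2023, 10, 2), "months"))  # monthly leg starting 02 October 2023
print(is_aligned(datetime(2023, 10, 4), "days"))    # daily unit, start at 00UTC
```

Under this rule of thumb, both monthly legs above start mid-month and would be flagged as not aligned, which matches the hanging runs reported earlier in the thread.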
-
Hi, I will turn this into a discussion for now. Please add more comments or raise another question if needed.
-
Hi @nwieters. Thanks for the really detailed explanation. I tried to run a 5-day experiment (from 00UTC 04/10/2023 to 00UTC 09/10/2023, /work/ba1264/a270148/Extension5DaysC), and it worked. This surprised me, since 5 is not a multiple of 3 and 09/10 is not in the sequence of the original experiments. In addition, I tried a 3-day experiment (from 00UTC 04/10/2023 to 00UTC 07/10/2023, /work/ba1264/a270148/Extension3Days/, so the same as the "original length in Mistral" but with days instead of months) and it also crashes (interrupted at 00UTC 06/10/2023). Is this behaviour expected? Spending more time on this topic is probably not worthwhile, but I wanted to report the observed behaviour. Thanks in advance.
-
I have recently been trying to branch off a simulation from a parent AWI-CM-1 run (on DKRZ Levante). The parent run has a restart length of 1 day. For the simulations I am branching off, I was hoping to use multi-day or 1-month leg lengths.
However, every single attempt ended with seemingly one component waiting for the other forever, and therefore the simulation hit the time limit.
See one example test runscript in:
/home/a/a270216/NRT/ERA5T_ENSEMBLE/runscripts/tests_extension
See also one example of failed experiment in:
/work/ba1264/a270216/NRT/runtime_ens/awicm1_T127L95_PD_T20_24H_20231208_20240108/awicm1_T127L95_PD_T20_24H_20231208_20240108_E1/
I suspect that running a 1-month leg that does not start on the 1st of the month might be an issue too. Nevertheless, the multi-day option also does not work, so I am unsure what to do in this case.
I may also be doing something wrong here, but I cannot figure out what.
If not, then it would be great to be able to change the restart length (from short to long) without issues. Note that changing from a multi-day or multi-month length to daily was not an issue in the past.
FYI, @antoniofis10000 reported having faced similar issues in the past.
Thanks in advance!