Simulation stopped after 5 model years with strange error message #742
-
Hi Qinggang,
-
Hi Deniz,
Thanks a lot. I'll take this suggestion and do a warm start to solve it.
Regards, Qinggang
-
Dear developers,
I just found that even if I warm start another simulation, it fails with the same error message. The experiment is installed here:
Could it be because I used too much storage on ollie? I am allowed 3 TB and have used 20 TB. I already tried to clean up a bit, but 3 TB is far from enough, so maybe I should request more storage.
Regards, Qinggang
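One way to test the storage hypothesis is a small write probe on the experiment directory. The sketch below is only an illustration: `probe_directory` is a hypothetical helper, not part of esm_tools, and it assumes that an exceeded quota shows up as an `OSError` (typically `EDQUOT`) when a file is created, which may not hold if the file system only enforces a soft limit.

```python
import errno
import os
import tempfile


def probe_directory(path):
    """Return a short status string describing whether `path` is usable.

    Hypothetical helper for illustration; not part of esm_tools.
    """
    if not os.path.isdir(path):
        return "missing"
    try:
        # Creating a small temporary file exercises both write permission
        # and (on many file systems) the quota check. The file is deleted
        # automatically when the `with` block exits.
        with tempfile.NamedTemporaryFile(dir=path) as handle:
            handle.write(b"probe")
            handle.flush()
            os.fsync(handle.fileno())
    except OSError as exc:
        # Report the symbolic errno name, e.g. EDQUOT, EACCES, or ENOSPC.
        return "not writable: " + errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"


if __name__ == "__main__":
    # Experiment directory quoted in the original post.
    experiment = "/work/ollie/qigao001/output/echam-6.3.05p2-wiso/pi/pi_m_411_4.9"
    print(probe_directory(experiment))
```

If this prints `missing` or `not writable: ...` when run from a compute node, the problem is with the file system as seen by the job, not with the model setup.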
-
Yes, it is expected. Now I cannot even run a simulation for one month.
-
Dear developers,
May I ask a question about a failed simulation on ollie? I am using esm_tools to run an ECHAM6 standalone simulation. I run it with this yaml script:
/work/ollie/qigao001/output/echam-6.3.05p2-wiso/pi/pi_m_411_4.9/run_20050101-20050131/scripts/pi_echam6_1m.yaml
The simulation is installed here:
/work/ollie/qigao001/output/echam-6.3.05p2-wiso/pi/pi_m_411_4.9
It ran successfully for 5 years, but in the sixth year it fails with the error messages in this log:
/work/ollie/qigao001/output/echam-6.3.05p2-wiso/pi/pi_m_411_4.9/run_20050101-20050131/log/pi_m_411_4.9_echam_compute_16562441.log
The first few error messages are copied below:
Module for impi, version 2018.4.274 loaded.
216: slurmstepd: error: couldn't chdir to `/work/ollie/qigao001/output/echam-6.3.05p2-wiso/pi/pi_m_411_4.9/run_20050101-20050131/work': No such file or directory: going to /tmp instead
222: slurmstepd: error: execve(): /tmp/./echam6_0720_4.9: No such file or directory
216: slurmstepd: error: Unable to create TMPDIR [/work/ollie/tmp]: Permission denied
216: slurmstepd: error: Setting TMPDIR to /tmp
218: slurmstepd: error: execve(): /tmp/./echam6_0720_4.9: No such file or directory
I checked that the directory indeed exists. Can you please advise how to solve this issue?
Regards, Qinggang