Serial restart files for file systems that prohibit parallel IO #438
Unanswered
dfielding14 asked this question in Q&A
Hi all,
The file system on the machine I use is Ceph, which by default prohibits parallel I/O. (It can be enabled through LazyIO, but the admins are hesitant to do this.) This means that only one core can write to a given file at a time. For the regular outputs this is easy to avoid by using the VTK format, but restart files are a different story. I am running some large jobs (2048^3 and up) on thousands of cores, and this has become quite an issue.
Has anyone else run into this and come up with a solution?
One solution that comes to mind is to have each writing process write its own restart file. This would require fairly significant changes to how restart files are written and read back in, so I thought I'd ask whether anyone has already done this or has any thoughts on it.
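The per-process layout described above might look roughly like the minimal Python sketch below. It is not the code's actual restart format. The file naming scheme, the header layout (an `RST1` tag, the writing rank, a block count, then for each block a global ID, a length, and little-endian float64 values), and the helper names `write_rank_restart`, `read_all_blocks`, and `blocks_for_rank` are all hypothetical. MPI is left out so the sketch runs standalone. In a real run, `rank` would come from the communicator (e.g. `MPI_Comm_rank`) and each rank would call the writer once for its own blocks, so no two ranks ever touch the same file.

```python
import glob
import os
import struct
import tempfile

MAGIC = b"RST1"  # hypothetical format tag for this sketch


def rank_filename(directory, step, rank):
    """Hypothetical naming scheme: one restart file per writing rank."""
    return os.path.join(directory, f"restart.{step:05d}.rank{rank:05d}.rst")


def write_rank_restart(directory, step, rank, blocks):
    """Write this rank's blocks ({global_id: list of floats}) to its own file.

    Only this rank opens the file, so no shared-file parallel I/O is needed.
    """
    path = rank_filename(directory, step, rank)
    with open(path, "wb") as f:
        f.write(MAGIC)
        f.write(struct.pack("<II", rank, len(blocks)))
        for gid in sorted(blocks):
            values = blocks[gid]
            # Per-block record: global ID, value count, then the raw float64 data.
            f.write(struct.pack("<qQ", gid, len(values)))
            f.write(struct.pack(f"<{len(values)}d", *values))
    return path


def read_all_blocks(directory, step):
    """Collect every block from every per-rank file, keyed by global block ID."""
    blocks = {}
    pattern = os.path.join(directory, f"restart.{step:05d}.rank*.rst")
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as f:
            if f.read(4) != MAGIC:
                raise ValueError(f"{path}: not a restart file")
            _writer_rank, nblocks = struct.unpack("<II", f.read(8))
            for _ in range(nblocks):
                gid, n = struct.unpack("<qQ", f.read(16))
                blocks[gid] = list(struct.unpack(f"<{n}d", f.read(8 * n)))
    return blocks


def blocks_for_rank(all_blocks, rank, nranks):
    """Hypothetical reassignment: contiguous split of sorted block IDs over ranks."""
    gids = sorted(all_blocks)
    lo = rank * len(gids) // nranks
    hi = (rank + 1) * len(gids) // nranks
    return {g: all_blocks[g] for g in gids[lo:hi]}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Simulate 4 writing ranks, each owning 2 blocks of 3 values.
        for rank in range(4):
            mine = {2 * rank + k: [float(2 * rank + k)] * 3 for k in range(2)}
            write_rank_restart(d, 10, rank, mine)
        restored = read_all_blocks(d, 10)
        # Restart on 3 ranks instead of the original 4.
        for rank in range(3):
            print(rank, sorted(blocks_for_rank(restored, rank, 3)))
```

Because blocks are keyed by a global ID rather than by the file they came from, the reader does not depend on how many ranks wrote the restart, so a job could restart on a different core count. The trade-off is that every rank here opens every file. At thousands of ranks, a real implementation would more likely write a small index (block ID to file and offset) so each rank reads only the blocks it owns, and could have one writer per group of ranks to keep the number of files manageable.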
Thanks!