Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 403159b, commit c8dd469
Showing 5 changed files with 33 additions and 37 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,43 +1,34 @@ | ||
## nf_retry | ||
|
||
## Where it goes wrong | ||
|
||
[Here](https://github.com/GEOS-ESM/MAPL/blob/d5009302d5ebac669cc7ff93db5134154a2c7a88/pfio/NetCDF4_FileFormatter.F90#L256) | ||
|
||
```fortran | ||
if (this%parallel) then | ||
!$omp critical | ||
status = nf90_open(file, IOR(omode, NF90_MPIIO), comm=this%comm, info=this%info, ncid=this%ncid) | ||
!$omp end critical | ||
_VERIFY(status) | ||
else | ||
!$omp critical | ||
status = nf90_open(file, IOR(omode, NF90_SHARE), this%ncid) ! this line can fail | ||
!$omp end critical | ||
_VERIFY(status) | ||
``` | ||
|
||
## NetCDF-Fortran's Documentation of Error Handling | ||
This module provides `nf90_open_with_retries`, a function that wraps `nf90_open` | ||
with the following functionality: | ||
|
||
### 1.6 Error Handling | ||
1. NetCDF errors are printed to the console | ||
2. Automatically retry opening the file | ||
|
||
The netCDF library provides the facilities needed to handle errors in a flexible way. Each netCDF function returns an integer status value. If the returned status value indicates an error, you may handle it in any way desired, from printing an associated error message and exiting to ignoring the error indication and proceeding (not recommended!). For simplicity, the examples in this guide check the error status and call a separate function to handle any errors. | ||
This is useful for working around intermittent, spurious I/O errors on HPC cluster filesystems such as GPFS. | ||
|
||
The NF90_STRERROR function is available to convert a returned integer error status into an error message string. | ||
`nf_retry` is disabled by default, in which case `nf90_open_with_retries` behaves exactly like `nf90_open` (except that errors | ||
are automatically printed). `nf_retry` can be enabled and configured by creating `nf_retry.nml`: | ||
|
||
Occasionally, low-level I/O errors may occur in a layer below the netCDF library. For example, if a write operation causes you to exceed disk quotas or to attempt to write to a device that is no longer available, you may get an error from a layer below the netCDF library, but the resulting write error will still be reflected in the returned status value. | ||
``` | ||
&nf_retry | ||
nf_retry_wait=1, ! Seconds to wait before retry | ||
nf_retry_max_tries=3, ! Max number of retries before failure | ||
nf_retry_catch=0,1,3,2, ! NetCDF error codes to catch | ||
nf_retry_catch_all=F, ! Catch all NetCDF errors? | ||
/ | ||
``` | ||
|
||
### Summary | ||
### Implementing nf_retry | ||
|
||
Do | ||
Use the `nf_retry` module and replace `nf90_open` calls with `nf90_open_with_retries`. | ||
|
||
```fortran | ||
status = nf90_open(...) | ||
if(status /= nf90_noerr) then | ||
print *, trim(nf90_strerror(status)) | ||
endif | ||
NF90_STRERROR | ||
```diff | ||
+ use nf_retry | ||
... | ||
- status = nf90_open(path, mode, ncid) | ||
+ status = nf90_open_with_retries(path, mode, ncid) | ||
``` | ||
|
||
## References | ||
1. https://www.unidata.ucar.edu/software/netcdf/docs-fortran/nc_f77_interface_guide.html#f77_NF_OPEN_ | ||
2. https://www.unidata.ucar.edu/software/netcdf/docs-fortran/f90_datasets.html#f90-nf90_strerror | ||
Make sure to add `nf_retry.f90` to your build. |
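The retry behaviour described in this README can be sketched roughly as follows. This is not the code in `nf_retry.f90`, which this diff does not show. The module name `retry_sketch`, the `opener` interface, and the helpers `should_retry`, `pause_for`, and `call_with_retries` are all hypothetical. The sketch assumes that a status of 0 means success (as with `nf90_noerr`) and that `nf_retry_max_tries` counts total attempts. To stay free of a NetCDF dependency, the call being retried is passed in as a procedure argument.

```fortran
module retry_sketch
  use iso_fortran_env, only: int64
  implicit none
  private
  public :: opener, should_retry, call_with_retries

  abstract interface
    ! Hypothetical stand-in for a call such as nf90_open: returns a status code.
    integer function opener()
    end function opener
  end interface

contains

  ! True when a non-zero status is listed in catch_codes, or when all errors
  ! are caught. Assumption: 0 means success and is never retried.
  pure logical function should_retry(status, catch_codes, catch_all)
    integer, intent(in) :: status
    integer, intent(in) :: catch_codes(:)
    logical, intent(in) :: catch_all
    should_retry = status /= 0 .and. (catch_all .or. any(catch_codes == status))
  end function should_retry

  ! Portable busy-wait; standard Fortran has no sleep intrinsic.
  subroutine pause_for(seconds)
    integer, intent(in) :: seconds
    integer(int64) :: start, now, rate
    call system_clock(start, rate)
    do
      call system_clock(now)
      if (now - start >= int(seconds, int64) * rate) exit
    end do
  end subroutine pause_for

  ! Call open_fn up to max_tries times, pausing wait_s seconds between attempts.
  ! Returns the last status seen.
  function call_with_retries(open_fn, max_tries, wait_s, catch_codes, catch_all) &
      result(status)
    procedure(opener) :: open_fn
    integer, intent(in) :: max_tries, wait_s
    integer, intent(in) :: catch_codes(:)
    logical, intent(in) :: catch_all
    integer :: status, attempt

    do attempt = 1, max(1, max_tries)
      status = open_fn()
      if (.not. should_retry(status, catch_codes, catch_all)) return
      print '(a,i0,a,i0)', 'attempt ', attempt, ' returned status ', status
      if (attempt < max_tries) call pause_for(wait_s)
    end do
  end function call_with_retries

end module retry_sketch
```

Separating the decision (`should_retry`) from the loop keeps the catch-list logic easy to check on its own. A real wrapper would presumably print `nf90_strerror(status)` instead of the raw code.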
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
&nf_retry | ||
nf_retry_wait=1, ! Seconds to wait before retry | ||
nf_retry_max_tries=3, ! Max number of retries before failure | ||
nf_retry_catch=0,1,3,2, ! NetCDF error codes to catch | ||
nf_retry_catch=0,1,2,3, ! NetCDF error codes to catch | ||
nf_retry_catch_all=F, ! Catch all NetCDF errors? | ||
/ |
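For illustration, a namelist group like the one above could be read with standard Fortran namelist I/O, roughly as sketched here. The group and variable names come from `nf_retry.nml`. Everything else is an assumption for the sake of the example and not taken from `nf_retry.f90`: the module name `nf_retry_config_sketch`, the `retry_config` type, the `max_codes` limit of 16, the `unset` sentinel, and the choice to treat a missing or unreadable file as "disabled".

```fortran
module nf_retry_config_sketch
  implicit none
  private
  public :: retry_config, read_retry_config, max_codes, unset

  integer, parameter :: max_codes = 16    ! hypothetical cap on listed codes
  integer, parameter :: unset = -huge(0)  ! sentinel marking unused slots

  type :: retry_config
    integer :: wait = 0
    integer :: max_tries = 1
    integer :: catch(max_codes) = unset
    logical :: catch_all = .false.
    logical :: enabled = .false.
  end type retry_config

contains

  ! Read the &nf_retry group from path. If the file is missing or cannot be
  ! parsed, return the defaults with enabled = .false.
  function read_retry_config(path) result(cfg)
    character(len=*), intent(in) :: path
    type(retry_config) :: cfg
    integer :: nf_retry_wait, nf_retry_max_tries
    integer :: nf_retry_catch(max_codes)
    logical :: nf_retry_catch_all
    integer :: unit, ios
    namelist /nf_retry/ nf_retry_wait, nf_retry_max_tries, &
                        nf_retry_catch, nf_retry_catch_all

    cfg = retry_config()

    ! Start from the defaults so omitted entries keep their values.
    nf_retry_wait = cfg%wait
    nf_retry_max_tries = cfg%max_tries
    nf_retry_catch = cfg%catch
    nf_retry_catch_all = cfg%catch_all

    open(newunit=unit, file=path, status='old', action='read', iostat=ios)
    if (ios /= 0) return

    read(unit, nml=nf_retry, iostat=ios)
    close(unit)
    if (ios /= 0) return

    cfg%wait = nf_retry_wait
    cfg%max_tries = nf_retry_max_tries
    cfg%catch = nf_retry_catch
    cfg%catch_all = nf_retry_catch_all
    cfg%enabled = .true.
  end function read_retry_config

end module nf_retry_config_sketch
```

Because a namelist read only assigns the leading elements that appear in the file, the remaining slots of `catch` keep the `unset` sentinel, so the order of the listed codes (`0,1,3,2` versus `0,1,2,3`) does not affect which codes are recognised.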