
Thread safe Gridded Components Checklist

Amidu Oloso edited this page Oct 18, 2022 · 10 revisions

Subtle issues in the implementation of gridded components can make them not thread-safe. Each gridded component should be checked against the list below:

Implicitly shared variables

Thread-safe components must not use shared variables, with the possible exception of those accessed only within !$omp master and !$omp single sections. Although OpenMP generally treats the local variables of procedures called from within a parallel region as thread-private, it cannot do so for Fortran variables that have the SAVE attribute.

Explicitly SAVEd variables

These are relatively rare in GEOS, but developers should still verify that their components do not use the SAVE attribute.

Module variables

Module variables, i.e., variables declared in the specification section of a module, implicitly have the SAVE attribute. This includes module variables declared allocatable. Defining derived types in the specification section is perfectly fine and appropriate.

Variables with default initialization

Fortran variables that have a default initialization, e.g.,

   integer ::  i = 0
   logical :: init = .false.
   real, pointer :: q(:,:,:) => null()

also implicitly have the SAVE attribute. That initializing a pointer with null() implies SAVE surprises some developers, and it is rather annoying because pointers otherwise start out with an undefined association status.

The solution is generally to delete the initialization portion of the variable declaration and instead initialize the variable at the beginning of the executable section of the procedure. For example:

   integer ::  i
   logical :: init
   real, pointer :: q(:,:,:)

   i = 0
   init = .false.
   q => null()

Note that this fix may not be equivalent to the original implementation. Consider the case above where init is used to ensure that some expensive operation is performed only during the first call to a procedure. Moving that initialization to the beginning of the executable section will cause the expensive operation to be performed on every invocation of the procedure. (So what is the solution?)
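One possible answer, offered as a sketch rather than as established GEOS practice: keep the flag deliberately shared, but test and set it only inside a named critical section, so that exactly one thread performs the expensive step. The names lookup_mod, get_value, expensive_setup, and table are hypothetical.

```fortran
module lookup_mod
   implicit none
   private
   public :: get_value

   ! Deliberately shared state (hypothetical): written once, under the lock below.
   logical, save :: initialized = .false.
   real, save :: table(100)

contains

   function get_value(i) result(v)
      integer, intent(in) :: i
      real :: v

      ! Test and set the flag under one named lock so that only the first
      ! thread to arrive fills the table; later arrivals wait, then skip it.
      !$omp critical (lookup_init)
      if (.not. initialized) then
         call expensive_setup()
         initialized = .true.
      end if
      !$omp end critical (lookup_init)

      v = table(i)
   end function get_value

   subroutine expensive_setup()
      ! Stand-in for the costly one-time operation.
      integer :: k
      do k = 1, size(table)
         table(k) = sqrt(real(k))
      end do
   end subroutine expensive_setup

end module lookup_mod
```

Every call still pays for acquiring the lock. Where the component structure allows it, performing the one-time setup in the initialize phase, before any parallel region is entered, avoids that cost altogether.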

Variables of a derived type whose components have default initialization

Consider the following snippet:

   type  MyType
       real, pointer :: q(:,:,:) => null()
   end type MyType
   ...
   type(MyType) :: var1
   type(MyType), pointer :: p_var2
   type(MyType), allocatable :: var3

Because the derived type MyType has a default initialized component (in this case a pointer q default initialized to null()), variables of that type also have default initialization and implicitly have the SAVE attribute. In particular var1 above is not thread safe. This can be very difficult to diagnose as the type definition may be in a different file or even in an external library.

Note that p_var2 and var3 in the above snippet do not have the SAVE attribute. Dynamic allocation therefore provides a potential workaround when a type has a default-initialized component, or when it is unknown whether it does.
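A minimal sketch of that workaround, assuming a hypothetical procedure process that needs a scratch variable of type MyType. The module name mytype_mod and the variable scratch are illustrative only.

```fortran
module mytype_mod
   implicit none

   type MyType
      real, pointer :: q(:,:,:) => null()
   end type MyType

contains

   subroutine process(n)
      integer, intent(in) :: n
      ! Declared allocatable instead of as a plain local, per the workaround above.
      type(MyType), allocatable :: scratch

      allocate(scratch)              ! default initialization applies: q is null()
      allocate(scratch%q(n, n, n))
      scratch%q = 0.0

      ! ... work with scratch%q ...

      deallocate(scratch%q)          ! pointer targets are not freed automatically
      deallocate(scratch)            ! optional: local allocatables are freed on return
   end subroutine process

end module mytype_mod
```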

External dependencies

Another way in which a gridded component can run afoul of thread safety is by calling into other layers that are not themselves thread-safe.

I/O

The largest concern here is I/O, which can call system-level libraries that are not thread-safe. The usual solution for local I/O is to surround the print/write statements with OpenMP guards such as:

!$omp critical (my_print)
   print*,...
!$omp end critical (my_print)

Note that it is very important to name critical sections in GEOS. All unnamed critical sections share a single lock, so an unnamed section nested inside another one, for example through a procedure call, deadlocks.

Of course, if you only want one thread to do the I/O, you should use the single or master directive instead. (These do not take names.)
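As an illustration of the single directive, here is a small self-contained sketch. The program name report_once and the variable reports are made up for the example.

```fortran
program report_once
   implicit none
   integer :: reports

   reports = 0

   !$omp parallel shared(reports)
   ! Every thread reaches this point, but only one of them executes the body.
   !$omp single
   reports = reports + 1
   print *, 'diagnostics written, count =', reports
   !$omp end single
   ! Implicit barrier at "end single": the other threads wait for the write.
   !$omp end parallel
end program report_once
```

With master, the body is always executed by the primary thread and there is no implied barrier at the end, so an explicit !$omp barrier is needed if the other threads depend on the result.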

Other procedures

Unless you are certain that a given layer is itself thread-safe, you should surround calls to it with OpenMP critical sections as described above for I/O. Ideally someone should fix the layer and then remove the critical section. Best practice is to add a comment giving the reason for the critical section so that later developers can check whether it is still necessary.

Notable layers which are currently not thread safe:

  1. pFlogger (output on shared files/units)
  2. Timers (use global counters sometimes)
  3. MAPL and ESMF GetResource on the universal (shared) config.

Future improvements are expected to extend pFlogger and MAPL_Profiler to be thread-safe and hopefully even thread-aware. However, the universal config will likely remain a shared resource. Best practice is to limit access to the initialize phase and surround it with a critical section. Setting values into a config may require an OpenMP barrier, depending on subsequent use.

Fortran features that cannot be used within a parallel region

OpenMP and Fortran evolve separately as standards. A consequence of this is that OpenMP is generally somewhat behind the Fortran standard in terms of specifying what language features can be used within parallel regions.

BLOCK

BLOCK was introduced in F2008 and is not allowed (as of 2021-08-13) within a parallel region. This feature is not frequently used within GEOS, and any such use can readily be replaced, either by inlining the code or by moving it into a module procedure or an internal procedure.
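The internal-procedure replacement can be sketched as follows. The names scale_mod, scale_columns, and scale_one are hypothetical; the point is only that the temporary that would have been declared inside a BLOCK becomes a local variable of the called procedure.

```fortran
module scale_mod
   implicit none

contains

   subroutine scale_columns(a)
      real, intent(inout) :: a(:,:)
      integer :: j

      !$omp parallel do
      do j = 1, size(a, 2)
         ! Formerly a BLOCK with its own local "norm"; now a procedure call.
         call scale_one(a(:, j))
      end do
      !$omp end parallel do

   contains

      subroutine scale_one(col)
         real, intent(inout) :: col(:)
         real :: norm   ! local to each call, hence private to the calling thread

         norm = maxval(abs(col))
         if (norm > 0.0) col = col / norm
      end subroutine scale_one

   end subroutine scale_columns

end module scale_mod
```

Passing the data as arguments, rather than letting the internal procedure reach into the host's variables, keeps it obvious which data each thread touches.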

gfortran and RETURN statement from an OpenMP block

The GNU Fortran compiler (gfortran) does not allow a RETURN statement from within an OpenMP block. For example, the compiler complains about the implicit return statement in VERIFY_(STATUS) here:

!$omp critical
   call ESMF_ConfigGetAttribute(CF, DT, Label="RUN_DT:" , RC=STATUS)
   VERIFY_(STATUS)
!$omp end critical

The workaround is to move VERIFY_(STATUS) to after !$omp end critical.
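The same pattern in self-contained form, as a sketch only: read_shared_config is a hypothetical stand-in for the ESMF call, and the explicit status test stands in for the check-and-return that the VERIFY_ macro is described as performing.

```fortran
module dt_mod
   implicit none

contains

   subroutine get_dt(dt, rc)
      real, intent(out) :: dt
      integer, intent(out) :: rc
      integer :: status

      !$omp critical (read_dt)
      call read_shared_config(dt, status)
      !$omp end critical (read_dt)

      ! The error check, with its early RETURN, comes after the block has closed.
      if (status /= 0) then
         rc = status
         return
      end if

      rc = 0
   end subroutine get_dt

   ! Hypothetical stand-in for a call into a layer that is not thread-safe.
   subroutine read_shared_config(dt, status)
      real, intent(out) :: dt
      integer, intent(out) :: status

      dt = 450.0
      status = 0
   end subroutine read_shared_config

end module dt_mod
```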

If the block is longer, with several calls that each return a status, chain the calls on the status and verify once after the block:

!$omp critical
call first(..., rc=status)
if (status == _SUCCESS) call second(..., rc=status)
if (status == _SUCCESS) call third(..., rc=status)
!$omp end critical
_VERIFY(status)