Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix gfortran compilation and some warnings #410

Merged
merged 2 commits into from
Oct 17, 2024

Conversation

micaeljtoliveira
Copy link
Contributor

@micaeljtoliveira micaeljtoliveira commented Sep 27, 2024

CABLE

Thank you for submitting a pull request to the CABLE Project.

Description

This PR fixes compilation with gfortran. I also fixes several compiler warnings (mostly about unused variables) in the MPI master/driver modules.

Type of change

Please delete options that are not relevant.

  • Bug fix

Checklist

  • The new content is accessible and located in the appropriate section.
  • I have checked that links are valid and point to the intended content.
  • I have checked my code/text and corrected any misspellings

📚 Documentation preview 📚: https://cable--410.org.readthedocs.build/en/410/

@micaeljtoliveira micaeljtoliveira added bug Something isn't working code clean-up e.g., removal of dead code labels Sep 27, 2024
@micaeljtoliveira micaeljtoliveira marked this pull request as draft September 27, 2024 04:10
@SeanBryan51
Copy link
Collaborator

@micaeljtoliveira I was able to build your gcc_compilation_and_warnings branch on Gadi with

./build.bash -C gnu --mpi -DCMAKE_Fortran_FLAGS="-ffree-line-length-none"

However the compiled executable crashes with segmentation fault (for both MPI and serial executables). This is probably outside the scope of this pull request but I will document this here for now.

For the serial case I ran the example serial configuration in the CABLE repository:

cd src/offline/
../../bin/cable cable.nml

which outputs:

 THE NAME LIST IS cable.nml
 Use spatially-specific soil properties;          360         150
           0
           0
 When choosing spatially-specific soil properties,
 snow-free albedo is also overwritten by this data set.
 Reading CABLE_PFTPARM namelist...
 Reading CABLE_SOILPARM namelist...
 Total number of patches (countPatch):            1
 iveg           1           2
 patchfrac           1   1.00000000
  Could neither find restart file ./
  nor ./
  Pre-loaded default initialisations are used.
  Loaded some parameters from met input file: TumbaFluxnet.1.3_met.nc the rest are default values - check log file
 Reading CABLE_PFTPARM namelist...
 Reading CABLE_SOILPARM namelist...

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7fc420b985af in ???
#1  0x47e4da in ???
#2  0x40d6f8 in ???
#3  0x40321c in ???
#4  0x7fc420b847e4 in ???
#5  0x40325d in ???
#6  0xffffffffffffffff in ???
Segmentation fault

I was able to trace the segmentation fault to the acc_out_var using the DDT debugger (requires setting CABLE_GNU_Fortran_FLAGS_DEBUG to -g -O0 and building in the debug configuration).

For the MPI case I ran the crujra_accessN96_1h config which crashes similarly to the serial case.

Building your branch with the Intel compiler and running the executable seems to work fine so it looks like this is specific to the gfortran build.

Copy link
Collaborator

@SeanBryan51 SeanBryan51 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The debug configuration for gfortran has some pretty pedantic flags enabled and currently doesn't build on Gadi - is it worth ironing these out?

@SeanBryan51
Copy link
Collaborator

@mcuntz I recall I used your compiler flags when adding the GNU compiler. The debug flags currently contain -O (equivalent to -O1) which enables some optimisations and prevents me from using the DDT debugger. Is there a reason why you use -O instead of -O0 in the debug flags for gfortran?

@mcuntz
Copy link
Contributor

mcuntz commented Sep 30, 2024

@SeanBryan51 -O is there simply for speed even in debug mode. I basically debug with print statements and do not use much the debuggers. No worry to change it.

However, I think that the compilation should work with the pretty pedantic flags. If it works with another compiler means that that compiler assumes something and it is not always what you'd assume if you had coded properly. So I would be 100% for ironing out the places where it is not compiling.

@micaeljtoliveira
Copy link
Contributor Author

@SeanBryan51 I'm able to reproduce your error. I've compiled cable with the -fbacktrace option (plus a few others), so I get a more useful error message:

At line 2908 of file /home/micael/workspaces/cable/git/src/offline/cable_input.F90
Fortran runtime warning: An array temporary was created
  Loaded some parameters from met input file: TumbaFluxnet.1.3_met.nc the rest are default values - check log file
 Reading CABLE_PFTPARM namelist...
 Reading CABLE_SOILPARM namelist...
At line 2194 of file /home/micael/workspaces/cable/git/src/offline/cable_output.F90
Fortran runtime error: Array bound mismatch for dimension 2 of array 'acc_val' (6/1)

Error termination. Backtrace:
#0  0x4745bc in generate_out_write_acc_d2
        at /home/micael/workspaces/cable/git/src/offline/cable_output.F90:2194
#1  0x477303 in __cable_output_module_MOD_write_output
        at /home/micael/workspaces/cable/git/src/offline/cable_output.F90:1769
#2  0x7ab84b in cable_offline_driver
        at /home/micael/workspaces/cable/git/src/offline/cable_driver.F90:942
#3  0x40379c in main
        at /home/micael/workspaces/cable/git/src/offline/cable_driver.F90:62

@micaeljtoliveira
Copy link
Contributor Author

micaeljtoliveira commented Oct 1, 2024

Okay, I think I understand the problem, but I'm not sure how you want to fix it.

In the following call in cable_output.F90:

CALL generate_out_write_acc(output%SoilMoist, ovid%SoilMoist, 'SoilMoist', out%SoilMoist, REAL(ssnow%wb, 4), ranges%SoilMoist, patchout%SoilMoistIce, out_settings)

output%SoilMoist is set to false and the out%SoilMoist and ssnow%wb arguments are passed down to the elemental function generate_out_write_acc_d2. At that point, the compiler checks the bounds of those arguments, as it needs to be sure they are compatible, as the function needs to be called element-wise. But output%SoilMoist is actually never allocated (because output%SoilMoist is false) and the check fails.

There are basically two solutions:

  1. Do not check array bounds are run-time.
  2. Exit generate_out_write_acc when argument output_var is false before calling acc_out_var.

I would recommend against option 1., as the array bounds check is actually very useful when debugging.

@micaeljtoliveira
Copy link
Contributor Author

Actually, even with the checks disabled, it fails (as I should have realized from the segfault message obtained by @SeanBryan51).

This means one really cannot call an elemental function over an array that has not been allocated, which makes total sense. I would thus argue that in this case gfortran is actually doing the right thing and it's the Intel compiler that is misbehaving.

@micaeljtoliveira
Copy link
Contributor Author

There's a few more problematic cases in the same module were several non-allocated arrays are being accessed. For example, here

CALL generate_out_write_acc(output%casa, ovid%PlantTurnover, 'PlantTurnover', out%PlantTurnover, &
                                REAL((SUM(casaflux%Cplant_turnover, 2))/86400.0/1.201E-5, 4), ranges%NEE, patchout%PlantTurnover, out_settings)

it might happen that casaflux%Cplant_turnover is not allocated, causing a segfault.

I think the only sensible solution is to fence all calls to generate_out_write_acc with an if statement. Note that this issue was introduced in #287. @abhaasgoyal FYI.

@SeanBryan51
Copy link
Collaborator

@micaeljtoliveira thanks for looking into it.

I think the only sensible solution is to fence all calls to generate_out_write_acc with an if statement.

This seems appropriate to me. I'm happy to let you decide if you want to fix this in this pull request.

@micaeljtoliveira
Copy link
Contributor Author

This seems appropriate to me. I'm happy to let you decide if you want to fix this in this pull request.

I think I rather do it in a separate PR.

@SeanBryan51
Copy link
Collaborator

@micaeljtoliveira FYI the build-ci action has been fixed (#409) - if you rebase off main you should get a green light on the CI checks

…odule, as gfortran provides a similar function that takes no arguments.
@micaeljtoliveira micaeljtoliveira marked this pull request as ready for review October 8, 2024 04:00
@micaeljtoliveira micaeljtoliveira force-pushed the gcc_compilation_and_warnings branch from 70d6560 to d0e1995 Compare October 8, 2024 04:01
Copy link
Collaborator

@SeanBryan51 SeanBryan51 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving - other issues related to the gfortran build can be addressed at a later point

@micaeljtoliveira micaeljtoliveira merged commit 41af6f7 into main Oct 17, 2024
7 checks passed
@micaeljtoliveira micaeljtoliveira deleted the gcc_compilation_and_warnings branch October 17, 2024 23:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working code clean-up e.g., removal of dead code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants