forked from WeiqunZhang/miniSMC
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README
146 lines (114 loc) · 6.08 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
This is a minimalist version of SMC, a combustion code that solves
compressible Navier-Stokes equations for viscous multi-component
reacting flows. SMC is developed at the Center for Computational
Sciences and Engineering (CCSE) at Lawrence Berkeley National
Laboratory. It uses the BoxLib library
(https://ccse.lbl.gov/BoxLib/), also developed at CCSE. This
minimalist version of SMC includes a stripped down version of BoxLib.
Therefore, the full version of BoxLib is not needed.
The code is mainly written in Fortran 90, with some parts in C.
Parallelization is implemented with hybrid MPI/OpenMP. However, SMC
can also be built with pure MPI, or pure OpenMP, or neither.
GNU make is used to build the code. The following options can be
specified in the GNUmakefile:
* MPI :=
This determines whether we are using the Message Passing
Interface (MPI) library. Leaving this option empty will disable
MPI. If MPI is enabled, you need to specify how to link to the MPI
library. You can set USE_MPI_WRAPPERS := t, then mpif90 or mpiifort
will be used. Or you can set "mpi_include_dir", "mpi_lib_dir", and
"mpi_libraries" to appropriate values in GNUmakefile.
* OMP := t
This determines whether we are using OpenMP. Leaving this option
empty disables OpenMP.
* NDEBUG := t
This option determines whether we compile with support for debugging
(usually also enabling some runtime checks). Setting NDEBUG := t
turns off debugging.
* COMP := Intel
Set this to your compiler of choice (e.g., GNU). Specific options
for a certain compiler are stored in the comps/$(COMP).mak file.
* MIC :=
Set this if compiling for Intel Xeon Phi.
* K_USE_AUTOMATIC := t
This determines whether some arrays in kernels.F90 will be automatic
or allocatable. Automatic arrays are usually on the stack.
When OpenMP is used, allocating memory on the stack is usually
faster than on the heap. However, one must make sure there is
adequate space on the stack; otherwise a segmentation fault might
occur. Note that the size of the stack for threads can be adjusted
by the OMP_STACKSIZE environment variable and the shell might also
impose a limit on stack size.
* MKVERBOSE := t
This determines the verbosity of the building process.
To build an executable, type "make" in the SMC directory. Note that
the above options can also be specified in a command line such as
"make MPI=t MIC=t NDEBUG= ". The executable will have a name like
main.GNU.debug.mpi.exe depending on the compiler and some other
options specified in the GNUmakefile or command line. There are some
other options provided by the GNUmakefile. You can type "make TAGS"
or "make tags" to generate tag file for Emacs or vi. You can type
"make clean" or "make realclean" to remove files generated during the
make process.
Runtime parameters can be specified in the inputs_SMC file, which must
be placed in the directory where the executable is run. Below are a
list of selected runtime parameters and default values.
* n_cellx = -1
* n_celly = -1
* n_cellz = -1
Number of cells in the x, y, and z-directions. They must be greater than 4.
* max_grid_size = 64
This determines the largest grid size of a box. If max_grid_size is
too big, there might be fewer boxes than MPI tasks, and then some
processors will be idle. However, if it is too small, each MPI task
may have too many small boxes, and the performance will be affected.
Thus, max_grid_size must be chosen according to the number of cells
and the number of MPI tasks. Ideally, you would like to have one
box for each MPI task to minimize the communication cost.
* tb_split_dim = 2
This determines how domain decomposition in some subroutines is done
for threads. When OpenMP is used, each box is "virtually" divided
into "nthreads" boxes and each OpenMP thread works on one
thread-box. This parameter can be either 2 or 3. When it is 2, the
domain decomposition is in y and z-directions; when it is 3, the
domain decomposition is in 3D.
* tb_blocksize_x = -1
* tb_blocksize_y = 16
* tb_blocksize_z = 16
SMC uses a blocking strategy. These parameters determines blocking
size in x, y and z-directions. The value should be either greater
than or equal to 4, or less than 0. If it is less than 0, no
blocking is used in that direction.
* verbose = 0
This determines verbosity.
* stop_time = -1.d0
Simulation stop time.
* max_step = 1
Maximum number of timesteps in the simulation.
Note that there are a number of OMP COLLAPSE clauses in the code.
These tend to trigger compiler bugs. If the code crashes, try to run
it without "COLLAPSE".
More information about SMC can be found in the paper "High-Order
Algorithms for Compressible Reacting Flow with Complex Chemistry" by
M. Emmett, W. Zhang, and J.B. Bell (http://arxiv.org/abs/1309.7327),
and in "Optimizing for Navier-Stokes Equations" by Antonio Valles and
Weiqun Zhang (Chapter 4 of "High Performance Parallelism Pearls:
Multicore and Many-core Programming Approaches", James Reinders and
Jim Jeffers, published by Morgan Kaufmann, 2014). If you have any
questions, please contact Weiqun Zhang (WeiqunZhang@lbl.gov).
**********************************************************************
As noted in the book, the best performance shown on Figure 4.19 for
the Intel Xeon Phi coprocessor is 12.99 seconds, and we found close to
publication time that a few additional Fortran compiler options allow
us to do better and achieve 9.11 seconds for the coprocessor (1.43x
improvement vs . Figure 4.19) while keeping the processor performance
the same. The compiler options for the improved performance are:
-fno-alias -noprec-div -no-prec-sqrt -fimf-precision=low
-fimf-domain-exclusion=15 -align array64byte -opt-assumesafe-padding
-opt-streaming-stores always -opt-streaming-cache-evict=0.
These new best results on coprocessor use -1, 16, 16 for the thread
blocking specified in inputs_SMC file and SIMD directives are no
longer needed in kernels.F90 file when using -fno-alias compiler
option. The chapter discusses Figure 4.19 as best results for
coprocessor.
The files here reflect all these changes for higher performance.