forked from WeiqunZhang/miniSMC
WenjunDong/miniSMC
This is a minimalist version of SMC, a combustion code that solves compressible Navier-Stokes equations for viscous multi-component reacting flows. SMC is developed at the Center for Computational Sciences and Engineering (CCSE) at Lawrence Berkeley National Laboratory. It uses the BoxLib library (https://ccse.lbl.gov/BoxLib/), also developed at CCSE. This minimalist version of SMC includes a stripped down version of BoxLib. Therefore, the full version of BoxLib is not needed.

The code is mainly written in Fortran 90, with some parts in C. Parallelization is implemented with hybrid MPI/OpenMP. However, SMC can also be built with pure MPI, or pure OpenMP, or neither.

GNU make is used to build the code. The following options can be specified in the GNUmakefile:

* MPI :=
  This determines whether we are using the Message Passing Interface (MPI) library. Leaving this option empty will disable MPI. If MPI is enabled, you need to specify how to link to the MPI library. You can set USE_MPI_WRAPPERS := t, then mpif90 or mpiifort will be used. Or you can set "mpi_include_dir", "mpi_lib_dir", and "mpi_libraries" to appropriate values in GNUmakefile.

* OMP := t
  This determines whether we are using OpenMP. Leaving this option empty disables OpenMP.

* NDEBUG := t
  This option determines whether we compile with support for debugging (usually also enabling some runtime checks). Setting NDEBUG := t turns off debugging.

* COMP := Intel
  Set this to your compiler of choice (e.g., GNU). Specific options for a certain compiler are stored in the comps/$(COMP).mak file.

* MIC :=
  Set this if compiling for Intel Xeon Phi.

* K_USE_AUTOMATIC := t
  This determines whether some arrays in kernels.F90 will be automatic or allocatable. Automatic arrays are usually on the stack. When OpenMP is used, allocating memory on the stack is usually faster than on the heap. However, one must make sure there is adequate space on the stack; otherwise a segmentation fault might occur.
  Note that the size of the stack for threads can be adjusted by the OMP_STACKSIZE environment variable, and the shell might also impose a limit on stack size.

* MKVERBOSE := t
  This determines the verbosity of the build process.

To build an executable, type "make" in the SMC directory. Note that the above options can also be specified on the command line, such as "make MPI=t MIC=t NDEBUG= ". The executable will have a name like main.GNU.debug.mpi.exe, depending on the compiler and some other options specified in the GNUmakefile or on the command line.

There are some other options provided by the GNUmakefile. You can type "make TAGS" or "make tags" to generate a tag file for Emacs or vi. You can type "make clean" or "make realclean" to remove files generated during the make process.

Runtime parameters can be specified in the inputs_SMC file, which must be placed in the directory where the executable is run. Below is a list of selected runtime parameters and their default values.

* n_cellx = -1
* n_celly = -1
* n_cellz = -1
  Number of cells in the x, y, and z-directions. They must be greater than 4.

* max_grid_size = 64
  This determines the largest grid size of a box. If max_grid_size is too big, there might be fewer boxes than MPI tasks, and then some processors will be idle. However, if it is too small, each MPI task may have too many small boxes, and the performance will be affected. Thus, max_grid_size must be chosen according to the number of cells and the number of MPI tasks. Ideally, you would like to have one box for each MPI task to minimize the communication cost.

* tb_split_dim = 2
  This determines how domain decomposition in some subroutines is done for threads. When OpenMP is used, each box is "virtually" divided into "nthreads" boxes and each OpenMP thread works on one thread-box. This parameter can be either 2 or 3. When it is 2, the domain decomposition is in the y and z-directions; when it is 3, the domain decomposition is in 3D.
* tb_blocksize_x = -1
* tb_blocksize_y = 16
* tb_blocksize_z = 16
  SMC uses a blocking strategy. These parameters determine the blocking size in the x, y, and z-directions. The value should be either greater than or equal to 4, or less than 0. If it is less than 0, no blocking is used in that direction.

* verbose = 0
  This determines verbosity.

* stop_time = -1.d0
  Simulation stop time.

* max_step = 1
  Maximum number of timesteps in the simulation.

Note that there are a number of OMP COLLAPSE clauses in the code. These tend to trigger compiler bugs. If the code crashes, try to run it without "COLLAPSE".

More information about SMC can be found in the paper "High-Order Algorithms for Compressible Reacting Flow with Complex Chemistry" by M. Emmett, W. Zhang, and J.B. Bell (http://arxiv.org/abs/1309.7327), and in "Optimizing for Navier-Stokes Equations" by Antonio Valles and Weiqun Zhang (Chapter 4 of "High Performance Parallelism Pearls: Multicore and Many-core Programming Approaches", James Reinders and Jim Jeffers, published by Morgan Kaufmann, 2014).

If you have any questions, please contact Weiqun Zhang (WeiqunZhang@lbl.gov).

**********************************************************************

As noted in the book, the best performance shown in Figure 4.19 for the Intel Xeon Phi coprocessor is 12.99 seconds. Close to publication time, we found that a few additional Fortran compiler options allow us to do better and achieve 9.11 seconds on the coprocessor (a 1.43x improvement vs. Figure 4.19) while keeping the processor performance the same. The compiler options for the improved performance are:

  -fno-alias -no-prec-div -no-prec-sqrt -fimf-precision=low -fimf-domain-exclusion=15 -align array64byte -opt-assume-safe-padding -opt-streaming-stores always -opt-streaming-cache-evict=0
These new best results on the coprocessor use -1, 16, 16 for the thread blocking sizes (tb_blocksize_x, tb_blocksize_y, tb_blocksize_z) specified in the inputs_SMC file. SIMD directives are no longer needed in kernels.F90 when the -fno-alias compiler option is used. The chapter discusses Figure 4.19 as the best results for the coprocessor. The files here reflect all these changes for higher performance.
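
The guidance on choosing max_grid_size relative to the number of MPI tasks can be illustrated with a small Python sketch. This is not part of SMC or BoxLib. It assumes, as a simplification, that the domain is cut in each direction into chunks of at most max_grid_size cells, so the box count is the product of the per-direction chunk counts. The function names count_boxes and describe_load are hypothetical helpers introduced only for this illustration.

```python
import math


def count_boxes(n_cell, max_grid_size):
    """Estimate the number of boxes for a domain of n_cell = (nx, ny, nz) cells.

    Assumption (not taken from the SMC source): each direction is cut into
    ceil(n / max_grid_size) chunks, and the boxes are the product of those.
    """
    return math.prod(math.ceil(n / max_grid_size) for n in n_cell)


def describe_load(n_cell, max_grid_size, n_mpi_tasks):
    """Return (estimated box count, short comment on the box/task balance)."""
    n_boxes = count_boxes(n_cell, max_grid_size)
    if n_boxes < n_mpi_tasks:
        return n_boxes, "fewer boxes than MPI tasks, so some tasks would be idle"
    if n_boxes == n_mpi_tasks:
        return n_boxes, "one box per MPI task"
    return n_boxes, "more than one box per MPI task on average"


if __name__ == "__main__":
    # Hypothetical case: a 128^3 domain run on 8 MPI tasks.
    for max_grid_size in (128, 64, 32):
        n_boxes, comment = describe_load((128, 128, 128), max_grid_size, 8)
        print(f"max_grid_size={max_grid_size}: {n_boxes} boxes, {comment}")
```

Under these assumptions, a 128^3 domain on 8 MPI tasks gives a single box for max_grid_size = 128, eight boxes for max_grid_size = 64, and 64 boxes for max_grid_size = 32, which matches the "too big", "ideal", and "too small" cases described for max_grid_size.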
About
DNS code solving compressible Navier-Stokes equations for viscous multi-component reacting flows
Languages
- Fortran 80.1%
- C 15.0%
- Perl 2.6%
- Makefile 2.3%