-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comparison reproducibility #2
Comments
It is faster for the gfortran v13! I did that comparison a few months back with version 13 too and found a similar conclusion as yours. My assumption is due to do concurrent there is a significant effort to improve performance in the do loop. Here is my output for the gfortran 12, windows OS. I did it just now. |
The indexed version contains some errors. The following two lines, ! adjust concentration in range
if ( con(i,j) >= 0.99999 ) con(i,j) = 0.99999
if ( con(i,j) < 0.00001 ) con(i,j) = 0.00001 should be inside of the loop. If I'm not mistaken you also need to calculate the entire laplacian of concentration before using it to update the phi values. This is for the indexed version. |
Here is a slightly less legible version, but it performs roughly 4x time faster for me. I'm not familiar with phase-field simulations, but the output looks like what is expected. $ make FFLAGS="-Wall -O2 -march=native" index_array
gfortran-13 -o index_array -Wall -O2 -march=native index_array.f90
$ time ./index_array
------------------------------------------------------
Indexed array compute time = 1.908 seconds.
real 0m2.062s
user 0m1.965s
sys 0m0.009s program fd_ch_test
implicit none
! ===========================================================================
! parameters
! ===========================================================================
integer, parameter :: ik = kind(0)
integer, parameter :: wp = kind(1.0d0)
integer(kind=ik), parameter :: Nx = 260 , Ny = 260, dx = 1, dy = 1
real(kind=wp), dimension (0:Nx+1, 0:Ny+1, 2) :: con
real(kind=wp), dimension (0:Nx+1, 0:Ny+1) :: r, lap_con
integer(kind=ik) :: nsteps, istep, flip
real(kind=wp) :: dt, c0, mobility, grad_coef, noise, A
real(kind=wp) :: start , finish
flip = 1
nsteps = 30000
dt = 0.01_wp
c0 = 0.4_wp
mobility = 1.0_wp
grad_coef = 0.5_wp
noise = 0.02_wp
A = 1.0_wp
! ===========================================================================
! initial microstucture
! ===========================================================================
call random_number( r )
con(:,:,flip) = c0 + noise * (0.5_wp - r)
call periodic_halo(nx,ny,con(:,:,flip))
! ===========================================================================
! evolution of microstructure
! ===========================================================================
call cpu_time ( start )
time_loop: do istep = 1, nsteps
select case(flip)
case(1)
call update(nx,ny,cold=con(:,:,1),cnew=con(:,:,2),wrk=lap_con, &
dt=dt,mobility=mobility,A=A,grad_coef=grad_coef)
case(2)
call update(nx,ny,cold=con(:,:,2),cnew=con(:,:,1),wrk=lap_con, &
dt=dt,mobility=mobility,A=A,grad_coef=grad_coef)
end select
flip = 3 - flip
end do time_loop
call cpu_time ( finish )
! ===========================================================================
! print computed time on the screen
! ===========================================================================
print *,'------------------------------------------------------'
print '(" Indexed array compute time = ", f10.3," seconds." )', finish - start
block
integer :: unit, i, j
open(newunit=unit,file="results.out",action="write")
do i = 1, nx
do j = 1, ny
write(unit, *) (i-1), (j-1), con(i,j,flip)
end do
write(unit,*) ""
end do
close(unit)
end block
contains
subroutine periodic_halo(nx,ny,con)
integer, intent(in) :: nx, ny
real(wp), intent(inout) :: con(0:nx+1,0:ny+1)
! Fix corners
con(0,0) = con(nx,ny)
con(Nx+1,0) = con(1,ny)
con(0,Ny+1) = con(nx,1)
con(Nx+1,Ny+1) = con(1,1)
! Fix east and west sides
con(0,1:ny) = con(nx,1:ny)
con(Nx+1,1:ny) = con(1,1:ny)
! Fix north and south sides
con(1:nx,0) = con(1:nx,Ny)
con(1:nx,ny+1) = con(1:nx,1)
end subroutine
subroutine update(nx,ny,cold,cnew,wrk,dt,A,mobility,grad_coef)
implicit none
integer, intent(in) :: nx, ny
real(wp), intent(in) :: cold(0:nx+1,0:ny+1)
real(wp), intent(out) :: cnew(0:nx+1,0:ny+1)
real(wp), intent(inout) :: wrk(0:nx+1,0:ny+1)
real(wp), intent(in) :: dt, A, mobility, grad_coef
real(wp) :: lap, dfd
integer :: i, j
real(wp) :: alpha
alpha = 1.0_wp / (real(dx,wp) * real(dy,wp))
do j = 1, Ny
do i = 1, Nx
dfd = A * ( 2.0_wp*cold(i,j)*( 1.0_wp - cold(i,j) )**2 &
- 2.0_wp*cold(i,j)**2*( 1.0_wp - cold(i,j) ) )
lap = alpha * ( cold(i+1,j) + cold(i-1,j) + cold(i,j-1) + cold(i,j+1) - &
4.0_wp*cold(i,j) )
wrk(i,j) = dfd - grad_coef*lap
end do
end do
call periodic_halo(nx,ny,wrk)
do j = 1, Ny
do i = 1, Nx
lap = alpha * ( wrk(i+1,j) + wrk(i-1,j) + wrk(i,j-1) &
+ wrk(i,j+1) - 4.0_wp*wrk(i,j) )
cnew(i,j) = cold(i,j) + dt*mobility*lap
!cnew(i,j) = max(0.00001_wp,min(0.99999_wp, cnew(i,j)))
end do
end do
call periodic_halo( nx, ny, cnew)
end subroutine
end program fd_ch_test |
Yeah. You caught it right! I corrected the code now. Your new code is very performance-oriented. I am not familiar with that coding style and how it is used. I will go through the details and will learn from your code. Thanks for it. |
The changes are essentially:
|
How does this flip-flop technique work if we use do concurrent with OpenMP compiler flag? Or it is just for the usual do loop? |
You can use whatever you want inside of the update subroutine. Using allocatable arrays and swapping them with The time-stepping loop would look like this: time_loop: do istep = 1, nsteps
call update(nx,ny,cold=cold,cnew=cnew,wrk=lap_con, &
dt=dt,mobility=mobility,A=A,grad_coef=grad_coef)
call swap(cold,cnew)
t = t + dt
end do time_loop
contains
subroutine swap(cold,cnew)
real(wp), intent(inout), allocatable :: cold(:,:), cnew(:,:)
real(wp), allocatable :: ctmp(:,:)
call move_alloc(from=cold,to=ctmp)
call move_alloc(from=cnew,to=cold)
call move_alloc(from=ctmp,to=cnew)
end subroutine
|
The purpose here in this repository is to show the difference between the whole array and the loop. The use of parallel programming or OpenMP was not a part of this work here. I wanna do it later stage. See in the Future work. But your style of coding has shown lots of ways to do it. It is surely worth trying. I will try it and will be in touch with you. Thanks for sharing your code here. |
I run your comparison on my Mac, using gfortran v13:
The indexed version looks faster.
The text was updated successfully, but these errors were encountered: