Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for SYCL/Data Parallel C++ (DPCPP) #183

Open
wants to merge 48 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
e1effc6
Initial dpcpp commit at a544015a5
emascarenhas Feb 28, 2022
f1a7fac
Avoiding conflict with definitions for VERSION and G when using the D…
emascarenhas Feb 28, 2022
29f80ea
After merge conflict removal and compiling, some tests are passing.
emascarenhas Mar 2, 2022
92a11ed
Allow input files like 7cpa without receptor names in the protein.map…
emascarenhas Mar 2, 2022
0ad002a
Minor cleanup
emascarenhas Mar 2, 2022
9841c1b
Update for DPCPP and XeGPU support.
emascarenhas Mar 3, 2022
d03a6ae
Change in Makefile.dpcpp for AOT compile and .gitignore.
emascarenhas Apr 7, 2022
4e889b1
Fix include of unnecessary sycl.hpp and dpct include files that break…
emascarenhas May 4, 2022
62d742c
Use G_AD instead of G consistently.
emascarenhas May 6, 2022
6e2683f
Use float version of pow function instead of double for performance/s…
emascarenhas Jun 15, 2022
dfe6a70
Use reduce_over_group instead on oneapi extension
emascarenhas Jul 7, 2022
820d4c8
Use XeDeviceSynchronize in DOCK_DEBUG ifdef code
emascarenhas Jul 7, 2022
e26a528
Add support for additional SYCL work group sizes and support to build…
emascarenhas Jul 7, 2022
0df9039
Explicitly use local memory for barriers.
emascarenhas Aug 12, 2022
9bdd335
Fix compile error for CPU.
emascarenhas Sep 16, 2022
06039ce
Adding alternative device code supporting sycl::native math functions.
L30nardoSV Sep 27, 2022
0686611
More improvements on alternative device code based on native math.
Sep 27, 2022
0fc2e68
Disabling native math.
Sep 27, 2022
0b048a0
Controlling switching between native and non-native math from a uniqu…
Sep 28, 2022
7269a08
Defining macros (e.g., SYCL_SQRT, SYCL_SIN, etc) to use same label fo…
Sep 28, 2022
c8b1bca
Merge pull request #1 from LeoCollab/add-native-math
emascarenhas Sep 29, 2022
952f524
Using explicitly local memory barriers in <dpcpp/calcMergeEneGra.dp.c…
Sep 29, 2022
35db225
Controlling switching between local and global memory space for barri…
L30nardoSV Sep 29, 2022
a4fdae1
Adding missing native math switches.
L30nardoSV Sep 29, 2022
16de6fe
Adding missing native math switches in ADAM kernel.
L30nardoSV Sep 29, 2022
4de77a4
Merge pull request #2 from LeoCollab/switch-barriers
emascarenhas Sep 30, 2022
922c188
Add Docking time to OpenCL
emascarenhas Oct 2, 2022
7579e37
use wg mem scope for atomics; clang Makefile upd
emascarenhas Oct 5, 2022
0442ddb
make atomics memory_scope a macro
emascarenhas Oct 5, 2022
6996999
Merging fixes from local for AOT compile, etc.
emascarenhas Nov 23, 2022
b70da18
Add support for H100
emascarenhas Dec 1, 2022
1689692
Add LDEBUG_VTUNE
emascarenhas Dec 2, 2022
72e4309
Add experiment for relaxed memory order for Atomics
emascarenhas Dec 2, 2022
cb6a186
Avoid making PVC default platform.
emascarenhas Dec 9, 2022
0089ddf
Add support for ICX sycl compile AOT
emascarenhas Dec 8, 2022
f82074c
Adding optimization flags.
L30nardoSV Dec 30, 2022
50fabd2
Replacing sycl::accessor<...target:local> with sycl::local_accessor<.…
L30nardoSV Jan 5, 2023
86b9a7d
Removing commented-out code.
L30nardoSV Jan 5, 2023
ea3e2b8
Merge pull request #1 from LeoCollab/remove-warns
L30nardoSV Jan 5, 2023
a00cd5a
Merge pull request #5 from LeoCollab/develop
emascarenhas Jan 5, 2023
9aeabe1
Forcing maximum number of registers per thread with compile option: -…
L30nardoSV Jan 6, 2023
9750c01
Merge pull request #6 from LeoCollab/fix-numreg-per-thread
emascarenhas Jan 18, 2023
fc8f57c
Print best energy even if autostop is 0.
emascarenhas Jan 17, 2023
bf07b90
Print best energy when autostop is false for cuda and opencl.
emascarenhas Jan 23, 2023
64b52ea
Replacing pow() with native counterpart.
L30nardoSV Jan 25, 2023
56f66a9
Merge pull request #7 from LeoCollab/develop
emascarenhas Jan 25, 2023
66d75e1
Remove hard set of sub_group size for reduction kernel.
emascarenhas Feb 22, 2023
480b4de
Updates to README for SYCL
emascarenhas Apr 13, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,11 @@ host/inc/stringify.h
final_population_run*
device/stringify_tmp
KernelProgramBuildInfo.txt
run.sh
run_batch.sh
gdb.txt
performdocking.h
performdocking.cpp

# ===================
# C gitignore
Expand Down
19 changes: 18 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
# AutoDock-GPU Makefile
# Copyright (C) 2022 Intel Corporation

# ------------------------------------------------------
# Note that environment variables must be defined
Expand All @@ -20,9 +21,11 @@
# in any other case, OpenCL will be used
# OpenCL GPU path can be explicitly used with
# DEVICE=OCLGPU
# Choose Xe/DPC++ Device
# DEVICE=XeGPU
# ------------------------------------------------------
# Choose OpenCL device
# Valid values: CPU, GPU, CUDA, OCLGPU
# Valid values: CPU, GPU, CUDA, OCLGPU, XeGPU

ifeq ($(DEVICE), $(filter $(DEVICE),GPU CUDA))
TEST_CUDA := $(shell ./test_cuda.sh nvcc "$(GPU_INCLUDE_PATH)" "$(GPU_LIBRARY_PATH)")
Expand All @@ -37,9 +40,23 @@ override DEVICE:=GPU
export
include Makefile.Cuda
else
ifeq ($(DEVICE),XeGPU)
override DEVICE:=GPU
include Makefile.dpcpp
export
else
# run SYCL version on NVidia GPU
ifeq ($(DEVICE),NvGPU)
override DEVICE:=GPU
PLATFORM=NvGPU
include Makefile.dpcpp
export
else
ifeq ($(DEVICE),OCLGPU)
override DEVICE:=GPU
export
endif
include Makefile.OpenCL
endif
endif
endif
6 changes: 3 additions & 3 deletions Makefile.Cuda
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ NVCC = nvcc
CPP = g++
UNAME := $(shell uname)

TARGETS = 52 60 61 70
TARGETS = 52 60 61 70 80 90
CUDA_TARGETS=$(foreach target,$(TARGETS),-gencode arch=compute_$(target),code=sm_$(target))

ifeq ($(DEVICE), CPU)
Expand Down Expand Up @@ -171,8 +171,8 @@ check-env-all: check-env-dev check-env-cpu check-env-gpu

GIT_VERSION := $(shell ./version_string.sh)

CFLAGS+=-DVERSION=\"$(GIT_VERSION)\"
TOOL_CFLAGS+=-DVERSION=\"$(GIT_VERSION)\"
CFLAGS+=-DAD_VERSION=\"$(GIT_VERSION)\"
TOOL_CFLAGS+=-DAD_VERSION=\"$(GIT_VERSION)\"

# ------------------------------------------------------

Expand Down
5 changes: 3 additions & 2 deletions Makefile.OpenCL
Original file line number Diff line number Diff line change
Expand Up @@ -235,8 +235,9 @@ check-env-all: check-env-dev check-env-cpu check-env-gpu

GIT_VERSION := $(shell ./version_string.sh)

CFLAGS+=-DVERSION=\"$(GIT_VERSION)\"
TOOL_CFLAGS+=-DVERSION=\"$(GIT_VERSION)\"
CFLAGS+=-DAD_VERSION=\"$(GIT_VERSION)\"
TOOL_CFLAGS+=-DAD_VERSION=\"$(GIT_VERSION)\"


# ------------------------------------------------------

Expand Down
286 changes: 286 additions & 0 deletions Makefile.dpcpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,286 @@
## AutoDock-GPU DPCPP Makefile
# Copyright (C) 2022 Intel Corporation

# ------------------------------------------------------
# Note that environment variables must be defined
# before compiling
# DEVICE?
# if DEVICE=CPU: CPU_INCLUDE_PATH?, CPU_LIBRARY_PATH?
# if DEVICE=GPU: GPU_INCLUDE_PATH?, GPU_LIBRARY_PATH?

# Choose Xe/DPC++ Device
# DEVICE=XeGPU
# ------------------------------------------------------
# Choose OpenCL device
# Valid values: CPU, GPU

DPCPP = icpx -fsycl
CPP = icpx -fsycl
UNAME := $(shell uname)

# Assume $ONEAPIPATH/setvars.sh has been run


ifeq ($(DEVICE), CPU)
DEV =-DCPU_DEVICE
else ifeq ($(DEVICE), GPU)
DEV =-DGPU_DEVICE
ifeq ($(PLATFORM), NvGPU)
DPCPP=clang++
CPP=clang++
IFLAGS=-fsycl -fsycl-targets=nvptx64-nvidia-cuda -fgpu-inline-threshold=100000 -Xsycl-target-backend --cuda-gpu-arch=sm_80 \
-Xcuda-ptxas --verbose -Xcuda-ptxas --maxrregcount=64 \
--cuda-path=${CUDA_TOOLKIT_ROOT_DIR}
endif
endif

# ------------------------------------------------------
# Project directories
# opencl_lvs: wrapper for OpenCL APIs
COMMON_DIR=./common
HOST_INC_DIR=./host/inc
HOST_SRC_DIR=./host/src
KRNL_DIR=./dpcpp
KCMN_DIR=$(COMMON_DIR)
BIN_DIR=./bin
LIB_XeGPU = kernels.dp.o

TARGET := autodock
TOOL_TARGET := adgpu_analysis

IFLAGS+=-cl-fast-relaxed-math -cl-single-precision-constant
IFLAGS+=-I$(COMMON_DIR) -I$(HOST_INC_DIR) -I$(KRNL_DIR)
# set DPCT_INCUDE_PATH in the environment if DPCT helper functions are used for SYCL on non Intel platforms
ifdef DPCT_INCLUDE_PATH
IFLAGS+=-I$(DPCT_INCLUDE_PATH)
endif
LFLAGS=-Wl,-rpath=$(GPU_LIBRARY_PATH):$(CPU_LIBRARY_PATH)
CFLAGS= $(IFLAGS) $(LFLAGS)
TOOL_CFLAGS=-std=c++17 -I$(COMMON_DIR) -I$(HOST_INC_DIR)

ifeq ($(DEVICE), CPU)
TARGET:=$(TARGET)_cpu
else ifeq ($(DEVICE), GPU)
NWI=-DN64WI
TARGET:=$(TARGET)_xegpu
endif

ifeq ($(OVERLAP), ON)
PIPELINE=-DUSE_PIPELINE -fopenmp
endif


BIN := $(wildcard $(TARGET)*)

# ------------------------------------------------------
# Number of work-items (wi)
# Valid values: 32, 64, 128, 256
NUMWI=

ifeq ($(NUMWI), 8)
NWI=-DN8WI
TARGET:=$(TARGET)_8wi
else ifeq ($(NUMWI), 16)
NWI=-DN16WI
TARGET:=$(TARGET)_16wi
else ifeq ($(NUMWI), 32)
NWI=-DN32WI
TARGET:=$(TARGET)_32wi
else ifeq ($(NUMWI), 64)
NWI=-DN64WI
TARGET:=$(TARGET)_64wi
else ifeq ($(NUMWI), 128)
NWI=-DN128WI
TARGET:=$(TARGET)_128wi
else ifeq ($(NUMWI), 256)
NWI=-DN256WI
TARGET:=$(TARGET)_256wi
else ifeq ($(NUMWI), 512)
NWI=-DN512WI
TARGET:=$(TARGET)_512wi
else ifeq ($(NUMWI), 1024)
NWI=-DN1024WI
TARGET:=$(TARGET)_1024wi
else
ifeq ($(DEVICE), CPU)
NWI=-DN16WI
TARGET:=$(TARGET)_16wi
else ifeq ($(DEVICE), GPU)
NWI=-DN64WI
TARGET:=$(TARGET)_64wi
endif
endif

# ------------------------------------------------------
# Configuration
# FDEBUG (full) : enables debugging on both host + device
# LDEBUG (light): enables debugging on host
# RELEASE
CONFIG=RELEASE
#CONFIG=FDEBUG

ifeq ($(CONFIG),FDEBUG)
OPT =-g -Wall -DDOCK_DEBUG
ifeq ($(DEVICE), GPU)
# for AOT compile and debug
# OPT+=-fsycl-targets=spir64_gen-unknown-unknown-sycldevice -Xs "-device ats -internal_options -cl-kernel-debug-enable -options -cl-opt-disable"
# for AOT compile
# OPT+=-fsycl-targets=spir64_gen-unknown-unknown-sycldevice -Xs "-device ats"
endif
else ifeq ($(CONFIG),FDEBUG_VTUNE)
OPT =-g -Wall -DDOCK_DEBUG -gline-tables-only -fdebug-info-for-profiling
else ifeq ($(CONFIG),LDEBUG_VTUNE)
OPT =-g -Wall -gline-tables-only -fdebug-info-for-profiling
else ifeq ($(CONFIG),LDEBUG)
OPT =-g -Wall
else ifeq ($(CONFIG),RELEASE)
OPT =-O3
ifeq ($(PLATFORM),PVC)
OPT+=-fsycl-targets=spir64_gen -Xs "-device 0x0BD6 -revision_id 7"
else ifeq ($(PLATFORM),ICX)
OPT+=-fsycl-targets=spir64_gen -Xs "-device icllp"
endif
else
OPT =
endif

# ------------------------------------------------------
# Reproduce results (remove randomness)
REPRO=NO

ifeq ($(REPRO),YES)
REP =-DREPRO
else
REP =
endif
# ------------------------------------------------------

#all: otool odock
all: odock

check-env-dev:
@if test -z "$$DEVICE"; then \
echo "Please set DEVICE to either CPU, GPU, CUDA, XeGPU, or OCLGPU to build docking software."; \
exit 1; \
else \
if [ "$$DEVICE" = "CPU" ]; then \
echo "DEVICE is set to $$DEVICE"; \
else \
if [ "$$DEVICE" = "GPU" ]; then \
echo "DEVICE is set to $$DEVICE"; \
else \
echo "DEVICE value is invalid. Please set DEVICE to either CPU, GPU, XeGPU or OCLGPU"; \
exit 1; \
fi; \
fi; \
fi; \
echo " "

check-env-cpu:
@if test -z "$$CPU_INCLUDE_PATH"; then \
echo "CPU_INCLUDE_PATH is undefined"; \
else \
echo "CPU_INCLUDE_PATH is set to $$CPU_INCLUDE_PATH"; \
fi; \
if test -z "$$CPU_LIBRARY_PATH"; then \
echo "CPU_LIBRARY_PATH is undefined"; \
else \
echo "CPU_LIBRARY_PATH is set to $$CPU_LIBRARY_PATH"; \
fi; \
echo " "

check-env-gpu:
@if test -z "$$GPU_INCLUDE_PATH"; then \
echo "GPU_INCLUDE_PATH is undefined"; \
else \
echo "GPU_INCLUDE_PATH is set to $$GPU_INCLUDE_PATH"; \
fi; \
if test -z "$$GPU_LIBRARY_PATH"; then \
echo "GPU_LIBRARY_PATH is undefined"; \
else \
echo "GPU_LIBRARY_PATH is set to $$GPU_LIBRARY_PATH"; \
fi; \
echo " "

check-env-all: check-env-dev check-env-cpu check-env-gpu

# ------------------------------------------------------
# Printing out its git version hash

GIT_VERSION := $(shell ./version_string.sh)

CFLAGS+=-DAD_VERSION=\"$(GIT_VERSION)\"
TOOL_CFLAGS+=-DAD_VERSION=\"$(GIT_VERSION)\"

# ------------------------------------------------------

link-code:
ln -sf performdocking.h.dpcpp $(HOST_INC_DIR)/performdocking.h
ln -sf performdocking.cpp.dpcpp $(HOST_SRC_DIR)/performdocking.cpp

unlink-code:
rm -f $(HOST_INC_DIR)/performdocking.h $(HOST_SRC_DIR)/performdocking.cpp

kernels: $(KERNEL_SRC)
$(DPCPP) $(NWI) $(REP) $(DPCPP_FLAGS) $(IFLAGS) $(OPT) $(DPCPP_INCLUDES) -c $(KRNL_DIR)/kernels.dp.cpp

otool: unlink-code
@echo "Building" $(TOOL_TARGET) "..."
$(CPP) \
$(shell ls $(HOST_SRC_DIR)/*.cpp) \
$(TOOL_CFLAGS) \
-o$(BIN_DIR)/$(TOOL_TARGET) \
$(PIPELINE) $(OPT) -DTOOLMODE $(REP)

odock: check-env-all kernels link-code
@echo "Building" $(TARGET) "..."
$(CPP) \
$(shell ls $(HOST_SRC_DIR)/*.cpp) \
$(CFLAGS) \
$(LIB_XeGPU) \
-o$(BIN_DIR)/$(TARGET) \
$(DEV) $(NWI) $(PIPELINE) $(OPT) $(DD) $(REP) $(KFLAGS)

# Example
# 1ac8: for testing gradients of translation and rotation genes
# 7cpa: for testing gradients of torsion genes (15 torsions)
# 3tmn: for testing gradients of torsion genes (1 torsion)

PDB := 3ce3
NRUN := 100
NGEN := 27000
POPSIZE := 150
TESTNAME := test
TESTLS := sw

test: odock
$(BIN_DIR)/$(TARGET) \
-ffile ./input/$(PDB)/derived/$(PDB)_protein.maps.fld \
-lfile ./input/$(PDB)/derived/$(PDB)_ligand.pdbqt \
-nrun $(NRUN) \
-ngen $(NGEN) \
-psize $(POPSIZE) \
-resnam $(TESTNAME) \
-gfpop 0 \
-lsmet $(TESTLS)

ASTEX_PDB := 2bsm
ASTEX_NRUN:= 10
ASTEX_POPSIZE := 10
ASTEX_TESTNAME := test_astex
ASTEX_LS := sw

astex: odock
$(BIN_DIR)/$(TARGET) \
-ffile ./input_tsri/search-set-astex/$(ASTEX_PDB)/protein.maps.fld \
-lfile ./input_tsri/search-set-astex/$(ASTEX_PDB)/flex-xray.pdbqt \
-nrun $(ASTEX_NRUN) \
-psize $(ASTEX_POPSIZE) \
-resnam $(ASTEX_TESTNAME) \
-gfpop 1 \
-lsmet $(ASTEX_LS)

# $(BIN_DIR)/$(TARGET) -ffile ./input_tsri/search-set-astex/$(ASTEX_PDB)/protein.maps.fld -lfile ./input_tsri/search-set-astex/$(ASTEX_PDB)/flex-xray.pdbqt -nrun $(ASTEX_NRUN) -psize $(ASTEX_POPSIZE) -resnam $(ASTEX_TESTNAME) -gfpop 1 | tee ./input_tsri/search-set-astex/intrapairs/$(ASTEX_PDB)_intrapair.txt

clean:
rm -f $(HOST_INC_DIR)/performdocking.h $(HOST_SRC_DIR)/performdocking.cpp
Loading