Release v0.2 (#2)
* Enable Clang/LLVM 18 support (#1)
* Added support for
  * cudaMemcpy2D and async variant,
  * cudaMemset2D and async variant,
  * cudaStreamCreateWithPriority and cudaEventCreateWithFlags
* Refactored test setup, static pass and runtime distinction
* TypeART as optional dependency, works with LLVM 14
---------
Co-authored-by: Tim Ziegler <timziegler1604@gmail.com>
ahueck authored Nov 1, 2024
1 parent adc7487 commit 18fda7c
Showing 120 changed files with 6,215 additions and 1,785 deletions.
94 changes: 63 additions & 31 deletions README.md
@@ -1,73 +1,103 @@
# CuSan &middot; [![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)

CuSan is a tool to find data races between (asynchronous) CUDA calls and the host.
To that end, during compilation with Clang/LLVM, we analyze and instrument CUDA API usage in the target code to track CUDA-specific memory accesses and synchronization semantics.
Our runtime then exposes this information to [ThreadSanitizer](https://clang.llvm.org/docs/ThreadSanitizer.html) (packaged with Clang/LLVM) for the final data race analysis.


## Usage

Making use of CuSan consists of two phases:

1. Compile your code using one of the CuSan compiler wrappers, e.g., `cusan-clang++` or `cusan-mpic++`.
This will (a) analyze and instrument the CUDA API, such as kernel calls and their particular memory access semantics (r/w), (b) add ThreadSanitizer instrumentation automatically (`-fsanitize=thread`), and (c) finally link our runtime library.
2. Execute the target program for the data race analysis. Our runtime internally calls ThreadSanitizer to expose the CUDA synchronization and memory access semantics.

#### Example usage
Given the file [02_event.c](test/runtime/02_event.c), execute the following for CUDA data race detection:

```bash
$ cusan-clang -O3 -g 02_event.c -x cuda -gencode arch=compute_70,code=sm_70 -o event.exe
$ export TSAN_OPTIONS=ignore_noninstrumented_modules=1
$ ./event.exe
```

### Checking CUDA-aware MPI applications
For CUDA-aware MPI data race detection, you need to use the MPI correctness checker [MUST](https://hpc.rwth-aachen.de/must/), or preload our (very) simple MPI interceptor `libCusanMPIInterceptor.so`.
These libraries call ThreadSanitizer with the particular access semantics of MPI.
This way, the combined semantics of CUDA and MPI are exposed to ThreadSanitizer, allowing it to detect data races between data-dependent MPI and CUDA calls.

#### Example usage for MPI
Given the file [03_cuda_to_mpi.c](test/runtime/03_cuda_to_mpi.c), execute the following for CUDA data race detection:

```bash
$ cusan-mpic++ -O3 -g 03_cuda_to_mpi.c -x cuda -gencode arch=compute_70,code=sm_70 -o cuda_to_mpi.exe
$ LD_PRELOAD=/path/to/libCusanMPIInterceptor.so mpirun -n 2 ./cuda_to_mpi.exe
```

*Note*: To avoid false positives, ThreadSanitizer suppression files might be needed; see for example [suppressions.txt](test/runtime/suppressions.txt), or the documentation for [sanitizer special case lists](https://clang.llvm.org/docs/SanitizerSpecialCaseList.html).
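A suppression file is a plain-text list of `type:pattern` rules, one per line. A hypothetical minimal example (the function name is a placeholder, not taken from the shipped file):

```
# Ignore reports originating from inside the MPI library itself
called_from_lib:libopen-pal
# Ignore a specific known-benign race (name is illustrative)
race:some_internal_helper
```

The file is passed at runtime via `TSAN_OPTIONS=suppressions=/path/to/suppressions.txt`.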

#### Example report
The following is an example report for [03_cuda_to_mpi.c](test/runtime/03_cuda_to_mpi.c) of our test suite, where the necessary synchronization is not called:
```c
L.18 __global__ void kernel(int* arr, const int N)
...
L.53 int* d_data;
L.54 cudaMalloc(&d_data, size * sizeof(int));
L.55
L.56 if (world_rank == 0) {
L.57   kernel<<<blocksPerGrid, threadsPerBlock>>>(d_data, size);
L.58 #ifdef CUSAN_SYNC
L.59   cudaDeviceSynchronize(); // CUSAN_SYNC needs to be defined
L.60 #endif
L.61   MPI_Send(d_data, size, MPI_INT, 1, 0, MPI_COMM_WORLD);
```
```
==================
WARNING: ThreadSanitizer: data race (pid=579145)
  Read of size 8 at 0x7f1587200000 by main thread:
    #0 main cusan/test/runtime/03_cuda_to_mpi.c:61:5 (03_cuda_to_mpi.c.exe+0xfad11)

  Previous write of size 8 at 0x7f1587200000 by thread T6:
    #0 __device_stub__kernel(int*, int) cusan/test/runtime/03_cuda_to_mpi.c:18:47 (03_cuda_to_mpi.c.exe+0xfaaed)

  Thread T6 'cuda_stream 0' (tid=0, running) created by main thread at:
    #0 cusan::runtime::Runtime::register_stream(cusan::runtime::Stream) <null> (libCusanRuntime.so+0x3b830)
    #1 main cusan/test/runtime/03_cuda_to_mpi.c:54:3 (03_cuda_to_mpi.c.exe+0xfabc7)

SUMMARY: ThreadSanitizer: data race cusan/test/runtime/03_cuda_to_mpi.c:61:5 in main
==================
ThreadSanitizer: reported 1 warnings
```
#### Caveats ThreadSanitizer and OpenMPI
Known issues when making ThreadSanitizer work with OpenMPI 4.1.6 (observed on the Lichtenberg HPC system):
- Intel Compute Runtime requires environment flags to work with sanitizers, see [Intel Compute Runtime issue 376](https://github.com/intel/compute-runtime/issues/376):
```bash
export NEOReadDebugKeys=1
export DisableDeepBind=1
```
- The sanitizer memory interceptor does not play well with OpenMPI's, see [OpenMPI issue 12819](https://github.com/open-mpi/ompi/issues/12819). The *patcher* component needs to be disabled:
```bash
export OMPI_MCA_memory=^patcher
```
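Taken together, a hypothetical launch environment applying the workarounds from this section (plus the `TSAN_OPTIONS` setting from the usage example) could look like:

```bash
export NEOReadDebugKeys=1         # Intel Compute Runtime: enable debug keys
export DisableDeepBind=1          # Intel Compute Runtime: avoid deep binding under sanitizers
export OMPI_MCA_memory=^patcher   # OpenMPI: disable the memory patcher component
export TSAN_OPTIONS=ignore_noninstrumented_modules=1
# LD_PRELOAD=/path/to/libCusanMPIInterceptor.so mpirun -n 2 ./cuda_to_mpi.exe
```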

## Building CuSan

CuSan is tested with LLVM versions 14 and 18, and requires CMake version >= 3.20. Use CMake presets `develop` or `release`
to build.

### Dependencies
CuSan was tested on the TUDa Lichtenberg II cluster with:
- System modules: `1) gcc/11.2.0 2) cuda/11.8 3) openmpi/4.1.6 4) git/2.40.0 5) python/3.10.10 6) clang/14.0.6 or 6) clang/18.1.8`
- Optional external libraries: [TypeART](https://github.com/tudasc/TypeART/tree/v1.9.0b-cuda.1), FiberPool (both default off)
- Testing: llvm-lit, FileCheck
- GPU: Tesla T4 and Tesla V100 (mostly: arch=sm_70)

### Build example

CuSan uses CMake to build. Example build recipe (release build, installs to default prefix
`${cusan_SOURCE_DIR}/install/cusan`):

```sh
$> cmake --build build --target install --parallel
```

| Option | Default | Description |
|------------------------------|:-------:|---------------------------------------------------------------------------------------------------|
| `CUSAN_TYPEART` | `OFF` | Use TypeART library to track memory allocations. |
| `CUSAN_FIBERPOOL` | `OFF` | Use external library to efficiently manage fiber creation. |
| `CUSAN_SOFTCOUNTER` | `OFF` | Runtime stats for calls to ThreadSanitizer and CUDA callbacks. Only use for stats collection, not race detection. |
| `CUSAN_SYNC_DETAIL_LEVEL` | `ON` | Analyze, e.g., memcpy and memcpyasync w.r.t. arguments to determine implicit sync. |
| `CUSAN_LOG_LEVEL_RT` | `3` | Granularity of runtime logger. 3 is most verbose, 0 is least. For release, set to 0. |
| `CUSAN_LOG_LEVEL_PASS` | `3` | Granularity of pass plugin logger. 3 is most verbose, 0 is least. For release, set to 0. |
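As an illustration, a release configuration that also silences both loggers might combine the preset from the recipe above with these cache options (the values are examples, not recommendations):

```sh
$> cmake -S . -B build --preset release \
     -DCUSAN_LOG_LEVEL_RT=0 -DCUSAN_LOG_LEVEL_PASS=0
$> cmake --build build --target install --parallel
```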
12 changes: 3 additions & 9 deletions cmake/cusanToolchain.cmake
@@ -28,21 +28,15 @@ string(COMPARE EQUAL "${CMAKE_SOURCE_DIR}" "${PROJECT_SOURCE_DIR}"
find_package(CUDAToolkit REQUIRED)
find_package(MPI REQUIRED)

option(CUSAN_TEST_CONFIGURE_IDE "Add targets for tests to help the IDE with completion etc." ON)
mark_as_advanced(CUSAN_TEST_CONFIGURE_IDE)
option(CUSAN_CONFIG_DIR_IS_SHARE "Install to \"share/cmake/\" instead of \"lib/cmake/\"" OFF)
mark_as_advanced(CUSAN_CONFIG_DIR_IS_SHARE)

set(CUSAN_LOG_LEVEL_RT 3 CACHE STRING "Granularity of runtime logger. 3 is most verbose, 0 is least.")
set(CUSAN_LOG_LEVEL_PASS 3 CACHE STRING "Granularity of transform pass logger. 3 is most verbose, 0 is least.")

option(CUSAN_TYPEART "Use external typeart to track allocations" OFF)
option(CUSAN_FIBERPOOL "Use external fiber pool to manage ThreadSanitizer fibers" OFF)
option(CUSAN_SOFTCOUNTER "Print runtime counters" OFF)
option(CUSAN_SYNC_DETAIL_LEVEL "Enable implicit sync analysis of memcpy/memset" ON)
@@ -73,7 +67,7 @@ include(modules/cusan-format)
include(modules/cusan-target-util)

cusan_find_llvm_progs(CUSAN_CLANG_EXEC "clang-${LLVM_VERSION_MAJOR};clang" DEFAULT_EXE "clang")
cusan_find_llvm_progs(CUSAN_CLANGCXX_EXEC "clang++-${LLVM_VERSION_MAJOR};clang++" DEFAULT_EXE "clang++")
cusan_find_llvm_progs(CUSAN_LLC_EXEC "llc-${LLVM_VERSION_MAJOR};llc" DEFAULT_EXE "llc")
cusan_find_llvm_progs(CUSAN_OPT_EXEC "opt-${LLVM_VERSION_MAJOR};opt" DEFAULT_EXE "opt")

2 changes: 1 addition & 1 deletion cmake/modules/cusan-llvm.cmake
@@ -20,7 +20,7 @@ function(cusan_llvm_module name sources)
)

if(ARG_INCLUDE_DIRS)
target_include_directories(${name} ${warning_guard}
PRIVATE
${ARG_INCLUDE_DIRS}
)
12 changes: 11 additions & 1 deletion externals/CMakeLists.txt
@@ -31,4 +31,14 @@ if(CUSAN_FIBERPOOL)
FETCHCONTENT_SOURCE_DIR_FIBER_POOL
FETCHCONTENT_UPDATES_DISCONNECTED_FIBER_POOL
)
endif()

if(CUSAN_TYPEART)
FetchContent_Declare(
typeart
GIT_REPOSITORY https://github.com/tudasc/TypeART.git
GIT_TAG v1.9.0b-cuda.1
GIT_SHALLOW 1
)
FetchContent_MakeAvailable(typeart)
endif()
2 changes: 2 additions & 0 deletions lib/analysis/CMakeLists.txt
@@ -22,6 +22,8 @@ target_include_directories(cusan_Analysis ${warning_guard}

cusan_target_define_file_basename(cusan_Analysis)

target_compile_definitions(cusan_Analysis PRIVATE "LLVM_VERSION_MAJOR=${LLVM_VERSION_MAJOR}")

set(CONFIG_NAME cusanAnalysis)
set(TARGETS_EXPORT_NAME ${CONFIG_NAME}Targets)
