This repository contains a number of examples written in Halide that run on RISC-V CPUs with RVV 0.7.1. Follow the steps below to reproduce the experiments, in whole or in part.
NOTE: At the moment, the project relies on Ahead-Of-Time (AOT) compilation of Halide kernels. However, precompiled files are included for an easy start.
The project uses OpenCV as a reference implementation for some algorithms. We also benchmark OpenCV to compare it with the Halide implementation. After cloning the repository, fetch the OpenCV source code by updating the submodules:

    git submodule update --init

- Download the THead toolchain: Xuantie-900-gcc-linux-5.10.4-glibc-x86_64-V2.6.1-20220906.tar.gz (registration needed)
- Clone the Halide source code (no build required):

      git clone --depth 1 https://github.com/halide/Halide

- Build the project for the RISC-V CPU:

      export PATH=$HOME/Xuantie-900-gcc-linux-5.10.4-glibc-x86_64-V2.6.1/bin/:$PATH
      cmake \
          -DCMAKE_BUILD_TYPE=Release \
          -DCMAKE_TOOLCHAIN_FILE=$HOME/halide_riscv/riscv64-071.toolchain.cmake \
          -DHalide_INCLUDE_DIRS=$HOME/Halide/src/runtime \
          -S halide_riscv -B build_rv64
      cmake --build build_rv64 -j$(nproc --all)

- Transfer the build outputs to the RISC-V board and run `test_algo` (accuracy tests) or `perf_algo` (performance tests):

      scp test_algo perf_algo libalgos.so opencv-prefix/src/opencv-build/lib/* sipeed@x.x.x.x:/home/sipeed/

      export LD_LIBRARY_PATH=./
      ./test_algo
      ./perf_algo

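Before running the cross-build step above, it can help to confirm that the required tools are actually on `PATH`. The following is a minimal sketch, not part of the project: the helper names `require_tool` and `check_build_tools` are hypothetical, and the cross-compiler name in the usage comment is an assumption about what the toolchain archive provides.

```shell
#!/bin/sh
# Return 0 if the named executable is found on PATH, 1 otherwise.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing tool: $1" >&2
    return 1
}

# Check every tool given as an argument and report all missing ones
# instead of stopping at the first failure.
check_build_tools() {
    missing=0
    for tool in "$@"; do
        require_tool "$tool" || missing=1
    done
    return "$missing"
}

# Example usage (the compiler name is an assumption; adjust to your toolchain):
#   check_build_tools cmake git riscv64-unknown-linux-gnu-gcc
```

If the check fails, the most likely cause is that the toolchain's `bin/` directory was not prepended to `PATH` in the current shell.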
HW: Sipeed Lichee RV Dock (Allwinner D1 aka XuanTie C906 CPU)
OS: 20211230_LicheeRV_debian_d1_hdmi_8723ds
| Algorithm | Input | OpenCV (no RVV) | OpenCV (RVV) | Halide (no RVV) | Halide (RVV) |
|---|---|---|---|---|---|
| BGR2Gray | 1080x1920x3 (interleaved) | 32.18ms | 264.66ms | 34.18ms | 30.78ms |
| | 1080x1920x3 (planar) | -- | -- | 38.13ms | 6.65ms |
| Box filter | input: 1080x1920, output: 1078x1918 | 75.17ms | 139.16ms | 198.02ms | 62.89ms |
| Histogram | input: 1080x1920x3, output: 256x3 | 57.35ms | 67.32ms | 92.44ms | -- |
| Convolution (input: 1x16x128x128, kernel: 32x16x3x3, stride: 1, pad: 0) | (FP32) NCHW | 829.13ms | 338.02ms | 4713.57ms | 698.27ms |
| | (FP32) NHWC | -- | -- | 1357.81ms | 418.95ms |
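
The shapes and timings in the table can be cross-checked with a little arithmetic. The sketch below uses two hypothetical helpers, `out_size` and `speedup`. It assumes the box filter uses a 3x3 window with no padding, which is inferred from the 1080x1920 to 1078x1918 shrinkage and is not stated in the table. The convolution parameters are taken directly from the table.

```shell
#!/bin/sh
# Output extent of a sliding-window filter along one axis:
#   out = (n + 2*pad - kernel) / stride + 1   (integer division)
out_size() {
    n=$1
    kernel=$2
    stride=$3
    pad=$4
    echo $(( (n + 2 * pad - kernel) / stride + 1 ))
}

# Ratio of two timings (same unit), rounded to two decimals.
# A value above 1 means the second timing is faster than the first.
speedup() {
    LC_ALL=C awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.2f\n", base / opt }'
}

out_size 1080 3 1 0    # box filter rows (assumed 3x3 window): 1078
out_size 1920 3 1 0    # box filter columns (assumed 3x3 window): 1918
out_size 128 3 1 0     # convolution output height and width: 126

speedup 38.13 6.65     # Halide BGR2Gray planar, no RVV vs RVV: 5.73
speedup 198.02 62.89   # Halide box filter, no RVV vs RVV: 3.15
```

Ratios computed this way only compare the two timings in a given row. They say nothing about run-to-run variance, which the table does not report.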
If you want to regenerate the AOT artifacts or add new algorithms, build the project on x86:
- Build LLVM from https://github.com/dkurt/llvm-rvv-071/tree/rvv-071

      cmake -DCMAKE_BUILD_TYPE=Release \
            -DLLVM_ENABLE_PROJECTS="clang;lld" \
            -DLLVM_TARGETS_TO_BUILD="RISCV" \
            -DLLVM_ENABLE_TERMINFO=OFF -DLLVM_ENABLE_ASSERTIONS=ON \
            -DLLVM_ENABLE_EH=ON -DLLVM_ENABLE_RTTI=ON -DLLVM_BUILD_32_BITS=OFF \
            -GNinja \
            -S llvm-project/llvm -B llvm-build
      cmake --build llvm-build -j4

- Build Halide with the following patch (tested on revision https://github.com/halide/Halide/commit/7963cd4e3c23856b82567c99e0a3d16035ffe895):

      diff --git a/src/CodeGen_RISCV.cpp b/src/CodeGen_RISCV.cpp
      index ba9abe04d..454558d11 100644
      --- a/src/CodeGen_RISCV.cpp
      +++ b/src/CodeGen_RISCV.cpp
      @@ -151,6 +151,7 @@ string CodeGen_RISCV::mattrs() const {
                   arch_flags += ",+zvl" + std::to_string(target.vector_bits) + "b";
               }
       #endif
      +        arch_flags += ",-zve64x";
           }
           return arch_flags;
       }
      diff --git a/src/autoschedulers/CMakeLists.txt b/src/autoschedulers/CMakeLists.txt
      index 9b88f0a66..10088bb9b 100644
      --- a/src/autoschedulers/CMakeLists.txt
      +++ b/src/autoschedulers/CMakeLists.txt
      @@ -24,6 +24,6 @@ endfunction()
       
       add_subdirectory(common)
       
      -add_subdirectory(adams2019)
      +# add_subdirectory(adams2019)
       add_subdirectory(li2018)
       add_subdirectory(mullapudi2016)

      export LLVM_ROOT=$HOME/llvm-build
      cmake -DLLVM_DIR=$LLVM_ROOT/lib/cmake/llvm \
            -DClang_DIR=$LLVM_ROOT/lib/cmake/clang \
            -DCMAKE_BUILD_TYPE=Release \
            -DWITH_TESTS=OFF \
            -DWITH_TUTORIALS=OFF \
            -DWITH_PYTHON_BINDINGS=OFF \
            -S Halide -B halide-build
      cmake --build halide-build -j4
      cmake --install halide-build --prefix halide-install

- Build the project on x86:

      export LD_LIBRARY_PATH=$HOME/halide-build/src/autoschedulers/mullapudi2016:$LD_LIBRARY_PATH
      cmake \
          -DCMAKE_BUILD_TYPE=Release \
          -DHalide_DIR=$HOME/halide-install/lib/cmake/Halide \
          -S halide_riscv -B build
      cmake --build build -j$(nproc --all)

- Run `perf_algo` once and find the generated `*.h` and `*.s` files in the working directory.
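
To see which artifacts the last step produced, a small listing helper can be handy. This is a sketch only: `list_aot_artifacts` is a hypothetical name, and it assumes the generated files sit directly in the directory where `perf_algo` was run rather than in a subdirectory.

```shell
#!/bin/sh
# Print the *.h headers and *.s assembly files located directly in the
# given directory (default: current directory), sorted by name.
# Note: this matches any *.h or *.s file, not only Halide-generated ones.
list_aot_artifacts() {
    dir=${1:-.}
    find "$dir" -maxdepth 1 -type f \( -name '*.h' -o -name '*.s' \) | LC_ALL=C sort
}

# Example usage, run from the directory where perf_algo was executed:
#   list_aot_artifacts
```

Comparing the listing before and after running `perf_algo` shows which files were regenerated.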