diff --git a/getting_started/CUDAGuide/index.html b/getting_started/CUDAGuide/index.html index eb27990..766396c 100644 --- a/getting_started/CUDAGuide/index.html +++ b/getting_started/CUDAGuide/index.html @@ -31,7 +31,7 @@ printf("%f %f\n", d_x, d_y); } -

A one-liner compilation of the above using Enzyme:

clang test2.cpp -Xclang -load -Xclang /path/to/ClangEnzyme-11.so -O2 
+

A one-liner compilation of the above using Enzyme:

clang test2.cpp -fplugin=/path/to/ClangEnzyme-11.so -O2 
 

CUDA Example 

When porting the above code, there are some caveats to be aware of:

  1. CUDA 10.1 is the latest supported CUDA at the time of writing (Jan/20/2021) for LLVM 11.
  2. __enzyme_autodiff should only be invoked on __device__ code, not __global__ kernel code. __global__ kernels may be supported in the future.
  3. --cuda-gpu-arch=sm_xx is usually needed as the default sm_20 is unsupported by modern CUDA versions.
#include <stdio.h>
 
 void __device__ foo_impl(double* x_in, double *x_out) {
@@ -94,7 +94,7 @@
     printf("%f %f\n", host_d_x, host_d_y);
 
 }
-

For convenience, a one-liner compilation step is (against sm_70):

clang test3.cu -Xclang -load -Xclang /path/to/ClangEnzyme-11.so -O2 --cuda-gpu-arch=sm_70 -lcudart -L/usr/local/cuda-10.1/lib64
+

For convenience, a one-liner compilation step is (against sm_70):

clang test3.cu -fplugin=/path/to/ClangEnzyme-11.so -O2 --cuda-gpu-arch=sm_70 -lcudart -L/usr/local/cuda-10.1/lib64
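
Since the right sm_xx value and CUDA library directory differ between machines, the command above can be parameterized. The following is only a sketch: enzyme_cuda_cmd is a hypothetical helper name, not something Enzyme ships, and it assumes the paths contain no spaces. It prints the command rather than running it, so the result can be inspected first.

```shell
# Hypothetical helper (not part of Enzyme): assemble the one-liner shown above
# for a chosen GPU architecture and CUDA library directory.
# Usage: enzyme_cuda_cmd <source.cu> <plugin-path> <sm-arch> <cuda-lib-dir>
enzyme_cuda_cmd() {
    src="$1"
    plugin="$2"
    arch="$3"
    cudalib="$4"
    # Only prints the command; paths without spaces are assumed.
    echo "clang $src -fplugin=$plugin -O2 --cuda-gpu-arch=$arch -lcudart -L$cudalib"
}

# Reproduce the sm_70 / CUDA 10.1 command from the text.
enzyme_cuda_cmd test3.cu /path/to/ClangEnzyme-11.so sm_70 /usr/local/cuda-10.1/lib64
```

To target a different GPU, only the third argument needs to change (for example sm_60).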
 

Note that this procedure (using ClangEnzyme as opposed to LLVMEnzyme manually) inserts Enzyme at a specific location in LLVM’s optimization pipeline. The default ordering should be reasonable; however, the precise ordering of optimization passes may impact performance. If there is a performance issue that you suspect may be due to optimization ordering, please diff --git a/getting_started/Faq/index.html b/getting_started/Faq/index.html index cfc716d..e32ec6c 100644 --- a/getting_started/Faq/index.html +++ b/getting_started/Faq/index.html @@ -9,10 +9,6 @@

opt reports an error: opt: unknown pass name 'enzyme'

This may be because you are using LLVM 13 or a higher version, which has switched to the new pass manager pipeline. On applicable LLVM versions (up to and including LLVM 15), you can specify that opt uses the old pass manager by adding the --enable-new-pm=0 flag. Alternatively, you can use the new pass manager, which uses the following syntax for opt:

opt input.ll -load=/path/to/LLVMEnzyme-VERSION.so -passes=enzyme -o output.ll -S
-

Similarly, clang has different syntax for plugins for the new pass manager. On LLVM versions up to and including LLVM 12, this is done as follows:

clang -Xclang -load -Xclang /path/to/ClangEnzyme-VERSION.so code.c
-

On LLVM 13 and above, the new pass manager is the default. If you would like to continue to use the old pass manager, you will also need to provide -either -fno-experimental-new-pass-manager on versions <= 14, or -flegacy-pass-manager on LLVM 15. LLVM 16 and above dropped support for the legacy pass -manager entirely.

To use the new pass manager, the following command line args are required for Clang:

clang -fpass-plugin=/path/to/ClangEnzyme-VERSION.so code.c
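
The version rules above can be collected into one place. The following is a minimal sketch, not part of Enzyme: enzyme_clang_flags is a hypothetical helper name, and the version boundaries are taken only from the text of this FAQ entry. The optional third argument selects the pass manager on LLVM 13 and above and is ignored for LLVM 12 and earlier.

```shell
# Hypothetical helper (not part of Enzyme): print the clang plugin flags
# described in this FAQ entry for a given LLVM major version.
# Usage: enzyme_clang_flags <llvm-major> <plugin-path> [new|legacy]
enzyme_clang_flags() {
    major="$1"
    plugin="$2"
    pm="${3:-new}"   # pass manager choice; only consulted for LLVM >= 13

    if [ "$major" -le 12 ]; then
        # LLVM 12 and earlier: plugin loaded with the -Xclang -load syntax.
        echo "-Xclang -load -Xclang $plugin"
    elif [ "$pm" = "new" ]; then
        # LLVM 13 and above, new pass manager (the default there).
        echo "-fpass-plugin=$plugin"
    elif [ "$major" -le 14 ]; then
        # LLVM 13-14, explicitly requesting the old pass manager.
        echo "-fno-experimental-new-pass-manager -Xclang -load -Xclang $plugin"
    elif [ "$major" -eq 15 ]; then
        # LLVM 15 uses a differently named flag for the old pass manager.
        echo "-flegacy-pass-manager -Xclang -load -Xclang $plugin"
    else
        # LLVM 16 and above dropped the legacy pass manager.
        echo "LLVM $major has no legacy pass manager" >&2
        return 1
    fi
}
```

A possible use, assuming the paths contain no spaces, is `clang $(enzyme_clang_flags 15 /path/to/ClangEnzyme-VERSION.so) code.c`.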
 

If you are using CMake, Enzyme exports a special ClangEnzymeFlags target which will automatically add the correct flags. See here for an example.

UNREACHABLE executed (GVN error) 

Until June 2020, LLVM’s existing GVN pass had a bug in its handling of invariant.load’s that would cause it to crash. These tend to be generated a lot by Enzyme for better optimization. This was reported diff --git a/getting_started/UsingEnzyme/index.html b/getting_started/UsingEnzyme/index.html index f2755c7..2a87986 100644 --- a/getting_started/UsingEnzyme/index.html +++ b/getting_started/UsingEnzyme/index.html @@ -73,10 +73,10 @@ }

The generated gradient has been inlined and entirely simplified to return the input times two.

We can then compile this to a final binary as follows:

clang output_opt.ll -o a.exe
 

For ease, we could combine the final optimization and binary execution into one command as follows.

clang output.ll -O3 -o a.exe
-

Moreover, using Enzyme’s clang plugin, we could automate the entire AD and compilation in a single command. Using the clang plugin should be done by default as it improves the user experience as well as various default performance options. However, the example above is still useful to understand how Enzyme works on LLVM.

clang test.c -Xclang -load -Xclang /path/to/Enzyme/enzyme/build/Enzyme/LLVMEnzyme-<VERSION>.so -o a.exe
-

Note that each version of Clang/LLVM will have slightly different command line flags for specifying plugins. See +

Moreover, using Enzyme’s clang plugin, we could automate the entire AD and compilation in a single command. Using the clang plugin should be done by default as it improves the user experience as well as various default performance options. However, the example above is still useful to understand how Enzyme works on LLVM.

clang test.c -fplugin=/path/to/Enzyme/enzyme/build/Enzyme/ClangEnzyme-<VERSION>.so -o a.exe
+

Note that if using the LLVM plugin, each version of LLVM will have slightly different command line flags for specifying plugins. See the FAQ -for more information.

Advanced options 

Enzyme has several advanced options that may be of interest.

Performance options 

Disabling Preprocessing 

The enzyme-preopt option disables the preprocessing optimizations run by the Enzyme pass, except for the absolute minimum necessary.

$ opt input.ll -load=./Enzyme/LLVMEnzyme-7.so -enzyme -enzyme-preopt=1
+for more information. If using the clang plugin, the same command should work independently of version.

Advanced options 

Enzyme has several advanced options that may be of interest.

Performance options 

Disabling Preprocessing 

The enzyme-preopt option disables the preprocessing optimizations run by the Enzyme pass, except for the absolute minimum necessary.

$ opt input.ll -load=./Enzyme/LLVMEnzyme-7.so -enzyme -enzyme-preopt=1
 $ opt input.ll -load=./Enzyme/LLVMEnzyme-7.so -enzyme -enzyme-preopt=0
 

Forced Inlining 

The enzyme-inline option forcibly inlines all subfunction calls. The enzyme-inline-count option limits the number of calls inlined by this utility.

$ opt input.ll -load=./Enzyme/LLVMEnzyme-7.so -enzyme -enzyme-inline=1
 $ opt input.ll -load=./Enzyme/LLVMEnzyme-7.so -enzyme -enzyme-inline=1 -enzyme-inline-count=100