diff --git a/getting_started/CUDAGuide/index.html b/getting_started/CUDAGuide/index.html index eb27990..766396c 100644 --- a/getting_started/CUDAGuide/index.html +++ b/getting_started/CUDAGuide/index.html @@ -31,7 +31,7 @@ printf("%f %f\n", d_x, d_y); } -
A one-liner compilation of the above using Enzyme:
clang test2.cpp -Xclang -load -Xclang /path/to/ClangEnzyme-11.so -O2
+
A one-liner compilation of the above using Enzyme:
clang test2.cpp -fplugin=/path/to/ClangEnzyme-11.so -O2
When porting the above code, there are some caveats to be aware of:
__enzyme_autodiff
should only be invoked on __device__
code, not __global__
kernel code. __global__
kernels may be supported in the future. --cuda-gpu-arch=sm_xx
is usually needed as the default sm_20
is unsupported by modern CUDA versions.
#include <stdio.h>
void __device__ foo_impl(double* x_in, double *x_out) {
@@ -94,7 +94,7 @@
printf("%f %f\n", host_d_x, host_d_y);
}
-
For convenience, a one-liner compilation step is (against sm_70):
clang test3.cu -Xclang -load -Xclang /path/to/ClangEnzyme-11.so -O2 --cuda-gpu-arch=sm_70 -lcudart -L/usr/local/cuda-10.1/lib64
+
For convenience, a one-liner compilation step is (against sm_70):
clang test3.cu -fplugin=/path/to/ClangEnzyme-11.so -O2 --cuda-gpu-arch=sm_70 -lcudart -L/usr/local/cuda-10.1/lib64
Note that this procedure (using ClangEnzyme as opposed to LLVMEnzyme manually) inserts Enzyme at a specific location in LLVM’s optimization pipeline. The default ordering should be reasonable; however, the precise ordering of optimization passes may impact performance. If there is a performance issue that you suspect may be due to optimization ordering, please diff --git a/getting_started/Faq/index.html b/getting_started/Faq/index.html index cfc716d..e32ec6c 100644 --- a/getting_started/Faq/index.html +++ b/getting_started/Faq/index.html @@ -9,10 +9,6 @@
opt reports an error: opt: unknown pass name 'enzyme'
This may be because you are using LLVM 13 or a higher version, which has switched to the new pass manager pipeline. On applicable LLVM versions (up to and including LLVM 15),
you can specify that opt uses the old pass manager by adding the --enable-new-pm=0
flag. Alternatively, you can use the new pass manager, which uses the following syntax
for opt:
opt input.ll -load-pass-plugin=/path/to/LLVMEnzyme-VERSION.so -passes=enzyme -o output.ll -S
-
Simiarly, clang has different syntax for plugins for the new pass manager. On LLVM versions up to and including LLVM 12, this is done as follows:
clang -Xclang -load -Xclang /path/to/ClangEnzyme-VERSION.so code.c
-
On LLVM 13 and above, the new pass manager is the default. If you would like to continue to use the old pass manager, you will also need to provide
-either -fno-experimental-new-pass-manager
on versions <= 14, or -flegacy-pass-manager
on LLVM 15. LLVM 16 and above dropped support for the legacy pass
-manager entirely.
To use the new pass manager, the following command line args are required for Clang:
clang -fpass-plugin=/path/to/ClangEnzyme-VERSION.so code.c
If you are using CMake, Enzyme exports a special ClangEnzymeFlags
target which will automatically add the correct flags. See
here
for an example.
Until June 2020, LLVM’s existing GVN pass had a bug in handling invariant.load metadata that would cause it to crash. Enzyme tends to generate these frequently for better optimization. This was reported diff --git a/getting_started/UsingEnzyme/index.html b/getting_started/UsingEnzyme/index.html index f2755c7..2a87986 100644 --- a/getting_started/UsingEnzyme/index.html +++ b/getting_started/UsingEnzyme/index.html @@ -73,10 +73,10 @@ }
The generated gradient has been inlined and entirely simplified to return the input times two.
We can then compile this to a final binary as follows:
clang output_opt.ll -o a.exe
For convenience, we could combine the final optimization and binary generation into one command as follows:
clang output.ll -O3 -o a.exe
-
Moreover, using Enzyme’s clang plugin, we could automate the entire AD and compilation in a single command. Using the clang plugin should be done by default as it improves the user experience as well as various default performance options. However, the example above is still useful to understand how Enzyme works on LLVM.
clang test.c -Xclang -load -Xclang /path/to/Enzyme/enzyme/build/Enzyme/LLVMEnzyme-<VERSION>.so -o a.exe
-
Note that each version of Clang/LLVM will have slightly different command line flags to specifying plugins. See +
Moreover, using Enzyme’s clang plugin, we could automate the entire AD and compilation in a single command. Using the clang plugin should be done by default as it improves the user experience as well as various default performance options. However, the example above is still useful to understand how Enzyme works on LLVM.
clang test.c -fplugin=/path/to/Enzyme/enzyme/build/Enzyme/ClangEnzyme-<VERSION>.so -o a.exe
+
Note that if using the LLVM plugin, each version of LLVM will have slightly different command line flags for specifying plugins. See the FAQ -for more information.
Enzyme has several advanced options that may be of interest.
The enzyme-preopt
option disables the preprocessing optimizations run by the Enzyme pass, except for the absolute minimum neccessary.
$ opt input.ll -load=./Enzyme/LLVMEnzyme-7.so -enzyme -enzyme-preopt=1
+for more information. If using the clang plugin, the same command should work regardless of version. Advanced options ¶
Enzyme has several advanced options that may be of interest.
Performance options ¶
Disabling Preprocessing ¶
The enzyme-preopt
option disables the preprocessing optimizations run by the Enzyme pass, except for the absolute minimum necessary.
$ opt input.ll -load=./Enzyme/LLVMEnzyme-7.so -enzyme -enzyme-preopt=1
$ opt input.ll -load=./Enzyme/LLVMEnzyme-7.so -enzyme -enzyme-preopt=0
Forced Inlining ¶
The enzyme-inline
option forcibly inlines all subfunction calls. The enzyme-inline-count
option limits the number of calls inlined by this utility.
$ opt input.ll -load=./Enzyme/LLVMEnzyme-7.so -enzyme -enzyme-inline=1
$ opt input.ll -load=./Enzyme/LLVMEnzyme-7.so -enzyme -enzyme-inline=1 -enzyme-inline-count=100