deploy: 8f6f0de
wsmoses committed Aug 5, 2023
1 parent 20dc1df commit 335c770
Showing 3 changed files with 5 additions and 9 deletions.
4 changes: 2 additions & 2 deletions getting_started/CUDAGuide/index.html
@@ -31,7 +31,7 @@
<span class=n>printf</span><span class=p>(</span><span class=s>&#34;%f %f</span><span class=se>\n</span><span class=s>&#34;</span><span class=p>,</span> <span class=n>d_x</span><span class=p>,</span> <span class=n>d_y</span><span class=p>);</span>

<span class=p>}</span>
</code></pre></div><p>A one-liner compilation of the above using Enzyme:</p><div class=highlight><pre class=chroma><code class=language-sh data-lang=sh>clang test2.cpp -Xclang -load -Xclang /path/to/ClangEnzyme-11.so -O2
</code></pre></div><p>A one-liner compilation of the above using Enzyme:</p><div class=highlight><pre class=chroma><code class=language-sh data-lang=sh>clang test2.cpp -fplugin<span class=o>=</span>/path/to/ClangEnzyme-11.so -O2
</code></pre></div><h2 id=cuda-example>CUDA Example&nbsp;<a class=headline-hash href=#cuda-example></a></h2><p>When porting the above code, there are some caveats to be aware of:</p><ol><li>CUDA 10.1 is the latest supported CUDA at the time of writing (Jan/20/2021) for LLVM 11.</li><li><code>__enzyme_autodiff</code> should only be invoked on <code>__device__</code> code, not <code>__global__</code> kernel code. <code>__global__</code> kernels may be supported in the future.</li><li><code>--cuda-gpu-arch=sm_xx</code> is usually needed as the default <code>sm_20</code> is unsupported by modern CUDA versions.</li></ol><div class=highlight><pre class=chroma><code class=language-cpp data-lang=cpp><span class=cp>#include</span> <span class=cpf>&lt;stdio.h&gt;</span><span class=cp>
</span><span class=cp></span>
<span class=kt>void</span> <span class=n>__device__</span> <span class=nf>foo_impl</span><span class=p>(</span><span class=kt>double</span><span class=o>*</span> <span class=n>x_in</span><span class=p>,</span> <span class=kt>double</span> <span class=o>*</span><span class=n>x_out</span><span class=p>)</span> <span class=p>{</span>
@@ -94,7 +94,7 @@
<span class=n>printf</span><span class=p>(</span><span class=s>&#34;%f %f</span><span class=se>\n</span><span class=s>&#34;</span><span class=p>,</span> <span class=n>host_d_x</span><span class=p>,</span> <span class=n>host_d_y</span><span class=p>);</span>

<span class=p>}</span>
</code></pre></div><p>For convenience, a one-liner compilation step is (against sm_70):</p><div class=highlight><pre class=chroma><code class=language-sh data-lang=sh>clang test3.cu -Xclang -load -Xclang /path/to/ClangEnzyme-11.so -O2 --cuda-gpu-arch<span class=o>=</span>sm_70 -lcudart -L/usr/local/cuda-10.1/lib64
</code></pre></div><p>For convenience, a one-liner compilation step is (against sm_70):</p><div class=highlight><pre class=chroma><code class=language-sh data-lang=sh>clang test3.cu -fplugin<span class=o>=</span>/path/to/ClangEnzyme-11.so -O2 --cuda-gpu-arch<span class=o>=</span>sm_70 -lcudart -L/usr/local/cuda-10.1/lib64
</code></pre></div><p>Note that this procedure (using ClangEnzyme as opposed to LLVMEnzyme manually) inserts Enzyme at a specific location in LLVM&rsquo;s optimization pipeline. The default ordering should be reasonable; however, the precise ordering of optimization passes may
<a href=https://proceedings.mlsys.org/paper/2020/file/4e732ced3463d06de0ca9a15b6153677-Paper.pdf>impact performance</a>
. If there is a performance issue that you suspect may be due to optimization ordering, please
4 changes: 0 additions & 4 deletions getting_started/Faq/index.html
@@ -9,10 +9,6 @@
</code></pre></div><p>opt reports an error:<code>opt: unknown pass name 'enzyme'</code></p><p>This may be because you are using LLVM 13 or a higher version, which has switched to the new pass manager pipeline. On applicable LLVM versions (up to and including LLVM 15),
you can specify that opt uses the old pass manager by adding the <code>--enable-new-pm=0</code> flag. Alternatively, you can use the new pass manager, which uses the following syntax
for opt:</p><div class=highlight><pre class=chroma><code class=language-sh data-lang=sh>opt input.ll -load<span class=o>=</span>/path/to/LLVMEnzyme-VERSION.so -passes<span class=o>=</span>enzyme -o output.ll -S
</code></pre></div><p>Simiarly, clang has different syntax for plugins for the new pass manager. On LLVM versions up to and including LLVM 12, this is done as follows:</p><div class=highlight><pre class=chroma><code class=language-sh data-lang=sh>clang -Xclang -load -Xclang /path/to/ClangEnzyme-VERSION.so code.c
</code></pre></div><p>On LLVM 13 and above, the new pass manager is the default. If you would like to continue to use the old pass manager, you will also need to provide
either <code>-fno-experimental-new-pass-manager</code> on versions &lt;= 14, or <code>-flegacy-pass-manager</code> on LLVM 15. LLVM 16 and above dropped support for the legacy pass
manager entirely.</p><p>To use the new pass manager, the following command line args are required for Clang:</p><div class=highlight><pre class=chroma><code class=language-sh data-lang=sh>clang -fpass-plugin<span class=o>=</span>/path/to/ClangEnzyme-VERSION.so code.c
</code></pre></div><p>If you are using CMake, Enzyme exports a special <code>ClangEnzymeFlags</code> target which will automatically add the correct flags. See
<a href=https://github.com/EnzymeAD/Enzyme/blob/main/enzyme/test/test_find_package/CMakeLists.txt#L14>here</a>
for an example.</p><h3 id=unreachable-executed-gvn-error>UNREACHABLE executed (GVN error)&nbsp;<a class=headline-hash href=#unreachable-executed-gvn-error></a></h3><p>Until June 2020, LLVM&rsquo;s existing GVN pass had a bug handling invariant.load&rsquo;s that would cause it to crash. These tend to be generated a lot by Enzyme for better optimization. This was reported
6 changes: 3 additions & 3 deletions getting_started/UsingEnzyme/index.html
@@ -73,10 +73,10 @@
<span class=p>}</span>
</code></pre></div><p>The generated gradient has been inlined and entirely simplified to return the input times two.</p><p>We can then compile this to a final binary as follows:</p><div class=highlight><pre class=chroma><code class=language-sh data-lang=sh>clang output_opt.ll -o a.exe
</code></pre></div><p>For ease, we could combine the final optimization and binary execution into one command as follows.</p><div class=highlight><pre class=chroma><code class=language-sh data-lang=sh>clang output.ll -O3 -o a.exe
</code></pre></div><p>Moreover, using Enzyme&rsquo;s clang plugin, we could automate the entire AD and compilation in a single command. Using the clang plugin should be done by default as it improves the user experience as well as various default performance options. However, the example above is still useful to understand how Enzyme works on LLVM.</p><div class=highlight><pre class=chroma><code class=language-sh data-lang=sh>clang test.c -Xclang -load -Xclang /path/to/Enzyme/enzyme/build/Enzyme/LLVMEnzyme-&lt;VERSION&gt;.so -o a.exe
</code></pre></div><p>Note that each version of Clang/LLVM will have slightly different command line flags to specifying plugins. See
</code></pre></div><p>Moreover, using Enzyme&rsquo;s clang plugin, we could automate the entire AD and compilation in a single command. Using the clang plugin should be done by default as it improves the user experience as well as various default performance options. However, the example above is still useful to understand how Enzyme works on LLVM.</p><div class=highlight><pre class=chroma><code class=language-sh data-lang=sh>clang test.c -fplugin<span class=o>=</span>/path/to/Enzyme/enzyme/build/Enzyme/ClangEnzyme-&lt;VERSION&gt;.so -o a.exe
</code></pre></div><p>Note that if using the LLVM plugin, each version of LLVM will have slightly different command line flags to specifying plugins. See
<a href=/getting_started/Faq/>the FAQ</a>
for more information.</p><h2 id=advanced-options>Advanced options&nbsp;<a class=headline-hash href=#advanced-options></a></h2><p>Enzyme has several advanced options that may be of interest.</p><h3 id=performance-options>Performance options&nbsp;<a class=headline-hash href=#performance-options></a></h3><h4 id=disabling-preprocessing>Disabling Preprocessing&nbsp;<a class=headline-hash href=#disabling-preprocessing></a></h4><p>The <code>enzyme-preopt</code> option disables the preprocessing optimizations run by the Enzyme pass, except for the absolute minimum necessary.</p><div class=highlight><pre class=chroma><code class=language-sh data-lang=sh>$ opt input.ll -load<span class=o>=</span>./Enzyme/LLVMEnzyme-7.so -enzyme -enzyme-preopt<span class=o>=</span><span class=m>1</span>
for more information. If using the clang plugin, the same command should work independently of version.</p><h2 id=advanced-options>Advanced options&nbsp;<a class=headline-hash href=#advanced-options></a></h2><p>Enzyme has several advanced options that may be of interest.</p><h3 id=performance-options>Performance options&nbsp;<a class=headline-hash href=#performance-options></a></h3><h4 id=disabling-preprocessing>Disabling Preprocessing&nbsp;<a class=headline-hash href=#disabling-preprocessing></a></h4><p>The <code>enzyme-preopt</code> option disables the preprocessing optimizations run by the Enzyme pass, except for the absolute minimum necessary.</p><div class=highlight><pre class=chroma><code class=language-sh data-lang=sh>$ opt input.ll -load<span class=o>=</span>./Enzyme/LLVMEnzyme-7.so -enzyme -enzyme-preopt<span class=o>=</span><span class=m>1</span>
$ opt input.ll -load<span class=o>=</span>./Enzyme/LLVMEnzyme-7.so -enzyme -enzyme-preopt<span class=o>=</span><span class=m>0</span>
</code></pre></div><h4 id=forced-inlining>Forced Inlining&nbsp;<a class=headline-hash href=#forced-inlining></a></h4><p>The <code>enzyme-inline</code> option forcibly inlines all subfunction calls. The <code>enzyme-inline-count</code> option limits the number of calls inlined by this utility.</p><div class=highlight><pre class=chroma><code class=language-sh data-lang=sh>$ opt input.ll -load<span class=o>=</span>./Enzyme/LLVMEnzyme-7.so -enzyme -enzyme-inline<span class=o>=</span><span class=m>1</span>
$ opt input.ll -load<span class=o>=</span>./Enzyme/LLVMEnzyme-7.so -enzyme -enzyme-inline<span class=o>=</span><span class=m>1</span> -enzyme-inline-count<span class=o>=</span><span class=m>100</span>