zama-ai · IceTDrinker · Sep 30, 2024 · Jul 26, 2024
diff --git a/tfhe/docs/SUMMARY.md b/tfhe/docs/SUMMARY.md
@@ -8,7 +8,10 @@
 * [Installation](getting\_started/installation.md)
 * [Quick start](getting\_started/quick\_start.md)
 * [Types & Operations](getting\_started/operations.md)
-* [Benchmarks](getting\_started/benchmarks.md)
+* [Benchmarks](getting\_started/benchmarks/summary.md)
+  * [CPU Benchmarks](getting\_started/benchmarks/cpu\_benchmarks.md)
+  * [GPU Benchmarks](getting\_started/benchmarks/gpu\_benchmarks.md)
+  * [Zero-knowledge proof benchmarks](getting_started/benchmarks/zk_proof_benchmarks.md)
 * [Security and cryptography](getting\_started/security\_and\_cryptography.md)
 
 ## Fundamentals

diff --git a/tfhe/docs/getting_started/benchmarks.md b/tfhe/docs/getting_started/benchmarks.md
diff --git a/tfhe/docs/getting_started/benchmarks/cpu_benchmarks.md b/tfhe/docs/getting_started/benchmarks/cpu_benchmarks.md
@@ -0,0 +1,52 @@
+# CPU Benchmarks
+
+This document details the CPU performance benchmarks of homomorphic operations using **TFHE-rs**.
+
+By their nature, homomorphic operations run slower than their cleartext equivalents. The following are the timings for basic operations, including benchmarks from other libraries for comparison.
+
+{% hint style="info" %}
+All CPU benchmarks were launched on an `AWS hpc7a.96xlarge` instance equipped with an `AMD EPYC 9R14 CPU @ 2.60GHz` and 740GB of RAM.
+{% endhint %}
+
+## Integer operations
+
+The following tables benchmark the execution time of some operation sets using `FheUint` (unsigned integers). `FheInt` (signed integers) performs similarly.
+
+The next table shows the operation timings on CPU when all inputs are encrypted:
+
+{% embed url="https://docs.google.com/spreadsheets/d/1Z2NZvWEkDnbHPYE4Su0Oh2Zz1VBnT9dWbo3E29-LcDg/edit?usp=sharing" %}
+
+The next table shows the operation timings on CPU when the left input is encrypted and the right is a clear scalar of the same size:
+
+{% embed url="https://docs.google.com/spreadsheets/d/1NGPnuBhRasES9Ghaij4ixJJTpXVMqDzbqMniX-qIMGc/edit?usp=sharing" %}
+
+All timings are based on parallelized Radix-based integer operations where each block is encrypted using the default parameters `PARAM_MESSAGE_2_CARRY_2_KS_PBS`. To ensure predictable timings, we perform operations in the `default` mode, which ensures that the input and output encoding are similar (i.e., the carries are always emptied).
+
+You can minimize operational costs by selecting the `unchecked`, `checked`, or `smart` mode from [the fine-grained APIs](../../references/fine-grained-apis/quick\_start.md), each balancing performance and correctness differently. For more details about parameters, see [here](../../references/fine-grained-apis/shortint/parameters.md). You can find the benchmark results on GPU for all these operations [here](../../guides/run\_on\_gpu.md#benchmarks).
+
+## Programmable bootstrapping
+
+The next table shows the execution time of a keyswitch followed by a programmable bootstrapping depending on the precision of the input message. The associated parameter set is given. The configuration is Concrete FFT + AVX-512.
+
+{% embed url="https://docs.google.com/spreadsheets/d/1OdZrsk0dHTWSLLvstkpiv0u5G5tE0mCqItTb7WixGdg/edit?usp=sharing" %}
+
+## Reproducing TFHE-rs benchmarks
+
+**TFHE-rs** benchmarks can be easily reproduced from the [source](https://github.com/zama-ai/tfhe-rs).
+
+{% hint style="info" %}
+AVX-512 is now enabled by default for benchmarks when available.
+{% endhint %}
+
+The following example shows how to reproduce **TFHE-rs** benchmarks:
+
+```shell
+# Boolean benchmarks:
+make bench_boolean
+
+# Integer benchmarks:
+make bench_integer
+
+# Shortint benchmarks:
+make bench_shortint
+```
diff --git a/tfhe/docs/getting_started/benchmarks/gpu_benchmarks.md b/tfhe/docs/getting_started/benchmarks/gpu_benchmarks.md
@@ -0,0 +1,27 @@
+# GPU Benchmarks
+
+This document details the GPU performance benchmarks of homomorphic operations using **TFHE-rs**.
+
+All GPU benchmarks presented here were obtained on H100 GPUs, and rely on the multithreaded PBS algorithm. The cryptographic parameters `PARAM_GPU_MULTI_BIT_MESSAGE_2_CARRY_2_GROUP_3_KS_PBS` were used.
+
+## 1xH100
+Below are the results for execution on a single H100.
+The following table shows the performance when the inputs of the benchmarked operation are encrypted:
+
+{% embed url="https://docs.google.com/spreadsheets/d/1dhNYXm7oY0l2qjX3dNpSZKjIBJElkEZtPDIWHZ4FA_A/edit?usp=sharing" %}
+
+The following table shows the performance when the left input of the benchmarked operation is encrypted and the other is a clear scalar of the same size:
+
+{% embed url="https://docs.google.com/spreadsheets/d/1wtnFnOwHrSOvfTWluUEaDoTULyveseVl1ZsYo3AOFKk/edit?usp=sharing" %}
+
+## 2xH100
+
+Below are the results for execution on two H100s.
+The following table shows the performance when the inputs of the benchmarked operation are encrypted:
+
+{% embed url="https://docs.google.com/spreadsheets/d/1_2AUeu3ua8_PXxMfeJCh-pp6b9e529PGVEYUuZRAThg/edit?usp=sharing" %}
+
+
+The following table shows the performance when the left input of the benchmarked operation is encrypted and the other is a clear scalar of the same size:
+
+{% embed url="https://docs.google.com/spreadsheets/d/1nLPt_m1MbkSdhMop0iKDnSN_c605l_JdMpK5JC90N_Q/edit?usp=sharing" %}
diff --git a/tfhe/docs/getting_started/benchmarks/summary.md b/tfhe/docs/getting_started/benchmarks/summary.md
@@ -0,0 +1,7 @@
+# Benchmarks
+
+This document summarizes the timings of some homomorphic operations over 64-bit encrypted integers, depending on the hardware. More details are given for [the CPU](cpu\_benchmarks.md), [the GPU](gpu\_benchmarks.md), or [zero-knowledge proofs](zk\_proof\_benchmarks.md).
+
+### Operation time (ms) over FheUint 64
+
+{% embed url="https://docs.google.com/spreadsheets/d/1ZbgsKnFH8eKrFjy9khFeaLYnUhbSV8Xu4H6rwulo0o8/edit?usp=sharing" %}
diff --git a/tfhe/docs/getting_started/benchmarks/zk_proof_benchmarks.md b/tfhe/docs/getting_started/benchmarks/zk_proof_benchmarks.md
@@ -0,0 +1,7 @@
+# Zero-knowledge proof benchmarks
+
+This document details the performance benchmarks of [zero-knowledge proofs](../../guides/zk-pok.md) for [compact public key encryption](../../guides/public_key.md) using **TFHE-rs**.
+
+Benchmarks for the zero-knowledge proofs were run on an `m6i.4xlarge` instance with 16 cores to simulate a typical client configuration. Verification is done on an `hpc7a.96xlarge` AWS instance to mimic a powerful server.
+
+{% embed url="https://docs.google.com/spreadsheets/d/1llCYHCz2CyLdTwXkiqhjVzJLzxW_RqdjHxmk72m1jm4/edit?usp=sharing" %}