chore(docs): benchmark regrouping and visualization
yuxizama authored and IceTDrinker committed Sep 26, 2024
1 parent 41fae73 commit 7ec22b3
Showing 9 changed files with 103 additions and 212 deletions.
5 changes: 4 additions & 1 deletion tfhe/docs/SUMMARY.md
@@ -8,7 +8,10 @@
* [Installation](getting\_started/installation.md)
* [Quick start](getting\_started/quick\_start.md)
* [Types & Operations](getting\_started/operations.md)
* [Benchmarks](getting\_started/benchmarks.md)
* [Benchmarks](getting\_started/benchmarks/summary.md)
* [CPU Benchmarks](getting\_started/benchmarks/cpu\_benchmarks.md)
* [GPU Benchmarks](getting\_started/benchmarks/gpu\_benchmarks.md)
* [Zero-knowledge proof benchmarks](getting_started/benchmarks/zk_proof_benchmarks.md)
* [Security and cryptography](getting\_started/security\_and\_cryptography.md)

## Fundamentals
126 changes: 0 additions & 126 deletions tfhe/docs/getting_started/benchmarks.md

This file was deleted.

52 changes: 52 additions & 0 deletions tfhe/docs/getting_started/benchmarks/cpu_benchmarks.md
@@ -0,0 +1,52 @@
# CPU Benchmarks

This document details the CPU performance benchmarks of homomorphic operations using **TFHE-rs**.

By their nature, homomorphic operations run slower than their cleartext equivalents. The following are the timings for basic operations, including benchmarks from other libraries for comparison.

{% hint style="info" %}
All CPU benchmarks were launched on an `AWS hpc7a.96xlarge` instance equipped with an `AMD EPYC 9R14 CPU @ 2.60GHz` and 740GB of RAM.
{% endhint %}

## Integer operations

The following tables benchmark the execution time of some operation sets using `FheUint` (unsigned integers). `FheInt` (signed integers) performs similarly.

The next table shows the operation timings on CPU when all inputs are encrypted:

{% embed url="https://docs.google.com/spreadsheets/d/1Z2NZvWEkDnbHPYE4Su0Oh2Zz1VBnT9dWbo3E29-LcDg/edit?usp=sharing" %}

The next table shows the operation timings on CPU when the left input is encrypted and the right is a clear scalar of the same size:

{% embed url="https://docs.google.com/spreadsheets/d/1NGPnuBhRasES9Ghaij4ixJJTpXVMqDzbqMniX-qIMGc/edit?usp=sharing" %}

All timings are based on parallelized Radix-based integer operations where each block is encrypted using the default parameters `PARAM_MESSAGE_2_CARRY_2_KS_PBS`. To keep timings predictable, we perform operations in the `default` mode, which guarantees that the input and output encodings are similar (i.e., the carries are always emptied).

You can minimize operational costs by selecting the 'unchecked', 'checked', or 'smart' mode from [the fine-grained APIs](../../references/fine-grained-apis/quick\_start.md), each balancing performance and correctness differently. For more details about parameters, see [here](../../references/fine-grained-apis/shortint/parameters.md). You can find the benchmark results on GPU for all these operations [here](../../guides/run\_on\_gpu.md#benchmarks).

## Programmable bootstrapping

The next table shows the execution time of a keyswitch followed by a programmable bootstrapping depending on the precision of the input message. The associated parameter set is given. The configuration is Concrete FFT + AVX-512.

{% embed url="https://docs.google.com/spreadsheets/d/1OdZrsk0dHTWSLLvstkpiv0u5G5tE0mCqItTb7WixGdg/edit?usp=sharing" %}

## Reproducing TFHE-rs benchmarks

**TFHE-rs** benchmarks can be easily reproduced from the [source](https://github.com/zama-ai/tfhe-rs).

{% hint style="info" %}
AVX-512 is now enabled by default for benchmarks when available.
{% endhint %}
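
Since AVX-512 is only used when available, it can be useful to check what the benchmark machine advertises before comparing timings. The sketch below is an illustration, not part of **TFHE-rs**: it assumes a Linux host exposing CPU flags in `/proc/cpuinfo` and looks only for the `avx512f` (Foundation) flag. The helper name `has_avx512` is hypothetical, and the library's own detection may check additional features.

```shell
# Hypothetical helper: succeeds if the given cpuinfo file lists the avx512f flag.
# Defaults to /proc/cpuinfo, which is an assumption that holds on Linux only.
has_avx512() {
  grep -qw avx512f "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_avx512; then
  echo "AVX-512 available"
else
  echo "AVX-512 not detected"
fi
```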

The following example shows how to reproduce **TFHE-rs** benchmarks:

```shell
# Boolean benchmarks:
make bench_boolean

# Integer benchmarks:
make bench_integer

# Shortint benchmarks:
make bench_shortint
```
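
The three targets above can also be chained so that a run stops at the first failing suite. The following is a minimal sketch under that assumption; `run_benches` is a hypothetical helper and not something shipped with the repository.

```shell
# Hypothetical helper: run each Makefile target in order and
# stop at the first one that fails.
run_benches() {
  for target in "$@"; do
    echo "==> $target"
    make "$target" || return 1
  done
}

# Usage, from the root of a tfhe-rs checkout:
#   run_benches bench_boolean bench_integer bench_shortint
```

Running the suites one after another, rather than in parallel, avoids them competing for cores and skewing the timings.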
27 changes: 27 additions & 0 deletions tfhe/docs/getting_started/benchmarks/gpu_benchmarks.md
@@ -0,0 +1,27 @@
# GPU Benchmarks

This document details the GPU performance benchmarks of homomorphic operations using **TFHE-rs**.

All GPU benchmarks presented here were obtained on H100 GPUs, and rely on the multithreaded PBS algorithm. The cryptographic parameters `PARAM_GPU_MULTI_BIT_MESSAGE_2_CARRY_2_GROUP_3_KS_PBS` were used.

## 1xH100
The following results were obtained on a single H100.
The following table shows the performance when the inputs of the benchmarked operation are encrypted:

{% embed url="https://docs.google.com/spreadsheets/d/1dhNYXm7oY0l2qjX3dNpSZKjIBJElkEZtPDIWHZ4FA_A/edit?usp=sharing" %}

The following table shows the performance when the left input of the benchmarked operation is encrypted and the other is a clear scalar of the same size:

{% embed url="https://docs.google.com/spreadsheets/d/1wtnFnOwHrSOvfTWluUEaDoTULyveseVl1ZsYo3AOFKk/edit?usp=sharing" %}

## 2xH100

The following results were obtained on two H100s.
The following table shows the performance when the inputs of the benchmarked operation are encrypted:

{% embed url="https://docs.google.com/spreadsheets/d/1_2AUeu3ua8_PXxMfeJCh-pp6b9e529PGVEYUuZRAThg/edit?usp=sharing" %}


The following table shows the performance when the left input of the benchmarked operation is encrypted and the other is a clear scalar of the same size:

{% embed url="https://docs.google.com/spreadsheets/d/1nLPt_m1MbkSdhMop0iKDnSN_c605l_JdMpK5JC90N_Q/edit?usp=sharing" %}
7 changes: 7 additions & 0 deletions tfhe/docs/getting_started/benchmarks/summary.md
@@ -0,0 +1,7 @@
# Benchmarks

This document summarizes the timings of some homomorphic operations over 64-bit encrypted integers, depending on the hardware. More details are given for [the CPU](cpu\_benchmarks.md), [the GPU](gpu\_benchmarks.md), or [zero-knowledge proofs](zk\_proof\_benchmarks.md).

### Operation time (ms) over FheUint64

{% embed url="https://docs.google.com/spreadsheets/d/1ZbgsKnFH8eKrFjy9khFeaLYnUhbSV8Xu4H6rwulo0o8/edit?usp=sharing" %}
8 changes: 8 additions & 0 deletions tfhe/docs/getting_started/benchmarks/zk_proof_benchmarks.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# Zero-knowledge proof benchmarks

This document details the performance benchmarks of [zero-knowledge proofs](../../guides/zk-pok.md) for [compact public key encryption](../../guides/public_key.md) using **TFHE-rs**.

Benchmarks for the zero-knowledge proofs were run on an `m6i.4xlarge` instance with 16 cores to simulate a typical client configuration. Verification was done on an `hpc7a.96xlarge` AWS instance to mimic a powerful server.

{% embed url="https://docs.google.com/spreadsheets/d/1llCYHCz2CyLdTwXkiqhjVzJLzxW_RqdjHxmk72m1jm4/edit?usp=sharing" %}
