commit 4a00d25 (1 parent: 8c9ee64)
Showing 9 changed files with 452 additions and 320 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,97 +1,106 @@ | ||
# Benchmarks | ||
|
||
Due to their nature, homomorphic operations are naturally slower than their clear equivalent. Some timings are exposed for basic operations. For completeness, benchmarks for other libraries are also given. | ||
Due to their nature, homomorphic operations are naturally slower than their cleartext equivalents. Some timings are exposed for basic operations. For completeness, benchmarks for other libraries are also given. | ||
|
||
{% hint style="info" %} | ||
All benchmarks were launched on an AWS m6i.metal with the following specifications: Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz and 512GB of RAM. | ||
{% endhint %} | ||
|
||
## Boolean | ||
## Integer | ||
|
||
This measures the execution time of a single binary Boolean gate. | ||
This measures the execution time for some operation sets of tfhe-rs::integer (the unsigned version). Note that the timings for `FheInt` (i.e., the signed integers) are similar. | ||
|
||
### tfhe-rs::boolean. | ||
| Operation \ Size | `FheUint8` | `FheUint16` | `FheUint32` | `FheUint64` | `FheUint128` | `FheUint256` | | ||
|--------------------------------------------------------|------------|-------------|-------------|-------------|--------------|--------------| | ||
| Negation (`-`) | 70.9 ms | 99.3 ms | 129 ms | 180 ms | 239 ms | 333 ms | | ||
| Add / Sub (`+`,`-`) | 70.5 ms | 100 ms | 132 ms | 186 ms | 249 ms | 334 ms | | ||
| Mul (`x`) | 144 ms | 216 ms | 333 ms | 832 ms | 2.50 s | 8.85 s | | ||
| Equal / Not Equal (`eq`, `ne`) | 36.1 ms | 36.5 ms | 57.4 ms | 64.2 ms | 67.3 ms | 78.1 ms | | ||
| Comparisons (`ge`, `gt`, `le`, `lt`) | 52.6 ms | 73.1 ms | 98.8 ms | 124 ms | 165 ms | 201 ms | | ||
| Max / Min (`max`,`min`) | 76.2 ms | 102 ms | 135 ms | 171 ms | 212 ms | 301 ms | | ||
| Bitwise operations (`&`, `\|`, `^`) | 19.4 ms | 20.3 ms | 21.0 ms | 27.2 ms | 31.6 ms | 40.2 ms | | ||
| Div / Rem (`/`, `%`) | 729 ms | 1.93 s | 4.81 s | 12.2 s | 30.7 s | 89.6 s | | ||
| Left / Right Shifts (`<<`, `>>`) | 99.4 ms | 129 ms | 180 ms | 243 ms | 372 ms | 762 ms | | ||
| Left / Right Rotations (`left_rotate`, `right_rotate`) | 103 ms | 128 ms | 182 ms | 241 ms | 374 ms | 763 ms | | ||
|
||
| Parameter set | Concrete FFT | Concrete FFT + AVX-512 | | ||
| --------------------- | ------------ | ---------------------- | | ||
| DEFAULT\_PARAMETERS | 8.8ms | 6.8ms | | ||
| TFHE\_LIB\_PARAMETERS | 13.6ms | 10.9ms | | ||
|
||
### tfhe-lib. | ||
|
||
| Parameter set | fftw | spqlios-fma | | ||
| ------------------------------------------------ | ------ | ----------- | | ||
| default\_128bit\_gate\_bootstrapping\_parameters | 28.9ms | 15.7ms | | ||
All timings are related to parallelized Radix-based integer operations, where each block is encrypted using the default parameters (i.e., PARAM\_MESSAGE\_2\_CARRY\_2\_KS\_PBS, more information about parameters can be found [here](../fine_grained_api/shortint/parameters.md)). | ||
To ensure predictable timings, the operation flavor is the `default` one: the carry is propagated if needed. The operation costs may be reduced by using `unchecked`, `checked`, or `smart`. | ||
|
||
### OpenFHE. | ||
|
||
| Parameter set | GINX | GINX (Intel HEXL) | | ||
| ------------- | ----- | ----------------- | | ||
| STD\_128 | 172ms | 78ms | | ||
| MEDIUM | 113ms | 50.2ms | | ||
## Shortint | ||
|
||
This measures the execution time for some operations using various parameter sets of tfhe-rs::shortint. Except for `unchecked_add`, all timings are related to the `default` operations. This flavor ensures predictable timings for an operation along the entire circuit by clearing the carry space after each operation. | ||
|
||
## Integer | ||
This measures the execution time for some operation sets of tfhe-rs::integer. | ||
This uses the Concrete FFT + AVX-512 configuration. | ||
|
||
| Operation \ Size | `FheUint8` | `FheUint16` | `FheUint32` | ` FheUint64` | `FheUint128` | `FheUint256` | | ||
|--------------------------------------------------------|------------|-------------|-------------|--------------|--------------|--------------| | ||
| Negation (`-`) | 80.4 ms | 106 ms | 132 ms | 193 ms | 257 ms | 348 ms | | ||
| Add / Sub (`+`,`-`) | 81.5 ms | 110 ms | 139 ms | 200 ms | 262 ms | 355 ms | | ||
| Mul (`x`) | 150 ms | 221 ms | 361 ms | 928 ms | 2.90 s | 10.97 s | | ||
| Equal / Not Equal (`eq`, `ne`) | 39.4 ms | 40.2 ms | 61.1 ms | 66.4 ms | 74.5 ms | 85.7 ms | | ||
| Comparisons (`ge`, `gt`, `le`, `lt`) | 57.5 ms | 79.6 ms | 105 ms | 136 ms | 174 ms | 219 ms | | ||
| Max / Min (`max`,`min`) | 100 ms | 130 ms | 163 ms | 204 ms | 245 ms | 338 ms | | ||
| Bitwise operations (`&`, `|`, `^`) | 20.7 ms | 21.1 ms | 22.6 ms | 30.2 ms | 34.1 ms | 42.1 ms | | ||
| Div / Rem (`/`, `%`) | 1.37 s | 3.50 s | 9.12 s | 23.9 s | 59.9 s | 149.2 s | | ||
| Left / Right Shifts (`<<`, `>>`) | 106 ms | 140 ms | 202 ms | 262 ms | 403 ms | 827 ms | | ||
| Left / Right Rotations (`left_rotate`, `right_rotate`) | 105 ms | 140 ms | 199 ms | 263 ms | 403 ms | 829 ms | | ||
| Parameter set | PARAM\_MESSAGE\_1\_CARRY\_1 | PARAM\_MESSAGE\_2\_CARRY\_2 | PARAM\_MESSAGE\_3\_CARRY\_3 | PARAM\_MESSAGE\_4\_CARRY\_4 | | ||
|------------------------------------|-----------------------------|-----------------------------|-----------------------------|-----------------------------| | ||
| unchecked\_add | 348 ns | 413 ns | 2.95 µs | 12.1 µs | | ||
| add | 7.59 ms | 17.0 ms | 121 ms | 835 ms | | ||
| mul\_lsb | 8.13 ms | 16.8 ms | 121 ms | 827 ms | | ||
| keyswitch\_programmable\_bootstrap | 7.28 ms | 16.6 ms | 121 ms | 811 ms | | ||
|
||
|
||
## Boolean | ||
|
||
All timings are related to parallelized Radix-based integer operations, where each block is encrypted using the default parameters (i.e., PARAM\_MESSAGE\_2\_CARRY\_2, more information about parameters can be found [here](../fine_grained_api/shortint/parameters.md)). | ||
To ensure predictable timings, the operation flavor is the `default` one: the carry is propagated if needed. The operation costs could be reduced by using `unchecked`, `checked`, or `smart`. | ||
This measures the execution time of a single binary Boolean gate. | ||
|
||
### tfhe-rs::boolean. | ||
|
||
## Shortint | ||
This measures the execution time for some operations using various parameter sets of tfhe-rs::shortint. | ||
| Parameter set | Concrete FFT + AVX-512 | | ||
|------------------------------------------------------|------------------------| | ||
| DEFAULT\_PARAMETERS\_KS\_PBS | 9.19 ms | | ||
| PARAMETERS\_ERROR\_PROB\_2\_POW\_MINUS\_165\_KS\_PBS | 14.1 ms | | ||
| TFHE\_LIB\_PARAMETERS | 10.0 ms | | ||
|
||
This uses the Concrete FFT + AVX-512 configuration. | ||
|
||
| Parameter set | unchecked\_add | unchecked\_mul\_lsb | keyswitch\_programmable\_bootstrap | | ||
|-----------------------------|----------------|---------------------|------------------------------------| | ||
| PARAM\_MESSAGE\_1\_CARRY\_1 | 338 ns | 8.3 ms | 8.1 ms | | ||
| PARAM\_MESSAGE\_2\_CARRY\_2 | 406 ns | 18.4 ms | 18.4 ms | | ||
| PARAM\_MESSAGE\_3\_CARRY\_3 | 3.06 µs | 134 ms | 129.4 ms | | ||
| PARAM\_MESSAGE\_4\_CARRY\_4 | 11.7 µs | 854 ms | 828.1 ms | | ||
### tfhe-lib. | ||
|
||
Next, the timings for the operation flavor `default` are given. This flavor ensures predictable timings of an operation along the entire circuit by clearing the carry space after each operation. | ||
Using the same m6i.metal machine as the one for tfhe-rs, the timings are: | ||
|
||
| Parameter set | add | mul\_lsb | keyswitch\_programmable\_bootstrap | | ||
| --------------------------- | -------------- | ------------------- | ---------------------------------- | | ||
| PARAM\_MESSAGE\_1\_CARRY\_1 | 7.90 ms | 8.00 ms | 8.10 ms | | ||
| PARAM\_MESSAGE\_2\_CARRY\_2 | 18.4 ms | 18.1 ms | 18.4 ms | | ||
| PARAM\_MESSAGE\_3\_CARRY\_3 | 131.5 ms | 129.5 ms | 129.4 ms | | ||
| PARAM\_MESSAGE\_4\_CARRY\_4 | 852.5 ms | 839.7 ms | 828.1 ms | | ||
| Parameter set | spqlios-fma | | ||
|--------------------------------------------------|-------------| | ||
| default\_128bit\_gate\_bootstrapping\_parameters | 15.4 ms | | ||
|
||
## How to reproduce benchmarks | ||
### OpenFHE (v1.1.1). | ||
|
||
TFHE-rs benchmarks can easily be reproduced from the [sources](https://github.com/zama-ai/tfhe-rs). | ||
Following the official instructions from OpenFHE, `clang14` and the following command are used to set up the project: | ||
`cmake -DNATIVE_SIZE=32 -DWITH_NATIVEOPT=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DWITH_OPENMP=OFF ..` | ||
|
||
```shell | ||
#Boolean benchmarks: | ||
make bench_boolean | ||
To use the HEXL library, the configuration used is as follows: | ||
```bash | ||
export CXX=clang++ | ||
export CC=clang | ||
|
||
#Integer benchmarks: | ||
make bench_integer | ||
scripts/configure.sh | ||
Release -> y | ||
hexl -> y | ||
|
||
#Shortint benchmarks: | ||
make bench_shortint | ||
scripts/build-openfhe-development-hexl.sh | ||
``` | ||
|
||
If the host machine supports AVX-512, then the argument `AVX512_SUPPORT=ON' should be added, e.g.: | ||
Using the same m6i.metal machine as the one for tfhe-rs, the timings are: | ||
|
||
| Parameter set | GINX | GINX w/ Intel HEXL | | ||
|----------------------------------|---------|--------------------| | ||
| FHEW\_BINGATE/STD128\_OR | 40.2 ms | 31.0 ms | | ||
| FHEW\_BINGATE/STD128\_LMKCDEY_OR | 38.6 ms | 28.4 ms | | ||
|
||
|
||
## How to reproduce TFHE-rs benchmarks | ||
|
||
TFHE-rs benchmarks can be easily reproduced from [source](https://github.com/zama-ai/tfhe-rs). | ||
|
||
```shell | ||
#Boolean benchmarks: | ||
make AVX512_SUPPORT=ON bench_boolean | ||
|
||
#Integer benchmarks: | ||
make AVX512_SUPPORT=ON bench_integer | ||
|
||
#Shortint benchmarks: | ||
make AVX512_SUPPORT=ON bench_shortint | ||
``` | ||
|
||
If the host machine does not support AVX-512, then turning on `AVX512_SUPPORT` will not provide any speed-up. |
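
Whether the host supports AVX-512 can be checked before choosing the `make` arguments. The following is a minimal sketch, assuming a Linux host that exposes CPU flags in `/proc/cpuinfo`. The helper name `avx512_support` is hypothetical and is not part of TFHE-rs; only the two `make` invocations come from the text above. The script prints the suggested command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of TFHE-rs): prints ON when the CPU flags
# in the given file (default: /proc/cpuinfo, Linux only) include avx512f,
# and OFF otherwise (including when the file cannot be read).
avx512_support() {
    cpuinfo="${1:-/proc/cpuinfo}"
    if grep -qw avx512f "$cpuinfo" 2>/dev/null; then
        echo ON
    else
        echo OFF
    fi
}

# Print the make invocation instead of running it, so it can be reviewed first.
if [ "$(avx512_support)" = "ON" ]; then
    echo "make AVX512_SUPPORT=ON bench_boolean"
else
    echo "make bench_boolean"
fi
```

The same check applies to the `bench_integer` and `bench_shortint` targets. On hosts without `/proc/cpuinfo` the helper falls back to `OFF`, which only means the flag is omitted.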