diff --git a/tfhe/docs/SUMMARY.md b/tfhe/docs/SUMMARY.md
index 7e4667ea46..aa9d283757 100644
--- a/tfhe/docs/SUMMARY.md
+++ b/tfhe/docs/SUMMARY.md
@@ -8,7 +8,10 @@
 * [Installation](getting\_started/installation.md)
 * [Quick start](getting\_started/quick\_start.md)
 * [Types & Operations](getting\_started/operations.md)
-* [Benchmarks](getting\_started/benchmarks.md)
+* [Benchmarks](getting\_started/benchmarks/summary.md)
+  * [CPU Benchmarks](getting\_started/benchmarks/cpu\_benchmarks.md)
+  * [GPU Benchmarks](getting\_started/benchmarks/gpu\_benchmarks.md)
+  * [Zero-knowledge proof benchmarks](getting\_started/benchmarks/zk\_proof\_benchmarks.md)
 * [Security and cryptography](getting\_started/security\_and\_cryptography.md)
 
 ## Fundamentals
diff --git a/tfhe/docs/getting_started/benchmarks.md b/tfhe/docs/getting_started/benchmarks.md
deleted file mode 100644
index b48e97a20f..0000000000
--- a/tfhe/docs/getting_started/benchmarks.md
+++ /dev/null
@@ -1,126 +0,0 @@
-# Benchmarks
-
-This document details the performance benchmarks of homomorphic operations using **TFHE-rs**.
-
-By their nature, homomorphic operations run slower than their cleartext equivalents. The following are the timings for basic operations, including benchmarks from other libraries for comparison.
-
-{% hint style="info" %}
-All CPU benchmarks were launched on an `AWS hpc7a.96xlarge` instance equipped with an `AMD EPYC 9R14 CPU @ 2.60GHz` and 740GB of RAM.
-{% endhint %}
-
-## Integer operations
-
-The following tables benchmark the execution time of some operation sets using `FheUint` (unsigned integers). The `FheInt` (signed integers) performs similarly.
-
-The next table shows the operation timings on CPU when all inputs are encrypted:
-
-| Operation \ Size                                       |  `FheUint8` | `FheUint16` | `FheUint32` | `FheUint64` | `FheUint128` | `FheUint256` |
-| ------------------------------------------------------ |  ---------- | ----------- | ----------- | ----------- | ------------ | ------------ |
-| Negation (`-`)                                         |  65.1 ms    | 97.0 ms     | 116 ms      | 141 ms      | 186 ms       | 227 ms       |
-| Add / Sub (`+`,`-`)                                    |  75.8 ms    | 96.7 ms     | 118 ms      | 150 ms      | 186 ms       | 230 ms       |
-| Mul (`x`)                                              |  96.1 ms    | 180 ms      | 251 ms      | 425 ms      | 1.1 s        | 3.66 s       |
-| Equal / Not Equal (`eq`, `ne`)                         |  32.2 ms    | 35.0 ms     | 55.4 ms     | 56.0 ms     | 59.5 ms      | 60.7 ms      |
-| Comparisons  (`ge`, `gt`, `le`, `lt`)                  |  57.1 ms    | 72.9 ms     | 93.0 ms     | 116 ms      | 138 ms       | 164 ms       |
-| Max / Min   (`max`,`min`)                              |  94.3 ms    | 114 ms      | 138 ms      | 159 ms      | 189 ms       | 233 ms       |
-| Bitwise operations (`&`, `\|`, `^`)                    |  19.6 ms    | 20.1 ms     | 20.2 ms     | 21.7 ms     | 23.9 ms      | 25.7 ms      |
-| Div / Rem  (`/`, `%`)                                  |  711 ms     | 1.81 s      | 4.43 s      | 10.5 s      | 25.1 s       | 63.2 s       |
-| Left / Right Shifts (`<<`, `>>`)                       |  99.5 ms    | 125 ms      | 155 ms      | 190 ms      | 234 ms       | 434 ms       |
-| Left / Right Rotations (`left_rotate`, `right_rotate`) |  101 ms     | 125 ms      | 154 ms      | 188 ms      | 234 ms       | 430 ms       |
-| Leading / Trailing zeros/ones                          |  96.7 ms    | 155 ms      | 181 ms      | 241 ms      | 307 ms       | 367 ms       |
-| Log2                                                   |  112 ms     | 176 ms      | 200 ms      | 265 ms      | 320 ms       | 379 ms       |
-
-
-The next table shows the operation timings on CPU when the left input is encrypted and the right is a clear scalar of the same size:
-
-| Operation \ Size                                       | `FheUint8` | `FheUint16` | `FheUint32` | `FheUint64` | `FheUint128` | `FheUint256` |
-|--------------------------------------------------------|------------|-------------|-------------|-------------|--------------|--------------|
-| Add / Sub (`+`,`-`)                                    | 75.9 ms    | 95.3 ms     | 119 ms      | 150 ms      | 182 ms       | 224 ms       |
-| Mul (`x`)                                              | 79.3 ms    | 163 ms      | 211 ms      | 273 ms      | 467 ms       | 1.09 s       |
-| Equal / Not Equal (`eq`, `ne`)                         | 31.2 ms    | 30.9 ms     | 34.4 ms     | 54.5 ms     | 57.0 ms      | 58.0 ms      |
-| Comparisons  (`ge`, `gt`, `le`, `lt`)                  | 38.6 ms    | 56.3 ms     | 76.1 ms     | 99.0 ms     | 124 ms       | 141 ms       |
-| Max / Min   (`max`,`min`)                              | 74.0 ms    | 103 ms      | 122 ms      | 144 ms      | 171 ms       | 214 ms       |
-| Bitwise operations (`&`, `\|`, `^`)                    | 19.0 ms    | 19.8 ms     | 20.5 ms     | 21.6 ms     | 23.8 ms      | 25.8 ms      |
-| Div  (`/`)                                             | 192 ms     | 255 ms      | 322 ms      | 459 ms      | 877 ms       | 2.61 s       |
-| Rem  (`%`)                                             | 336 ms     | 482 ms      | 650 ms      | 871 ms      | 1.39 s       | 3.05 s       |
-| Left / Right Shifts (`<<`, `>>`)                       | 19.5 ms    | 20.2 ms     | 20.7 ms     | 22.1 ms     | 23.8 ms      | 25.6 ms      |
-| Left / Right Rotations (`left_rotate`, `right_rotate`) | 19.0 ms    | 20.0 ms     | 20.8 ms     | 21.7 ms     | 23.9 ms      | 25.7 ms      |
-
-All timings are based on parallelized Radix-based integer operations where each block is encrypted using the default parameters `PARAM_MESSAGE_2_CARRY_2_KS_PBS`. To ensure predictable timings, we perform operations in the `default` mode, which propagates the carry bit as needed. You can minimize operational costs by selecting from 'unchecked', 'checked', or 'smart' modes, each balancing performance and security differently.
-
-For more details about parameters, see [here](../references/fine-grained-apis/shortint/parameters.md). You can find the benchmark results on GPU for all these operations [here](../guides/run\_on\_gpu.md#benchmarks).
-
-## Shortint operations
-
-The next table shows the execution time of some operations using various parameter sets of tfhe-rs::shortint. Except for `unchecked_add`, we perform all the operations in the `default` mode. This mode ensures predictable timings along the entire circuit by clearing the carry space after each operation. The configuration is Concrete FFT + AVX-512.
-
-| Parameter set                      | PARAM\_MESSAGE\_1\_CARRY\_1 | PARAM\_MESSAGE\_2\_CARRY\_2 | PARAM\_MESSAGE\_3\_CARRY\_3 | PARAM\_MESSAGE\_4\_CARRY\_4 |
-| ---------------------------------- |-----------------------------|-----------------------------|-----------------------------|-----------------------------|
-| unchecked\_add                     | 559 ns                      | 544 ns                      | 2.26 µs                     | 9.53 µs                     |
-| add                                | 9.98 ms                     | 14.1 ms                     | 113 ms                      | 873 ms                      |
-| mul\_lsb                           | 9.79 ms                     | 13.8 ms                     | 113 ms                      | 794 ms                      |
-| keyswitch\_programmable\_bootstrap | 9.85 ms                     | 13.9 ms                     | 114 ms                      | 791 ms                      |
-
-## Boolean operations
-
-The next table shows the execution time of a single binary Boolean gate.
-
-### tfhe-rs::boolean
-
-| Parameter set                                        | Concrete FFT + AVX-512 |
-| ---------------------------------------------------- |------------------------|
-| DEFAULT\_PARAMETERS\_KS\_PBS                         | 9.98 ms                |
-| PARAMETERS\_ERROR\_PROB\_2\_POW\_MINUS\_165\_KS\_PBS | 17.0 ms                |
-| TFHE\_LIB\_PARAMETERS                                | 9.64 ms                |
-
-#### tfhe-lib
-
-Using the same hpc7a.96xlarge machine as the one for tfhe-rs, the timings are as follows:
-
-| Parameter set                                    | spqlios-fma |
-| ------------------------------------------------ | ----------- |
-| default\_128bit\_gate\_bootstrapping\_parameters | 13.5 ms     |
-
-### OpenFHE (v1.1.2)
-
-Following the official instructions from OpenFHE, we use `clang14` and the following command to setup the project: `cmake -DNATIVE_SIZE=32 -DWITH_NATIVEOPT=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DWITH_OPENMP=OFF ..`
-
-The following example shows how to initialize the configuration to use the HEXL library:
-
-```bash
-export CXX=clang++
-export CC=clang
-
-scripts/configure.sh
-Release -> y
-hexl -> y
-
-scripts/build-openfhe-development-hexl.sh
-```
-
-Using the same hpc7a.96xlarge machine as the one for tfhe-rs, the timings are as follows:
-
-| Parameter set                     | GINX    | GINX w/ Intel HEXL |
-| --------------------------------- | ------- |--------------------|
-| FHEW\_BINGATE/STD128\_OR          | 25.5 ms | 24,0 ms            |
-| FHEW\_BINGATE/STD128\_LMKCDEY\_OR | 25.4 ms | 23.6 ms            |
-
-## Reproducing TFHE-rs benchmarks
-
-**TFHE-rs** benchmarks can be easily reproduced from the [source](https://github.com/zama-ai/tfhe-rs).
-
-{% hint style="info" %}
-AVX512 is now enabled by default for benchmarks when available
-{% endhint %}
-
-The following example shows how to reproduce **TFHE-rs** benchmarks:
-
-```shell
-#Boolean benchmarks:
-make bench_boolean
-
-#Integer benchmarks:
-make bench_integer
-
-#Shortint benchmarks:
-make bench_shortint
-```
diff --git a/tfhe/docs/getting_started/benchmarks/cpu_benchmarks.md b/tfhe/docs/getting_started/benchmarks/cpu_benchmarks.md
new file mode 100644
index 0000000000..84cf4c70ca
--- /dev/null
+++ b/tfhe/docs/getting_started/benchmarks/cpu_benchmarks.md
@@ -0,0 +1,52 @@
+# CPU Benchmarks
+
+This document details the CPU performance benchmarks of homomorphic operations using **TFHE-rs**.
+
+By their nature, homomorphic operations run slower than their cleartext equivalents. The following are the timings for basic operations.
+
+{% hint style="info" %}
+All CPU benchmarks were launched on an `AWS hpc7a.96xlarge` instance equipped with an `AMD EPYC 9R14 CPU @ 2.60GHz` and 740 GB of RAM.
+{% endhint %}
+
+## Integer operations
+
+The following tables benchmark the execution time of some operation sets using `FheUint` (unsigned integers). The `FheInt` (signed integers) performs similarly.
+
+The next table shows the operation timings on CPU when all inputs are encrypted:
+
+{% embed url="https://docs.google.com/spreadsheets/d/1Z2NZvWEkDnbHPYE4Su0Oh2Zz1VBnT9dWbo3E29-LcDg/edit?usp=sharing" %}
+
+The next table shows the operation timings on CPU when the left input is encrypted and the right is a clear scalar of the same size:
+
+{% embed url="https://docs.google.com/spreadsheets/d/1NGPnuBhRasES9Ghaij4ixJJTpXVMqDzbqMniX-qIMGc/edit?usp=sharing" %}
+
+All timings are based on parallelized Radix-based integer operations where each block is encrypted using the default parameters `PARAM_MESSAGE_2_CARRY_2_KS_PBS`. To ensure predictable timings, we perform operations in the `default` mode, which guarantees that the input and output encodings are similar (i.e., the carries are always emptied).
+
+You can minimize operational costs by selecting from 'unchecked', 'checked', or 'smart' modes from [the fine-grained APIs](../../references/fine-grained-apis/quick\_start.md), each balancing performance and correctness differently. For more details about parameters, see [here](../../references/fine-grained-apis/shortint/parameters.md). You can find the benchmark results on GPU for all these operations [here](gpu\_benchmarks.md).
+
+## Programmable bootstrapping
+
+The next table shows the execution time of a keyswitch followed by a programmable bootstrapping for each input message precision, along with the associated parameter set. The configuration is Concrete FFT + AVX-512.
+
+{% embed url="https://docs.google.com/spreadsheets/d/1OdZrsk0dHTWSLLvstkpiv0u5G5tE0mCqItTb7WixGdg/edit?usp=sharing" %}
+
+## Reproducing TFHE-rs benchmarks
+
+**TFHE-rs** benchmarks can be easily reproduced from the [source](https://github.com/zama-ai/tfhe-rs).
+
+{% hint style="info" %}
+AVX-512 is now enabled by default for benchmarks when available.
+{% endhint %}
+
+The following example shows how to reproduce **TFHE-rs** benchmarks:
+
+```shell
+#Boolean benchmarks:
+make bench_boolean
+
+#Integer benchmarks:
+make bench_integer
+
+#Shortint benchmarks:
+make bench_shortint
+```
diff --git a/tfhe/docs/getting_started/benchmarks/gpu_benchmarks.md b/tfhe/docs/getting_started/benchmarks/gpu_benchmarks.md
new file mode 100644
index 0000000000..14deca317e
--- /dev/null
+++ b/tfhe/docs/getting_started/benchmarks/gpu_benchmarks.md
@@ -0,0 +1,27 @@
+# GPU Benchmarks
+
+This document details the GPU performance benchmarks of homomorphic operations using **TFHE-rs**.
+
+All GPU benchmarks presented here were obtained on H100 GPUs, and rely on the multithreaded PBS algorithm. The cryptographic parameters `PARAM_GPU_MULTI_BIT_MESSAGE_2_CARRY_2_GROUP_3_KS_PBS` were used.
+
+## 1xH100
+The results below were obtained on a single H100.
+The following table shows the performance when the inputs of the benchmarked operation are encrypted:
+
+{% embed url="https://docs.google.com/spreadsheets/d/1dhNYXm7oY0l2qjX3dNpSZKjIBJElkEZtPDIWHZ4FA_A/edit?usp=sharing" %}
+
+The following table shows the performance when the left input of the benchmarked operation is encrypted and the other is a clear scalar of the same size:
+
+{% embed url="https://docs.google.com/spreadsheets/d/1wtnFnOwHrSOvfTWluUEaDoTULyveseVl1ZsYo3AOFKk/edit?usp=sharing" %}
+
+## 2xH100
+
+The results below were obtained on two H100s.
+The following table shows the performance when the inputs of the benchmarked operation are encrypted:
+
+{% embed url="https://docs.google.com/spreadsheets/d/1_2AUeu3ua8_PXxMfeJCh-pp6b9e529PGVEYUuZRAThg/edit?usp=sharing" %}
+
+
+The following table shows the performance when the left input of the benchmarked operation is encrypted and the other is a clear scalar of the same size:
+
+{% embed url="https://docs.google.com/spreadsheets/d/1nLPt_m1MbkSdhMop0iKDnSN_c605l_JdMpK5JC90N_Q/edit?usp=sharing" %}
diff --git a/tfhe/docs/getting_started/benchmarks/summary.md b/tfhe/docs/getting_started/benchmarks/summary.md
new file mode 100644
index 0000000000..4734b7166e
--- /dev/null
+++ b/tfhe/docs/getting_started/benchmarks/summary.md
@@ -0,0 +1,7 @@
+# Benchmarks
+
+This document summarizes the timings of some homomorphic operations over 64-bit encrypted integers, depending on the hardware. More details are given for [the CPU](cpu\_benchmarks.md), [the GPU](gpu\_benchmarks.md), or [zero-knowledge proofs](zk\_proof\_benchmarks.md).
+
+### Operation time (ms) over `FheUint64`
+
+{% embed url="https://docs.google.com/spreadsheets/d/1ZbgsKnFH8eKrFjy9khFeaLYnUhbSV8Xu4H6rwulo0o8/edit?usp=sharing" %}
diff --git a/tfhe/docs/getting_started/benchmarks/zk_proof_benchmarks.md b/tfhe/docs/getting_started/benchmarks/zk_proof_benchmarks.md
new file mode 100644
index 0000000000..2cb3a298b6
--- /dev/null
+++ b/tfhe/docs/getting_started/benchmarks/zk_proof_benchmarks.md
@@ -0,0 +1,7 @@
+# Zero-knowledge proof benchmarks
+
+This document details the performance benchmarks of [zero-knowledge proofs](../../guides/zk-pok.md) for [compact public key encryption](../../guides/public_key.md) using **TFHE-rs**.
+
+Benchmarks for the zero-knowledge proofs were run on an `m6i.4xlarge` instance with 16 cores to simulate a typical client configuration. Verification was performed on an `hpc7a.96xlarge` AWS instance to mimic a powerful server.
+
+{% embed url="https://docs.google.com/spreadsheets/d/1llCYHCz2CyLdTwXkiqhjVzJLzxW_RqdjHxmk72m1jm4/edit?usp=sharing" %}
diff --git a/tfhe/docs/guides/run_on_gpu.md b/tfhe/docs/guides/run_on_gpu.md
index d9def76818..da28334964 100644
--- a/tfhe/docs/guides/run_on_gpu.md
+++ b/tfhe/docs/guides/run_on_gpu.md
@@ -178,70 +178,5 @@ Depending on the platform, this can restrict the number of GPUs used to perform
 There is **nothing to change in the code to execute on multiple GPUs**, when 
 they are available and have peer access to GPU 0 via NVLink. To keep the API as user-friendly as possible, the configuration is automatically set, i.e., the user has no fine-grained control over the number of GPUs to be used.
 
-## Benchmarks
-
-All GPU benchmarks presented here were obtained on H100 GPUs, and rely on the multithreaded PBS algorithm. The cryptographic parameters `PARAM_GPU_MULTI_BIT_MESSAGE_2_CARRY_2_GROUP_3_KS_PBS` were used.
-
-### 1xH100
-Below come the results for the execution on a single H100.
-The following table shows the performance when the inputs of the benchmarked operation are encrypted:
-
-| Operation \ Size                                       | `FheUint8` | `FheUint16` | `FheUint32` | `FheUint64` | `FheUint128` | `FheUint256` |
-|--------------------------------------------------------|------------|-------------|-------------|-------------|--------------|--------------|
-| Negation (`-`)                                         | 18.6 ms    | 24.9 ms     | 34.9 ms     | 52.4 ms     | 101 ms       | 197 ms       |
-| Add / Sub (`+`,`-`)                                    | 18.7 ms    | 25.0 ms     | 35.0 ms     | 52.4 ms     | 101 ms       | 197 ms       |
-| Mul (`x`)                                              | 35.0 ms    | 59.7 ms     | 124 ms      | 378 ms      | 1.31 s       | 5.01 s       |
-| Equal / Not Equal (`eq`, `ne`)                         | 10.5 ms    | 11.1 ms     | 17.2 ms     | 19.5 ms     | 27.9 ms      | 45.2 ms      |
-| Comparisons  (`ge`, `gt`, `le`, `lt`)                  | 19.8 ms    | 25.0 ms     | 31.3 ms     | 40.2 ms     | 53.2 ms      | 85.2 ms      |
-| Max / Min   (`max`,`min`)                              | 30.2 ms    | 37.1 ms     | 46.6 ms     | 61.4 ms     | 91.8 ms      | 154 ms       |
-| Bitwise operations (`&`, `\|`, `^`)                    | 4.83 ms    | 5.3 ms      | 6.36 ms     | 8.26 ms     | 15.3 ms      | 25.4 ms      |
-| Div / Rem  (`/`, `%`)                                  | 221 ms     | 528 ms      | 1.31 s      | 3.6 s       | 11.0 s       | 40.0 s       |
-| Left / Right Shifts (`<<`, `>>`)                       | 30.4 ms    | 41.4 ms     | 60.0 ms     | 119 ms      | 221 ms       | 435 ms       |
-| Left / Right Rotations (`left_rotate`, `right_rotate`) | 30.4 ms    | 41.4 ms     | 60.1 ms     | 119 ms      | 221 ms       | 435 ms       |
-
-The following table shows the performance when the left input of the benchmarked operation is encrypted and the other is a clear scalar of the same size:
-
-| Operation \ Size                                       | `FheUint8` | `FheUint16` | `FheUint32` | `FheUint64` | `FheUint128` | `FheUint256` |
-|--------------------------------------------------------|------------|-------------|-------------|-------------|--------------|--------------|
-| Add / Sub (`+`,`-`)                                    | 19.0 ms    | 25.0 ms     | 35.0 ms     | 52.4 ms     | 101 ms       | 197 ms       |
-| Mul (`x`)                                              | 28.1 ms    | 43.9 ms     | 75.4 ms     | 177 ms      | 544 ms       | 1.92 s       |
-| Equal / Not Equal (`eq`, `ne`)                         | 11.5 ms    | 11.9 ms     | 12.5 ms     | 18.9 ms     | 21.7 ms      | 30.6 ms      |
-| Comparisons  (`ge`, `gt`, `le`, `lt`)                  | 12.5 ms    | 17.4 ms     | 22.7 ms     | 29.9 ms     | 39.1 ms      | 57.2 ms      |
-| Max / Min   (`max`,`min`)                              | 22.5 ms    | 28.9 ms     | 37.4 ms     | 50.6 ms     | 77.4 ms      | 126 ms       |
-| Bitwise operations (`&`, `\|`, `^`)                    | 4.92 ms    | 5.51 ms     | 6.47 ms     | 8.37 ms     | 15.5 ms      | 25.6 ms      |
-| Div (`/`)                                              | 46.8 ms    | 70.0 ms     | 138 ms      | 354 ms      | 1.10 s       | 3.83 s       |
-| Rem (`%`)                                              | 90.0 ms    | 140 ms      | 250 ms      | 592 ms      | 1.75 s       | 6.06 s       |
-| Left / Right Shifts (`<<`, `>>`)                       | 4.82 ms    | 5.36 ms     | 6.38 ms     | 8.26 ms     | 15.3 ms      | 25.4 ms      |
-| Left / Right Rotations (`left_rotate`, `right_rotate`) | 4.81 ms    | 5.36 ms     | 6.30 ms     | 8.19 ms     | 15.3 ms      | 25.3 ms      |
-
-### 2xH100
-
-Below come the results for the execution on two H100's.
-The following table shows the performance when the inputs of the benchmarked operation are encrypted:
-
-| Operation \ Size                                       | `FheUint8` | `FheUint16` | `FheUint32` | `FheUint64` | `FheUint128` | `FheUint256` |
-| ------------------------------------------------------ | ---------- | ----------- | ----------- | ----------- | ------------ | ------------ |
-| Negation (`-`)                                         | 16.1 ms    | 20.3 ms     | 27.7 ms     | 38.2 ms     | 54.7 ms      | 83.0 ms      |
-| Add / Sub (`+`,`-`)                                    | 16.1 ms    | 20.4 ms     | 27.8 ms     | 38.3 ms     | 54.9 ms      | 83.2 ms      |
-| Mul (`x`)                                              | 31.0 ms    | 49.6 ms     | 92.4 ms     | 267 ms      | 892 ms       | 3.45 s       |
-| Equal / Not Equal (`eq`, `ne`)                         | 11.2 ms    | 12.9 ms     | 20.4 ms     | 27.3 ms     | 38.8 ms      | 67.0 ms      |
-| Max / Min   (`max`,`min`)                              | 53.4 ms    | 59.3 ms     | 70.4 ms     | 89.6 ms     | 120 ms       | 177 ms       |
-| Bitwise operations (`&`, `\|`, `^`)                    | 4.16 ms    | 4.62 ms     | 5.61 ms     | 7.52 ms     | 10.2 ms      | 15.7 ms      |
-| Div / Rem  (`/`, `%`)                                  | 299 ms     | 595 ms      | 1.36 s      | 3.12 s      | 7.8 s        | 21.1 s       |
-| Left / Right Shifts (`<<`, `>>`)                       | 26.9 ms    | 34.5 ms     | 48.7 ms     | 70.2 ms     | 108 ms       | 220 ms       |
-| Left / Right Rotations (`left_rotate`, `right_rotate`) | 26.8 ms    | 34.5 ms     | 48.7 ms     | 70.1 ms     | 108 ms       | 220 ms       |
-
-
-The following table shows the performance when the left input of the benchmarked operation is encrypted and the other is a clear scalar of the same size:
-
-| Operation \ Size                                       | `FheUint8` | `FheUint16` | `FheUint32` | `FheUint64` | `FheUint128` | `FheUint256` |
-| ------------------------------------------------------ |------------|-------------|-------------|-------------|--------------|--------------|
-| Add / Sub (`+`,`-`)                                    | 16.4 ms    | 20.5 ms     | 28.0 ms     | 38.4 ms     | 54.9 ms      | 83.1 ms      |
-| Mul (`x`)                                              | 25.3 ms    | 36.8 ms     | 62.0 ms     | 130 ms      | 377 ms       | 1.35 s       |
-| Equal / Not Equal (`eq`, `ne`)                         | 36.4 ms    | 36.5 ms     | 39.3 ms     | 47.1 ms     | 58.0 ms      | 78.0 ms      |
-| Max / Min   (`max`,`min`)                              | 53.6 ms    | 60.8 ms     | 71.9 ms     | 89.4 ms     | 119 ms       | 173 ms       |
-| Bitwise operations (`&`, `\|`, `^`)                    | 4.33 ms    | 4.76 ms     | 6.4 ms      | 7.65 ms     | 10.4 ms      | 15.7 ms      |
-| Div (`/`)                                              | 40.9 ms    | 59.7 ms     | 109.0 ms    | 248.5 ms    | 806.1 ms     | 2.9 s        |
-| Rem (`%`)                                              | 80.6 ms    | 116.1 ms    | 199.9 ms    | 412.9 ms    | 1.2 s        | 4.3 s        |
-| Left / Right Shifts (`<<`, `>>`)                       | 4.15 ms    | 4.57 ms     | 6.19 ms     | 7.48 ms     | 10.3 ms      | 15.7 ms      |
-| Left / Right Rotations (`left_rotate`, `right_rotate`) | 4.15 ms    | 4.57 ms     | 6.18 ms     | 7.46 ms     | 10.2 ms      | 15.6 ms      |
+## Benchmarks
+Please refer to the [GPU benchmarks](../getting_started/benchmarks/gpu_benchmarks.md) for detailed performance benchmark results.
diff --git a/tfhe/docs/guides/zk-pok.md b/tfhe/docs/guides/zk-pok.md
index 09d30d6706..790f11f96f 100644
--- a/tfhe/docs/guides/zk-pok.md
+++ b/tfhe/docs/guides/zk-pok.md
@@ -134,20 +134,5 @@ pub fn main() -> Result<(), Box<dyn std::error::Error>> {
 }
 ```
 
-### Benchmarks
-Benchmarks for the proofs have been run on a `m6i.4xlarge` with 16 cores to simulate an usual client configuration.  The verification are done on a `hpc7a.96xlarge` AWS instances to mimic a powerful server. 
-
-Timings in the case where the workload is mainly on the prover, i.e., with the  `ZkComputeLoad::Proof` option.
-
-| Inputs       | Proving | Verifying |
-|--------------|---------|-----------|
-| 1xFheUint64  | 2.79s   | 197ms     |
-| 10xFheUint64 | 3.68s   | 251ms     |
- 
-
-Timings in the case where the workload is mainly on the verifier, i.e., with the  `ZkComputeLoad::Verify` option.
-
-| Inputs       | Proving | Verifying |
-|--------------|---------|-----------|
-| 1xFheUint64  | 730ms   | 522ms     |
-| 10xFheUint64 | 1.08s   | 682ms     |
+## Benchmarks
+Please refer to the [Zero-knowledge proof benchmarks](../getting_started/benchmarks/zk_proof_benchmarks.md) for detailed performance benchmark results.
diff --git a/tfhe/docs/references/fine-grained-apis/shortint/parameters.md b/tfhe/docs/references/fine-grained-apis/shortint/parameters.md
index 70c0881f4a..e2eeaf8482 100644
--- a/tfhe/docs/references/fine-grained-apis/shortint/parameters.md
+++ b/tfhe/docs/references/fine-grained-apis/shortint/parameters.md
@@ -34,7 +34,7 @@ fn main() {
 
 ## Impact of parameters on the operations
 
-As shown [here](../../../getting\_started/benchmarks.md), the choice of the parameter set impacts the operations available and their efficiency.
+As shown [here](../../../getting\_started/benchmarks/cpu\_benchmarks.md), the choice of the parameter set impacts the operations available and their efficiency.
 
 ### Generic bi-variate functions.