From 7e4fb9230f62eb2e64c53bf2009bc5c4ddc6815d Mon Sep 17 00:00:00 2001
From: Purva Thakre <66048318+purva-thakre@users.noreply.github.com>
Date: Thu, 10 Oct 2024 22:35:32 -0500
Subject: [PATCH] Add theory, intro and use case pages of LRE user guide
 (#2522)

* draft 1

* vincent + nate feedback

* remove : for warning block

* Apply suggestions from code review

Co-authored-by: nate stemen <nate@unitary.fund>

* nate's feedback round 2

* Apply suggestions from code review

Co-authored-by: nate stemen <nate@unitary.fund>

* clarify theory sections

* gen monomial terms

* Add intro section to LRE docs (#2535)

* add intro and use case pages

Co-Authored-By: Purva Thakre <purva@unitary.fund>

* clean up intro/use case

* clarify depth comment

* wordsmithing

---------

Co-authored-by: Purva Thakre <purva@unitary.fund>

* change wording of Bi matrix

* cleanup first section

* fix l/L typo

---------

Co-authored-by: nate stemen <nate@unitary.fund>
Co-authored-by: Purva Thakre <purva@unitary.fund>
---
 docs/source/guide/guide.md          |   3 +-
 docs/source/guide/lre-1-intro.md    | 152 ++++++++++++++++++++++++++++
 docs/source/guide/lre-2-use-case.md |  35 +++++++
 docs/source/guide/lre-5-theory.md   |  95 +++++++++++++++++
 docs/source/guide/lre.md            |  28 +++++
 5 files changed, 312 insertions(+), 1 deletion(-)
 create mode 100644 docs/source/guide/lre-1-intro.md
 create mode 100644 docs/source/guide/lre-2-use-case.md
 create mode 100644 docs/source/guide/lre-5-theory.md
 create mode 100644 docs/source/guide/lre.md

diff --git a/docs/source/guide/guide.md b/docs/source/guide/guide.md
index 40528efa75..d75a3571c7 100644
--- a/docs/source/guide/guide.md
+++ b/docs/source/guide/guide.md
@@ -8,11 +8,12 @@ core-concepts.md
 zne.md
 pec.md
 cdr.md
-shadows.md
 ddd.md
+lre.md
 rem.md
 qse.md
 pt.md
+shadows.md
 error-mitigation.md
 glossary.md
 ```
diff --git a/docs/source/guide/lre-1-intro.md b/docs/source/guide/lre-1-intro.md
new file mode 100644
index 0000000000..6604837306
--- /dev/null
+++ b/docs/source/guide/lre-1-intro.md
@@ -0,0 +1,152 @@
+---
+jupytext:
+  text_representation:
+    extension: .md
+    format_name: myst
+    format_version: 0.13
+    jupytext_version: 1.11.1
+kernelspec:
+  display_name: Python 3
+  language: python
+  name: python3
+---
+
+# How do I use LRE?
+
+LRE works in two main stages: generate noise-scaled circuits via layerwise scaling, and apply inference to resulting measurements post-execution.
+
+This workflow can be executed by a single call to {func}`.execute_with_lre`.
+If more control is needed over the protocol, Mitiq provides {func}`.multivariate_layer_scaling` and {func}`.multivariate_richardson_coefficients` to handle the first and second steps respectively.
+
+```{danger}
+LRE is currently only compatible with quantum programs written using `cirq`.
+Work on making this technique compatible with other frontends is ongoing. 🚧
+```
+
+## Problem Setup
+
+To demonstrate the use of LRE, we'll first define a quantum circuit, and a method of executing circuits for demonstration purposes.
+
+For simplicity, we define a circuit whose unitary compiles to the identity operation.
+Here we will use a randomized benchmarking circuit on a single qubit, visualized below.
+
+```{code-cell} ipython3
+from mitiq import benchmarks
+
+
+circuit = benchmarks.generate_rb_circuits(n_qubits=1, num_cliffords=3)[0]
+
+print(circuit)
+```
+
+We define an [executor](executors.md) which simulates the input circuit subjected to depolarizing noise, and returns the probability of measuring the ground state.
+By altering the value for `noise_level`, ideal and noisy expectation values can be obtained.
+
+```{code-cell} ipython3
+from cirq import DensityMatrixSimulator, depolarize
+
+
+def execute(circuit, noise_level=0.025):
+    noisy_circuit = circuit.with_noise(depolarize(p=noise_level))
+    rho = DensityMatrixSimulator().simulate(noisy_circuit).final_density_matrix
+    return rho[0, 0].real
+```
+
+Compare the noisy and ideal expectation values:
+
+```{code-cell} ipython3
+noisy = execute(circuit)
+ideal = execute(circuit, noise_level=0.0)
+print(f"Error without mitigation: {abs(ideal - noisy):.5f}")
+```
+
+## Apply LRE directly
+
+With the circuit and executor defined, we just need to choose the polynomial extrapolation degree as well as the fold multiplier.
+
+```{code-cell} ipython3
+from mitiq.lre import execute_with_lre
+
+
+degree = 2
+fold_multiplier = 3
+
+mitigated = execute_with_lre(
+    circuit,
+    execute,
+    degree=degree,
+    fold_multiplier=fold_multiplier,
+)
+
+print(f"Error with mitigation (LRE): {abs(ideal - mitigated):.3}")
+```
+
+As you can see, the technique is extremely simple to apply, and no knowledge of the hardware/simulator noise is required.
+
+## Step by step application of LRE
+
+In this section we demonstrate the use of {func}`.multivariate_layer_scaling` and {func}`.multivariate_richardson_coefficients` for those who might want to inspect the intermediary circuits, and have more control over the protocol.
+
+### Create noise-scaled circuits
+
+We start by creating a number of noise-scaled circuits which we will pass to the executor.
+
+```{code-cell} ipython3
+from mitiq.lre import multivariate_layer_scaling
+
+
+noise_scaled_circuits = multivariate_layer_scaling(circuit, degree, fold_multiplier)
+num_scaled_circuits = len(noise_scaled_circuits)
+
+print(f"total number of noise-scaled circuits for LRE = {num_scaled_circuits}")
+print(
+    f"Average circuit depth = {sum(len(circuit) for circuit in noise_scaled_circuits) / num_scaled_circuits}"
+)
+```
+
+As you can see, the noise-scaled circuits are, on average, much deeper than the original circuit.
+An example noise-scaled circuit is shown below.
+
+```{code-cell} ipython3
+noise_scaled_circuits[3]
+```
+
+With the many noise-scaled circuits in hand, we can run them through our executor to obtain the expectation values.
+
+```{code-cell} ipython3
+noise_scaled_exp_values = [
+    execute(circuit) for circuit in noise_scaled_circuits
+]
+```
+
+### Classical inference
+
+The penultimate step is to compute the coefficients we'll use to combine the noisy data obtained above.
+The `degree` and `fold_multiplier` values passed here must match those used to generate the noise-scaled circuits.
+
+```{code-cell} ipython3
+from mitiq.lre import multivariate_richardson_coefficients
+
+
+coefficients = multivariate_richardson_coefficients(
+    circuit,
+    fold_multiplier=fold_multiplier,
+    degree=degree,
+)
+```
+
+Each noise-scaled circuit has an associated coefficient in the linear combination and a noisy expectation value.
+
+### Combine the results
+
+```{code-cell} ipython3
+mitigated = sum(
+    exp_val * coeff
+    for exp_val, coeff in zip(noise_scaled_exp_values, coefficients)
+)
+print(
+    f"Error with mitigation (LRE): {abs(ideal - mitigated):.3}"
+)
+```
+
+As you can see, the step-by-step application of LRE again improves the accuracy of the result.
diff --git a/docs/source/guide/lre-2-use-case.md b/docs/source/guide/lre-2-use-case.md
new file mode 100644
index 0000000000..12b1d13951
--- /dev/null
+++ b/docs/source/guide/lre-2-use-case.md
@@ -0,0 +1,35 @@
+---
+jupytext:
+  text_representation:
+    extension: .md
+    format_name: myst
+    format_version: 0.13
+    jupytext_version: 1.10.3
+kernelspec:
+  display_name: Python 3 (ipykernel)
+  language: python
+  name: python3
+---
+
+# When should I use LRE?
+
+## Advantages
+
+Just as in ZNE, LRE can be applied without detailed knowledge of the underlying noise model, since the effectiveness of the technique depends on the choice of scale factors.
+Thus, LRE is useful in scenarios where tomography is impractical.
+
+The sampling overhead is flexible: the cost can be reduced by using larger values for the fold multiplier (used to
+create the noise-scaled circuits) or by chunking a larger circuit so that groups of layers are folded together instead of each layer individually.
+
+## Disadvantages
+
+For a large circuit, the number of noise-scaled circuits grows polynomially with the number of layers, which increases the execution time. This growth arises because we require the sample matrix to be a square matrix (more details in the [theory](lre-5-theory.md) section).
+
+When reducing the sampling cost by using a larger fold multiplier, the bias for polynomial extrapolation increases as one moves farther away from the zero-noise limit.
+
+Chunking a large circuit into a small number of chunks reduces the sampling cost but can also reduce the performance of LRE.
+In ZNE parlance, this is analogous to local folding faring better than global folding: LRE performs better when a higher number of chunks is used.
+
+```{attention}
+We are currently investigating the issue related to the performance of chunking large circuits.
+```
diff --git a/docs/source/guide/lre-5-theory.md b/docs/source/guide/lre-5-theory.md
new file mode 100644
index 0000000000..2fa960bf27
--- /dev/null
+++ b/docs/source/guide/lre-5-theory.md
@@ -0,0 +1,95 @@
+---
+jupytext:
+  text_representation:
+    extension: .md
+    format_name: myst
+    format_version: 0.13
+    jupytext_version: 1.11.4
+kernelspec:
+  display_name: Python 3
+  language: python
+  name: python3
+---
+
+# What is the theory behind LRE?
+
+Similar to [ZNE](zne.md), LRE works in two steps:
+
+- **Step 1:** Intentionally create multiple noise-scaled but logically equivalent circuits by scaling each layer or chunk of the input circuit through unitary folding.
+
+- **Step 2:** Extrapolate to the noiseless limit using multivariate Richardson extrapolation.
+
+The noise-scaled circuits in ZNE are scaled by the user choosing which layers of the input circuit to fold whereas in LRE
+each noise-scaled circuit scales the layers in the input circuit in a specific pattern.
+LRE leverages the flexible configuration space of layerwise unitary folding, allowing for a more nuanced mitigation of errors by treating the noise level of each layer of the quantum circuit as an independent variable.
+
+## Step 1: Create noise-scaled circuits
+
+The goal is to create noise-scaled circuits of different depths where the layers in each circuit are scaled in a specific pattern as a result of [unitary folding](zne-5-theory.md).
+This pattern is described by the vector of scale factor vectors which are generated after the fold multiplier and degree for multivariate Richardson extrapolation are chosen.
+
+Suppose we're interested in the value of some observable of a circuit $C$ that has $l$ layers.
+For each layer $0 \leq L \leq l - 1$ we can choose a scale factor for how much to scale that particular layer.
+Thus a vector $\lambda \in \mathbb{R}^l_+$ corresponds to a folding configuration where $\lambda_0$ corresponds to the scale factor for the first layer, and $\lambda_{l - 1}$ is the scale factor to apply on the circuit's final layer.
+
+Fix the number of noise-scaled circuits we wish to generate at $M\in\mathbb{N}$.
+Define $\Lambda = (λ_1, λ_2, \ldots, λ_M)^T$ to be the collection of scale factors and let $(C_{λ_1}, C_{λ_2}, \ldots, C_{λ_M})^T$ denote the noise-scaled circuits corresponding to each scale factor.
+
+After $d$ is fixed as the degree of the multivariate polynomial, we define $M_j(λ_i, d)$ to be the $j$-th monomial term of the polynomial evaluated at the scale factor vector $λ_i$, with the terms arranged in increasing order of degree.
+In general, the number of monomial terms with $l$ variables up to degree $d$ can be determined
+through the [stars and bars method](https://en.wikipedia.org/wiki/Stars_and_bars_%28combinatorics%29).
+
+For example, if $C$ has 2 layers and the degree of the extrapolating polynomial is 2, the basis of monomials contains 6 terms: $\{1, λ_1, λ_2, {λ_1}^2, λ_1 \cdot λ_2, {λ_2}^2 \}$, where $λ_1$ and $λ_2$ here denote the scale factors of the two layers.
+
+$$
+\text{total number of terms in the monomial basis with max degree } d = \binom{d + l}{d}
+$$
+
+To verify this count for an extrapolating polynomial of degree 2, we count the terms of each total degree separately using the following formula:
+
+$$
+\text{number of terms in the monomial basis with total degree } d = \binom{d + l - 1}{d}
+$$
+
+There are 3 terms with total degree 2, calculated as $\binom{2 + 2 -1}{2} = 3$, which correspond to $\{{λ_1}^2, λ_1 \cdot λ_2, {λ_2}^2 \}$.
+
+Similarly, the number of terms with total degree 1 and 0 can be calculated as $\binom{1 + 2 -1}{1} = 2:\{λ_1, λ_2\}$ and $\binom{0 + 2 -1}{0}= 1: \{1\}$ respectively.
+
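+These terms in the monomial basis define the entries in each row of the square sample matrix, where the number of monomial terms $N$ equals the number of noise-scaled circuits $M$, as shown below: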
+
+$$
+\mathbf{A}(\Lambda, d) =
+\begin{bmatrix}
+    M_1(λ_1, d) & M_2(λ_1, d) & \cdots & M_N(λ_1, d) \\
+    M_1(λ_2, d) & M_2(λ_2, d) & \cdots & M_N(λ_2, d) \\
+    \vdots & \vdots & \ddots & \vdots \\
+    M_1(λ_N, d) & M_2(λ_N, d) & \cdots & M_N(λ_N, d)
+\end{bmatrix}
+$$
+
+For our example circuit of $l=2$ and $d=2$, each row defined by the generic monomial terms $\{M_1(λ_i, d), M_2(λ_i, d), \ldots, M_N(λ_i, d)\}$ in the sample matrix $\mathbf{A}$ will instead be replaced by $\{1, λ_1, λ_2, {λ_1}^2, λ_1 \cdot λ_2, {λ_2}^2 \}$.
+
+Here, each monomial term in the sample matrix $\mathbf{A}$ is then evaluated using the values in the scale factor vectors. In Step 2, this sample matrix will be utilized to obtain our mitigated expectation value.
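+
+As an illustration of this evaluation, write the $i$-th scale factor vector as $λ_i = (λ_{i,1}, λ_{i,2})$; the double-index notation is introduced here only for this sketch and is not used elsewhere in the guide. For $l=2$ and $d=2$ there are $M = N = 6$ noise-scaled circuits, so the sample matrix would take the form:
+
+$$
+\mathbf{A}(\Lambda, 2) =
+\begin{bmatrix}
+    1 & λ_{1,1} & λ_{1,2} & λ_{1,1}^2 & λ_{1,1} λ_{1,2} & λ_{1,2}^2 \\
+    1 & λ_{2,1} & λ_{2,2} & λ_{2,1}^2 & λ_{2,1} λ_{2,2} & λ_{2,2}^2 \\
+    \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
+    1 & λ_{6,1} & λ_{6,2} & λ_{6,1}^2 & λ_{6,1} λ_{6,2} & λ_{6,2}^2
+\end{bmatrix}
+$$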
+
+## Step 2: Extrapolate to the noiseless limit
+
+Each noise-scaled circuit $C_{λ_i}$ has an expectation value $\langle O(λ_i) \rangle$ associated with it such that we can define a vector of the noisy expectation values $z = (\langle O(λ_1) \rangle, \langle O(λ_2) \rangle, \ldots, \langle O(λ_M)\rangle)^T$.
+These values can then be combined via a linear combination to estimate the ideal value $O_{\mathrm{LRE}}$.
+
+$$
+O_{\mathrm{LRE}} = \sum_{i=1}^{M} \eta_i \langle O(λ_i) \rangle.
+$$
+
+Finding the coefficients in the linear combination reduces to a system of linear equations $\mathbf{A} c = z$, where $c$ is the vector of coefficients of the extrapolating polynomial, $z$ is the vector of the noisy expectation values and $\mathbf{A}$ is the sample matrix evaluated using the values in the scale factor vectors. The zero-noise estimate is the constant term of this polynomial, which depends linearly on $z$; the weights of that dependence are the coefficients $(\eta_1, \eta_2, \ldots, \eta_M)^T$.
+
+The [general multivariate Lagrange interpolation polynomial](https://www.siam.org/media/wkvnvame/a_simple_expression_for_multivariate.pdf) is defined by a new matrix $\mathbf{B}_i$ obtained by replacing the $i$-th row of the sample matrix $\mathbf{A}$ with monomial terms evaluated using the generic variable λ. Thus, matrix $\mathbf{B}_i$ represents an interpolating polynomial in variable λ of degree $d$. As we only need to find the noiseless expectation value, we can skip calculating the full vector of linear combination coefficients if we use the [Lagrange interpolation formula](https://files.eric.ed.gov/fulltext/EJ1231189.pdf) evaluated at $λ = 0$ i.e. the zero-noise limit.
+
+To get the matrix $\mathbf{B}_i(\mathbf{0})$, replace the $i$-th row of the sample matrix $\mathbf{A}$ by $\mathbf{e}_1=(1, 0, \ldots, 0)$, since at $λ=0$ every monomial term vanishes except $M_1(0, d) = 1$.
+
+$$
+O_{\rm LRE} = \sum_{i=1}^M \langle O (\boldsymbol{\lambda}_i)\rangle  \frac{\det \left(\mathbf{B}_i (\boldsymbol{0}) \right)}{\det \left(\mathbf{A}\right)}
+$$
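+
+As a minimal sanity check of this formula, consider a hypothetical single-layer circuit ($l=1$) with a linear fit ($d=1$) and scale factors $λ_1 = 1$ and $λ_2 = 3$; these values are chosen here purely for illustration. The monomial basis is $\{1, λ\}$, so
+
+$$
+\mathbf{A} = \begin{bmatrix} 1 & 1 \\ 1 & 3 \end{bmatrix}, \quad
+\mathbf{B}_1(\mathbf{0}) = \begin{bmatrix} 1 & 0 \\ 1 & 3 \end{bmatrix}, \quad
+\mathbf{B}_2(\mathbf{0}) = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}
+$$
+
+with $\det(\mathbf{A}) = 2$, $\det(\mathbf{B}_1(\mathbf{0})) = 3$ and $\det(\mathbf{B}_2(\mathbf{0})) = -1$. Substituting these values gives
+
+$$
+O_{\rm LRE} = \frac{3}{2} \langle O(1) \rangle - \frac{1}{2} \langle O(3) \rangle
+$$
+
+which is the familiar two-point linear Richardson extrapolation, and the two coefficients sum to 1.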
+
+To summarize, the user chooses the degree of the extrapolating polynomial for a circuit, noise-scaled circuits are created in a specific pattern, and their expectation values are combined through multivariate Lagrange interpolation of the sample matrix, evaluated using the scale factor vectors, to obtain the error-mitigated expectation value.
+
+Additional details on the LRE functionality are available in the [API-doc](https://mitiq.readthedocs.io/en/stable/apidoc.html#module-mitiq.lre.multivariate_scaling.layerwise_folding).
diff --git a/docs/source/guide/lre.md b/docs/source/guide/lre.md
new file mode 100644
index 0000000000..b9a54e1218
--- /dev/null
+++ b/docs/source/guide/lre.md
@@ -0,0 +1,23 @@
+```{warning}
+The user guide for LRE in Mitiq is currently under construction.
+```
+
+# Layerwise Richardson Extrapolation
+
+Layerwise Richardson Extrapolation (LRE), an error mitigation technique introduced in
+{cite}`Russo_2024_LRE`, extends the ideas found in ZNE by creating multiple noise-scaled variations of the input
+circuit such that the noiseless expectation value is extrapolated from the execution of each
+noisy circuit (see the section [What is the theory behind LRE?](lre-5-theory.md)). Compared to
+Zero-Noise Extrapolation, this technique treats the noise in each layer of the circuit
+as an independent variable to be scaled and then extrapolated independently.
+
+You can get started with LRE in Mitiq with the following sections of the user guide:
+
+```{toctree}
+---
+maxdepth: 1
+---
+lre-1-intro.md
+lre-2-use-case.md
+lre-5-theory.md
+```