From 7e4fb9230f62eb2e64c53bf2009bc5c4ddc6815d Mon Sep 17 00:00:00 2001
From: Purva Thakre <66048318+purva-thakre@users.noreply.github.com>
Date: Thu, 10 Oct 2024 22:35:32 -0500
Subject: [PATCH] Add theory, intro and use case pages of LRE user guide (#2522)

* draft 1
* vincent + nate feedback
* remove : for warning block
* Apply suggestions from code review

Co-authored-by: nate stemen

* nate's feedback round 2
* Apply suggestions from code review

Co-authored-by: nate stemen

* clarify theory sections
* gen monomial terms
* Add intro section to LRE docs (#2535)
* add intro and use case pages

Co-Authored-By: Purva Thakre

* clean up intro/use case
* clarify depth comment
* wordsmithing

---------

Co-authored-by: Purva Thakre

* change wording of Bi matrix
* cleanup first section
* fix l/L typo

---------

Co-authored-by: nate stemen
Co-authored-by: Purva Thakre
---
 docs/source/guide/guide.md          |   3 +-
 docs/source/guide/lre-1-intro.md    | 152 ++++++++++++++++++++++++++++
 docs/source/guide/lre-2-use-case.md |  35 +++++++
 docs/source/guide/lre-5-theory.md   |  95 +++++++++++++++++
 docs/source/guide/lre.md            |  28 +++++
 5 files changed, 312 insertions(+), 1 deletion(-)
 create mode 100644 docs/source/guide/lre-1-intro.md
 create mode 100644 docs/source/guide/lre-2-use-case.md
 create mode 100644 docs/source/guide/lre-5-theory.md
 create mode 100644 docs/source/guide/lre.md

diff --git a/docs/source/guide/guide.md b/docs/source/guide/guide.md
index 40528efa75..d75a3571c7 100644
--- a/docs/source/guide/guide.md
+++ b/docs/source/guide/guide.md
@@ -8,11 +8,12 @@ core-concepts.md
 zne.md
 pec.md
 cdr.md
-shadows.md
 ddd.md
+lre.md
 rem.md
 qse.md
 pt.md
+shadows.md
 error-mitigation.md
 glossary.md
 ```
diff --git a/docs/source/guide/lre-1-intro.md b/docs/source/guide/lre-1-intro.md
new file mode 100644
index 0000000000..6604837306
--- /dev/null
+++ b/docs/source/guide/lre-1-intro.md
@@ -0,0 +1,152 @@
+---
+jupytext:
+  text_representation:
+    extension: .md
+    format_name: myst
+    format_version: 0.13
+    jupytext_version: 1.11.1
+kernelspec:
+  display_name: Python 3
+  language: python
+  name: python3
+---
+
+# How do I use LRE?
+
+LRE works in two main stages: generate noise-scaled circuits via layerwise scaling, and apply inference to the resulting measurements post-execution.
+
+This workflow can be executed by a single call to {func}`.execute_with_lre`.
+If more control is needed over the protocol, Mitiq provides {func}`.multivariate_layer_scaling` and {func}`.multivariate_richardson_coefficients` to handle the first and second steps respectively.
+
+```{warning}
+LRE is currently compatible with quantum programs written using `cirq`.
+Work on making this technique compatible with other frontends is ongoing. 🚧
+```
+
+## Problem Setup
+
+To demonstrate the use of LRE, we first define a quantum circuit and a method of executing circuits.
+
+For simplicity, we define a circuit whose unitary compiles to the identity operation.
+Here we will use a randomized benchmarking circuit on a single qubit, visualized below.
+
+```{code-cell} ipython3
+from mitiq import benchmarks
+
+
+circuit = benchmarks.generate_rb_circuits(n_qubits=1, num_cliffords=3)[0]
+
+print(circuit)
+```
+
+We define an [executor](executors.md) which simulates the input circuit subjected to depolarizing noise, and returns the probability of measuring the ground state.
+By altering the value for `noise_level`, ideal and noisy expectation values can be obtained.
+
+```{code-cell} ipython3
+from cirq import DensityMatrixSimulator, depolarize
+
+
+def execute(circuit, noise_level=0.025):
+    noisy_circuit = circuit.with_noise(depolarize(p=noise_level))
+    rho = DensityMatrixSimulator().simulate(noisy_circuit).final_density_matrix
+    return rho[0, 0].real
+```
+
+Compare the noisy and ideal expectation values:
+
+```{code-cell} ipython3
+noisy = execute(circuit)
+ideal = execute(circuit, noise_level=0.0)
+print(f"Error without mitigation: {abs(ideal - noisy):.5f}")
+```
+
+## Apply LRE directly
+
+With the circuit and executor defined, we just need to choose the polynomial extrapolation degree as well as the fold multiplier.
+
+```{code-cell} ipython3
+from mitiq.lre import execute_with_lre
+
+
+degree = 2
+fold_multiplier = 3
+
+mitigated = execute_with_lre(
+    circuit,
+    execute,
+    degree=degree,
+    fold_multiplier=fold_multiplier,
+)
+
+print(f"Error with mitigation (LRE): {abs(ideal - mitigated):.{3}}")
+```
+
+As you can see, the technique is extremely simple to apply, and no knowledge of the hardware/simulator noise is required.
+
+## Step by step application of LRE
+
+In this section we demonstrate the use of {func}`.multivariate_layer_scaling` and {func}`.multivariate_richardson_coefficients` for those who might want to inspect the intermediary circuits, and have more control over the protocol.
+
+### Create noise-scaled circuits
+
+We start by creating a number of noise-scaled circuits which we will pass to the executor.
+
+```{code-cell} ipython3
+from mitiq.lre import multivariate_layer_scaling
+
+
+noise_scaled_circuits = multivariate_layer_scaling(circuit, degree, fold_multiplier)
+num_scaled_circuits = len(noise_scaled_circuits)
+
+print(f"total number of noise-scaled circuits for LRE = {num_scaled_circuits}")
+print(
+    f"Average circuit depth = {sum(len(circuit) for circuit in noise_scaled_circuits) / num_scaled_circuits}"
+)
+```
+
+As you can see, the noise-scaled circuits are on average much longer than the original circuit.
+An example noise-scaled circuit is shown below.
+
+```{code-cell} ipython3
+noise_scaled_circuits[3]
+```
+
+With the many noise-scaled circuits in hand, we can run them through our executor to obtain the expectation values.
+
+```{code-cell} ipython3
+noise_scaled_exp_values = [
+    execute(circuit) for circuit in noise_scaled_circuits
+]
+```
+
+### Classical inference
+
+The penultimate step here is to fetch the coefficients we'll use to combine the noisy data we obtained above.
+Both `degree` and `fold_multiplier` are needed again here, and they must match the values used to generate the noise-scaled circuits.
+
+```{code-cell} ipython3
+from mitiq.lre import multivariate_richardson_coefficients
+
+
+coefficients = multivariate_richardson_coefficients(
+    circuit,
+    fold_multiplier=fold_multiplier,
+    degree=degree,
+)
+```
+
+Each noise-scaled circuit has a linear-combination coefficient and a noisy expectation value associated with it.
+
+### Combine the results
+
+```{code-cell} ipython3
+mitigated = sum(
+    exp_val * coeff
+    for exp_val, coeff in zip(noise_scaled_exp_values, coefficients)
+)
+print(
+    f"Error with mitigation (LRE): {abs(ideal - mitigated):.{3}}"
+)
+```
+
+As before, we see a clear improvement in accuracy using the two-stage application of LRE.
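To build some intuition for what the coefficients in the classical inference step do, here is a minimal standalone sketch. It is not Mitiq's implementation: the helper names (`monomial_exponents`, `sample_matrix`, `solve`, `richardson_coefficients`), the scale factor vectors and the toy expectation function are all assumptions chosen for illustration. It assumes a two-layer circuit with degree 2, builds the sample matrix of monomials, and solves the transposed linear system with exact fractions so that the resulting linear combination returns the zero-noise value of any polynomial up to that degree.

```python
from fractions import Fraction
from itertools import product


def monomial_exponents(num_layers, degree):
    """Exponent tuples of all monomials with total degree <= degree (hypothetical helper)."""
    exps = [e for e in product(range(degree + 1), repeat=num_layers) if sum(e) <= degree]
    # Increasing total degree, so the constant monomial comes first.
    return sorted(exps, key=lambda e: (sum(e), tuple(-x for x in e)))


def sample_matrix(scale_vectors, degree):
    """Row i holds every monomial evaluated at the i-th scale factor vector."""
    exps = monomial_exponents(len(scale_vectors[0]), degree)
    rows = []
    for lam in scale_vectors:
        row = []
        for e in exps:
            term = Fraction(1)
            for x, p in zip(lam, e):
                term *= Fraction(x) ** p
            row.append(term)
        rows.append(row)
    return rows


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination over exact fractions.

    Raises StopIteration if the matrix is singular (no pivot can be found).
    """
    n = len(matrix)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def richardson_coefficients(scale_vectors, degree):
    """Coefficients eta such that sum_i eta_i * p(lambda_i) = p(0) for polynomials p."""
    a = sample_matrix(scale_vectors, degree)
    a_transposed = [list(col) for col in zip(*a)]
    # Only the constant monomial survives at zero noise.
    e1 = [Fraction(1)] + [Fraction(0)] * (len(a) - 1)
    return solve(a_transposed, e1)


# Illustrative scale factor vectors for 2 layers (an assumption, not Mitiq output).
scale_vectors = [(1, 1), (3, 1), (1, 3), (5, 1), (3, 3), (1, 5)]
eta = richardson_coefficients(scale_vectors, degree=2)


def toy_expectation(lam):
    """Made-up noisy expectation value: a degree-2 polynomial with zero-noise value 9/10."""
    x, y = lam
    return Fraction(9, 10) - Fraction(1, 20) * x - Fraction(1, 50) * y + Fraction(1, 500) * x * y


mitigated = sum(c * toy_expectation(lam) for c, lam in zip(eta, scale_vectors))
print(eta)
print(mitigated)
```

Because the toy expectation value is itself a polynomial of degree 2, the combination recovers its zero-noise value exactly. Real expectation values are only approximately polynomial, which is why the mitigated result in the cells above still carries a small error.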
diff --git a/docs/source/guide/lre-2-use-case.md b/docs/source/guide/lre-2-use-case.md
new file mode 100644
index 0000000000..12b1d13951
--- /dev/null
+++ b/docs/source/guide/lre-2-use-case.md
@@ -0,0 +1,35 @@
+---
+jupytext:
+  text_representation:
+    extension: .md
+    format_name: myst
+    format_version: 0.13
+    jupytext_version: 1.10.3
+kernelspec:
+  display_name: Python 3 (ipykernel)
+  language: python
+  name: python3
+---
+
+# When should I use LRE?
+
+## Advantages
+
+Just as in ZNE, LRE can be applied without detailed knowledge of the underlying noise model, as the effectiveness of the technique depends on the choice of scale factors.
+Thus, LRE is useful in scenarios where tomography is impractical.
+
+The sampling overhead is flexible: the cost can be reduced by using larger values for the fold multiplier (used to
+create the noise-scaled circuits) or by chunking a larger circuit to fold groups of layers instead of each layer individually.
+
+## Disadvantages
+
+When using a large circuit, the number of noise-scaled circuits grows polynomially with the number of layers, so the execution time rises, because we require the sample matrix to be square (more details in the [theory](lre-5-theory.md) section).
+
+When reducing the sampling cost by using a larger fold multiplier, the bias for polynomial extrapolation increases as one moves farther away from the zero-noise limit.
+
+Chunking a large circuit into fewer chunks to reduce the sampling cost can reduce the performance of LRE.
+In ZNE parlance, this is equivalent to local folding faring better than global folding: LRE performs better when a higher number of chunks is used.
+
+```{attention}
+We are currently investigating the issue related to the performance of chunking large circuits.
+```
diff --git a/docs/source/guide/lre-5-theory.md b/docs/source/guide/lre-5-theory.md
new file mode 100644
index 0000000000..2fa960bf27
--- /dev/null
+++ b/docs/source/guide/lre-5-theory.md
@@ -0,0 +1,95 @@
+---
+jupytext:
+  text_representation:
+    extension: .md
+    format_name: myst
+    format_version: 0.13
+    jupytext_version: 1.11.4
+kernelspec:
+  display_name: Python 3
+  language: python
+  name: python3
+---
+
+# What is the theory behind LRE?
+
+Similar to [ZNE](zne.md), LRE works in two steps:
+
+- **Step 1:** Intentionally create multiple noise-scaled but logically equivalent circuits by scaling each layer or chunk of the input circuit through unitary folding.
+
+- **Step 2:** Extrapolate to the noiseless limit using multivariate Richardson extrapolation.
+
+The noise-scaled circuits in ZNE are scaled by the user choosing which layers of the input circuit to fold, whereas in LRE
+each noise-scaled circuit scales the layers in the input circuit in a specific pattern.
+LRE leverages the flexible configuration space of layerwise unitary folding, allowing for a more nuanced mitigation of errors by treating the noise level of each layer of the quantum circuit as an independent variable.
+
+## Step 1: Create noise-scaled circuits
+
+The goal is to create noise-scaled circuits of different depths where the layers in each circuit are scaled in a specific pattern as a result of [unitary folding](zne-5-theory.md).
+This pattern is described by a collection of scale factor vectors, which are generated once the fold multiplier and the degree for multivariate Richardson extrapolation are chosen.
+
+Suppose we're interested in the value of some observable of a circuit $C$ that has $l$ layers.
+For each layer $0 \leq L < l$ we can choose a scale factor for how much to scale that particular layer.
+Thus a vector $\lambda \in \mathbb{R}^l_+$ corresponds to a folding configuration where $\lambda_0$ corresponds to the scale factor for the first layer, and $\lambda_{l - 1}$ is the scale factor to apply on the circuit's final layer.
+
+Fix the number of noise-scaled circuits we wish to generate at $M\in\mathbb{N}$.
+Define $\Lambda = (λ_1, λ_2, \ldots, λ_M)^T$ to be the collection of scale factor vectors and let $(C_{λ_1}, C_{λ_2}, \ldots, C_{λ_M})^T$ denote the noise-scaled circuits corresponding to each scale factor vector.
+
+After $d$ is fixed as the degree of the multivariate polynomial, we define $M_j(λ_i, d)$ to be the $j$-th monomial term of the polynomial, with the terms arranged in increasing order of degree.
+In general, the number of monomial terms with $l$ variables up to degree $d$ can be determined
+through the [stars and bars method](https://en.wikipedia.org/wiki/Stars_and_bars_%28combinatorics%29).
+
+For example, if $C$ has 2 layers and the degree of the extrapolating polynomial is 2, the basis of monomials contains 6 terms: $\{1, λ_1, λ_2, {λ_1}^2, λ_1 \cdot λ_2, {λ_2}^2 \}$ (here $λ_1$ and $λ_2$ denote the scale factors of the two layers).
+
+$$
+\text{total number of terms in the monomial basis with max degree } d = \binom{d + l}{d}
+$$
+
+As the choice for the degree of the extrapolating polynomial is 2, we search for the number of terms with total degree 2 using the following formula:
+
+$$
+\text{number of terms in the monomial basis with total degree } d = \binom{d + l - 1}{d}
+$$
+
+There are 3 terms with total degree 2, calculated by $\binom{2 + 2 -1}{2} = 3$, and they correspond to $\{{λ_1}^2, λ_1 \cdot λ_2, {λ_2}^2 \}$.
+
+Similarly, the number of terms with total degree 1 and 0 can be calculated as $\binom{1 + 2 -1}{1} = 2:\{λ_1, λ_2\}$ and $\binom{0 + 2 -1}{0}= 1: \{1\}$ respectively.
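The counting above can be checked with a short standalone sketch. It uses only the Python standard library and is not part of Mitiq; the helper name `monomial_exponents` is hypothetical. Each monomial is represented by its tuple of exponents, one per layer, so `(1, 1)` stands for $λ_1 \cdot λ_2$ and `(0, 0)` for the constant term.

```python
from itertools import product
from math import comb


def monomial_exponents(num_layers: int, degree: int) -> list[tuple[int, ...]]:
    """Exponent tuples of all monomials in `num_layers` variables with total degree <= `degree`."""
    exponents = [
        e
        for e in product(range(degree + 1), repeat=num_layers)
        if sum(e) <= degree
    ]
    # Sort by increasing total degree; within one degree, higher powers
    # of the first variable come first.
    return sorted(exponents, key=lambda e: (sum(e), tuple(-x for x in e)))


l, d = 2, 2  # two layers, extrapolating polynomial of degree 2
basis = monomial_exponents(l, d)
print(basis)

# Total number of terms up to degree d.
assert len(basis) == comb(d + l, d)

# Number of terms of each exact total degree k.
for k in range(d + 1):
    count = sum(1 for e in basis if sum(e) == k)
    assert count == comb(k + l - 1, k)
    print(f"total degree {k}: {count} term(s)")
```

The ordering chosen here mirrors the basis listed in the example, but it is only one possible convention; any fixed ordering with the constant term first works for the counting argument.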
+
+These terms in the monomial basis define the rows of the square sample matrix as shown below, where $N = \binom{d + l}{d}$ is the number of monomial terms and equals the number of noise-scaled circuits $M$:
+
+$$
+\mathbf{A}(\Lambda, d) =
+\begin{bmatrix}
+    M_1(λ_1, d) & M_2(λ_1, d) & \cdots & M_N(λ_1, d) \\
+    M_1(λ_2, d) & M_2(λ_2, d) & \cdots & M_N(λ_2, d) \\
+    \vdots & \vdots & \ddots & \vdots \\
+    M_1(λ_N, d) & M_2(λ_N, d) & \cdots & M_N(λ_N, d)
+\end{bmatrix}
+$$
+
+For our example circuit of $l=2$ and $d=2$, each row defined by the generic monomial terms $\{M_1(λ_i, d), M_2(λ_i, d), \ldots, M_N(λ_i, d)\}$ in the sample matrix $\mathbf{A}$ will instead be replaced by $\{1, λ_1, λ_2, {λ_1}^2, λ_1 \cdot λ_2, {λ_2}^2 \}$.
+
+Here, each monomial term in the sample matrix $\mathbf{A}$ is then evaluated using the values in the scale factor vectors. In Step 2, this sample matrix will be utilized to obtain our mitigated expectation value.
+
+## Step 2: Extrapolate to the noiseless limit
+
+Each noise-scaled circuit $C_{λ_i}$ has an expectation value $\langle O(λ_i) \rangle$ associated with it such that we can define a vector of the noisy expectation values $z = (\langle O(λ_1) \rangle, \langle O(λ_2) \rangle, \ldots, \langle O(λ_M)\rangle)^T$.
+These values can then be combined via a linear combination to estimate the ideal value $O_{\mathrm{LRE}}$.
+
+$$
+O_{\mathrm{LRE}} = \sum_{i=1}^{M} \eta_i \langle O(λ_i) \rangle.
+$$
+
+Finding the coefficients in the linear combination becomes a problem solvable through a system of linear equations $\mathbf{A} c = z$, where $c$ is the vector of coefficients of the interpolating polynomial, $z$ is the vector of the noisy expectation values and $\mathbf{A}$ is the sample matrix evaluated using the values in the scale factor vectors. The zero-noise estimate is the constant term $c_1$, so the coefficients $(\eta_1, \eta_2, \ldots, \eta_M)^T$ are given by the first row of $\mathbf{A}^{-1}$.
+
+The [general multivariate Lagrange interpolation polynomial](https://www.siam.org/media/wkvnvame/a_simple_expression_for_multivariate.pdf) is defined by a new matrix $\mathbf{B}_i$ obtained by replacing the $i$-th row of the sample matrix $\mathbf{A}$ with monomial terms evaluated using the generic variable λ. Thus, matrix $\mathbf{B}_i$ represents an interpolating polynomial in variable λ of degree $d$. As we only need to find the noiseless expectation value, we can skip calculating the full vector of polynomial coefficients if we use the [Lagrange interpolation formula](https://files.eric.ed.gov/fulltext/EJ1231189.pdf) evaluated at $λ = 0$, i.e. the zero-noise limit.
+
+To get the matrix $\mathbf{B}_i(\mathbf{0})$, replace the $i$-th row of the sample matrix $\mathbf{A}$ by $\mathbf{e}_1=(1, 0, \ldots, 0)$, since at $λ=0$ all monomial terms vanish except $M_1(0, d) = 1$.
+
+$$
+O_{\rm LRE} = \sum_{i=1}^M \langle O (\boldsymbol{\lambda}_i)\rangle \frac{\det \left(\mathbf{B}_i (\boldsymbol{0}) \right)}{\det \left(\mathbf{A}\right)}
+$$
+
+To summarize, once the user chooses the degree of the extrapolating polynomial for some circuit, the error-mitigated expectation value is found by combining the expectation values of noise-scaled circuits, created in a specific pattern, using multivariate Lagrange interpolation of the sample matrix evaluated at the scale factor vectors.
+
+Additional details on the LRE functionality are available in the [API-doc](https://mitiq.readthedocs.io/en/stable/apidoc.html#module-mitiq.lre.multivariate_scaling.layerwise_folding).
diff --git a/docs/source/guide/lre.md b/docs/source/guide/lre.md
new file mode 100644
index 0000000000..b9a54e1218
--- /dev/null
+++ b/docs/source/guide/lre.md
@@ -0,0 +1,28 @@
+```{warning}
+The user guide for LRE in Mitiq is currently under construction.
+```
+
+# Layerwise Richardson Extrapolation
+
+Layerwise Richardson Extrapolation (LRE), an error mitigation technique introduced in
+{cite}`Russo_2024_LRE`, extends the ideas found in ZNE by allowing users to create multiple noise-scaled variations of the input
+circuit such that the noiseless expectation value is extrapolated from the execution of each
+noisy circuit.
+
+Layerwise Richardson Extrapolation (LRE), an error mitigation technique introduced in
+{cite}`Russo_2024_LRE`, works by creating multiple noise-scaled variations of the input
+circuit such that the noiseless expectation value is extrapolated from the execution of each
+noisy circuit (see the section [What is the theory behind LRE?](lre-5-theory.md)). Compared to
+Zero-Noise Extrapolation, this technique treats the noise in each layer of the circuit
+as an independent variable to be scaled and then extrapolated independently.
+
+You can get started with LRE in Mitiq with the following sections of the user guide:
+
+```{toctree}
+---
+maxdepth: 1
+---
+lre-1-intro.md
+lre-2-use-case.md
+lre-5-theory.md
+```