diff --git a/samples/azure-quantum/resource-estimation/estimation-profiling.ipynb b/samples/azure-quantum/resource-estimation/estimation-profiling.ipynb
new file mode 100644
index 000000000000..f041f53b978c
--- /dev/null
+++ b/samples/azure-quantum/resource-estimation/estimation-profiling.ipynb
@@ -0,0 +1,839 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Resource estimation profiling in quantum adders\n",
+    "\n",
+    "In this sample, we are implementing two quantum adders and then inspecting them\n",
+    "using the profiling feature in the Azure Quantum Resource Estimator.  The\n",
+    "profiling feature allows you to analyze how sub-operations in the quantum\n",
+    "algorithm impact the overall resources.\n",
+    "\n",
+    "In particular, you will learn how to use the `profiling.call_stack_depth` and\n",
+    "`profiling.inline_functions` job parameters to enable profiles in resource\n",
+    "estimation jobs, and how to use the `call_graph` and `profile` properties on\n",
+    "resource estimation results to show call graphs and resource estimation\n",
+    "profiles, respectively.\n",
+    "\n",
+    "The notebook is structured as follows.  First, we connect to an Azure Quantum\n",
+    "workspace and import the necessary functions, then we describe the two quantum\n",
+    "adder implementations.  It's okay to skip the implementation part and jump right\n",
+    "into the next session, which shows how to use the profiling feature to construct\n",
+    "call graphs and resource estimation profiles for time and space.  Finally, we\n",
+    "show how the detailed profiling information can be used for advanced analyses."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azure.quantum import Workspace\n",
+    "from azure.quantum.target.microsoft import MicrosoftEstimator\n",
+    "import pandas\n",
+    "import qsharp\n",
+    "import re"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "workspace = Workspace (\n",
+    "    resource_id = \"\",\n",
+    "    location = \"\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Quantum adders\n",
+    "\n",
+    "In this section, we implement two Q# operations `RippleCarryAdd` and\n",
+    "`LookAheadAdd` together with estimation entry points `EstimateRippleCarryAdd`\n",
+    "and `EstimateLookAheadAdd`, respectively, which each can be provided a\n",
+    "bit-width.  Feel free to skip over the implementation details of these adders,\n",
+    "right to the next section in which we are using the profiling feature to analyze\n",
+    "them.\n",
+    "\n",
+    "In the realm of quantum computing, the ability to perform addition is crucial\n",
+    "for a wide range of applications. In this section, we will introduce two\n",
+    "different quantum adders that can be used to perform addition: the ripple-carry\n",
+    "adder and the carry-lookahead adder. Given two qubit $n$-bit registers\n",
+    "$|x\\rangle = |(x_{n-1}\\dots x_1x_0)_2\\rangle$ and $|y\\rangle = |(y_{n-1}\\dots\n",
+    "y_1y_0)_2\\rangle$, the goal is compute an $n+1$ bit register $|z\\rangle = |x +\n",
+    "y\\rangle$.\n",
+    "\n",
+    "Both implementations will also support the overflowing variant, in which the\n",
+    "output register has $n$ bits, and we compute $|z\\rangle = |(x + y) \\bmod\n",
+    "2^n\\rangle$."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Ripple-carry adder\n",
+    "\n",
+    "We review the ripple-carry adder described in [Halving the cost of quantum\n",
+    "addition](https://arxiv.org/abs/1709.06648).  We focus on the out-of-place\n",
+    "variant, in which the sum of input registers $|x\\rangle$ and $|y\\rangle$ is\n",
+    "computed into a $|0\\rangle$-initialized output register $|z\\rangle$.  A\n",
+    "ripple-carry adder adds the bits of $|x\\rangle$ and $|y\\rangle$ one by one,\n",
+    "starting from the rightmost bit, and propagates the carry generated by the\n",
+    "addition to the next bit to the left. This process is repeated until all the\n",
+    "bits have been added, resulting in a final sum and carry-out.  The core\n",
+    "component of the ripple-carry adder is the full adder, which is a one-bit adder\n",
+    "that takes a carry input bit, and returns a carry output bit, in addition to the\n",
+    "sum bit."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%qsharp\n",
+    "internal operation FullAdder(carryIn : Qubit, x : Qubit, y : Qubit, carryOut : Qubit) : Unit is Adj {\n",
+    "    CNOT(x, y);\n",
+    "    CNOT(x, carryIn);\n",
+    "    ApplyAnd(y, carryIn, carryOut);\n",
+    "    CNOT(x, y);\n",
+    "    CNOT(x, carryOut);\n",
+    "    CNOT(y, carryIn);\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "With this operation at hand, implementing the ripple carry adder is as simple as\n",
+    "carrying the carry bit through successive full adder invocations.  The two CNOT\n",
+    "operations at the end handle the case in which $|z\\rangle$ has $n$ bits, in\n",
+    "which the sum for the most significant bit is not computed by a full adder\n",
+    "(since the corresponding carry out bit does not exist)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%qsharp\n",
+    "open Microsoft.Quantum.Arrays;\n",
+    "open Microsoft.Quantum.Diagnostics;\n",
+    "\n",
+    "operation RippleCarryAdd(xs : Qubit[], ys : Qubit[], zs : Qubit[]) : Unit is Adj {\n",
+    "    let n = Length(xs);\n",
+    "    let m = Length(zs);\n",
+    "\n",
+    "    Fact(Length(ys) == n, \"Registers xs and ys must be of same length\");\n",
+    "    Fact(m == n or m == n + 1, \"Register zs must be same length as xs or one bit larger\");\n",
+    "\n",
+    "    for k in 0..m - 2 {\n",
+    "        FullAdder(zs[k], xs[k], ys[k], zs[k + 1]);\n",
+    "    }\n",
+    "\n",
+    "    if n > 0 and n == Length(zs) {\n",
+    "        CNOT(Tail(xs), Tail(zs));\n",
+    "        CNOT(Tail(ys), Tail(zs));\n",
+    "    }\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Finally, we are creating an entry point for resource estimation, in which we are\n",
+    "calling the ripple-carry adder for a given bitwidth, which can be provided as\n",
+    "an input argument. (We are considering the $n+1$ output register in this\n",
+    "notebook.)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%qsharp\n",
+    "operation EstimateRippleCarryAdd(bitwidth : Int) : Unit {\n",
+    "    use xs = Qubit[bitwidth];\n",
+    "    use ys = Qubit[bitwidth];\n",
+    "    use zs = Qubit[bitwidth + 1];\n",
+    "\n",
+    "    RippleCarryAdd(xs, ys, zs);\n",
+    "}\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Carry-lookahead adder\n",
+    "\n",
+    "In [_A logarithmic-depth quantum carry-lookahead adder_](https://arxiv.org/abs/quant-ph/0406142), the authors present an out-of-place quantum adder implementation based on the [classical carry-lookahead adder method](https://en.wikipedia.org/wiki/Carry-lookahead_adder).  To briefly summarize, the idea of carry-lookahead adder is to compute all carry bits $c_i$ based on propagate bits $p_i = x_i \\oplus y_i$ and generate bits $g_i = x_i \\land y_i$, without requiring other carry bits except for the carry-in $c_0$.\n",
+    "\n",
+    "For example, the first carry bit can be computed as $c_1 = g_0 \\oplus (p_0 \\land c_0)$, since either it is generated from bits $x_0$ and $y_0$ (when both are 1, and therefore $g_0 = 1$) or the carry bit $c_0$ is propagated (if either $x_0$ or $y_0$ is 1, and therefore $p_0 = 1$).  More significant carry bits are computed in a similar way, for example $c_3 = g_2 \\oplus (g_1 \\land p_2) \\oplus (g_0 \\land p_1 \\land p_2) \\oplus (c_0 \\land p_0 \\land p_1 \\land p_2)$.  That is, $c_3$ is either generated from bits at index 2, or generated from bits at index 1 _and_ propagated from bits at index 2, and so on.\n",
+    "\n",
+    "In order to minimize AND gates, these intermediate products can be computed in a clever way, as well as in logarithmic depth.  We are now looking at an implementation of the carry-lookahead adder in Q#, and start by implementing a helper function to compute the number of 1-bits in an integer, also called Hamming weight, using a compact implementation based on a sequence of bitwise manipulations (you can learn more about these constants and why it works [on this article in Wikipedia](https://en.wikipedia.org/wiki/Hamming_weight#Efficient_implementation))."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%qsharp\n",
+    "function HammingWeight(n : Int) : Int {\n",
+    "    mutable i = n - ((n >>> 1) &&& 0x5555555555555555);\n",
+    "    set i = (i &&& 0x3333333333333333) + ((i >>> 2) &&& 0x3333333333333333);\n",
+    "    return (((i + (i >>> 4)) &&& 0xF0F0F0F0F0F0F0F) * 0x101010101010101) >>> 56;\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next we are going to implement internal routines that compute the generalized propagate bits, generalized generate, as well as the carry bits from them.  These are called `PRounds`, `GRounds`, and `CRounds`, and descriptions of their implementations can be found in the paper above."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%qsharp\n",
+    "open Microsoft.Quantum.Arrays;\n",
+    "open Microsoft.Quantum.Convert;\n",
+    "open Microsoft.Quantum.Math;\n",
+    "\n",
+    "internal operation PRounds(pWorkspace : Qubit[][]) : Unit is Adj {\n",
+    "    for ws in Windows(2, pWorkspace) {\n",
+    "        let (current, next) = (Rest(ws[0]), ws[1]);\n",
+    "\n",
+    "        for (m, target) in Enumerated(next) {\n",
+    "            ApplyAnd(current[2 * m], current[2 * m + 1], target);\n",
+    "        }\n",
+    "    }\n",
+    "}\n",
+    "\n",
+    "internal operation GRounds(pWorkspace : Qubit[][], gs : Qubit[]) : Unit is Adj {\n",
+    "    let numRounds = Length(pWorkspace);\n",
+    "    let n = Length(gs);\n",
+    "\n",
+    "    for t in 1..numRounds {\n",
+    "        let length = Floor(IntAsDouble(n) / IntAsDouble(2^t)) - 1;\n",
+    "        let ps = pWorkspace[t - 1][0..2...];\n",
+    "\n",
+    "        for m in 0..length {\n",
+    "            CCNOT(gs[2^t * m + 2^(t - 1) - 1], ps[m], gs[2^t * m + 2^t - 1]);\n",
+    "        }\n",
+    "    }\n",
+    "}\n",
+    "\n",
+    "internal operation CRounds(pWorkspace : Qubit[][], gs : Qubit[]) : Unit is Adj {\n",
+    "    let n = Length(gs);\n",
+    "\n",
+    "    let start = Floor(Lg(IntAsDouble(2 * n) / 3.0));\n",
+    "    for t in start..-1..1 {\n",
+    "        let length = Floor(IntAsDouble(n - 2^(t - 1)) / IntAsDouble(2^t));\n",
+    "        let ps = pWorkspace[t - 1][1..2...];\n",
+    "\n",
+    "        for m in 1..length {\n",
+    "            CCNOT(gs[2^t * m - 1], ps[m - 1], gs[2^t * m + 2^(t - 1) - 1]);\n",
+    "        }\n",
+    "    }\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "With these operations we can build an operation that computes the carry bits\n",
+    "from the initial propagate and generate bits.  Note that the generalized\n",
+    "propagate bits are computed out-of-place into some helper qubits `qs`, whereas\n",
+    "the generalized generate and carry bits are computed in-place into the register\n",
+    "`gs`, which contains the initial generate bits."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%qsharp\n",
+    "internal operation ComputeCarries(ps : Qubit[], gs : Qubit[]) : Unit is Adj {\n",
+    "    let n = Length(gs);\n",
+    "\n",
+    "    let numRounds = Floor(Lg(IntAsDouble(n)));\n",
+    "    use qs = Qubit[n - HammingWeight(n) - numRounds];\n",
+    "\n",
+    "    let registerPartition = MappedOverRange(t -> Floor(IntAsDouble(n) / IntAsDouble(2^t)) - 1, 1..numRounds - 1);\n",
+    "    let pWorkspace = [ps] + Partitioned(registerPartition, qs);\n",
+    "\n",
+    "    within {\n",
+    "        PRounds(pWorkspace);\n",
+    "    } apply {\n",
+    "        GRounds(pWorkspace, gs);\n",
+    "        CRounds(pWorkspace, gs);\n",
+    "    }\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now, we are all set up to implement the carry-lookahead adder, by first\n",
+    "computing the initial propagate and generate bits, then computing the carry\n",
+    "bits, and finally computing the output bits using the sums (initial propagate\n",
+    "bits) together with the carry bits.  Note that this implementation supports both\n",
+    "an variant where the output register is 1 bit larger and does not overflow, as\n",
+    "well as a variant in which the sum is computed modulo $2^n$.  The latter uses\n",
+    "the former by using a special treatment of the most-significant bits of the\n",
+    "input registers."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%qsharp\n",
+    "open Microsoft.Quantum.Arrays;\n",
+    "open Microsoft.Quantum.Canon;\n",
+    "open Microsoft.Quantum.Diagnostics;\n",
+    "\n",
+    "operation LookAheadAdd(xs : Qubit[], ys : Qubit[], zs : Qubit[]) : Unit is Adj {\n",
+    "    let n = Length(xs);\n",
+    "    let m = Length(zs);\n",
+    "\n",
+    "    Fact(Length(ys) == n, \"Registers xs and ys must be of same length\");\n",
+    "    Fact(m == n or m == n + 1, \"Register zs must be same length as xs or one bit larger\");\n",
+    "\n",
+    "    if m == n + 1 { // with carry-out\n",
+    "        // compute initial generate values\n",
+    "        for k in 0..n - 1 {\n",
+    "            ApplyAnd(xs[k], ys[k], zs[k + 1]);\n",
+    "        }\n",
+    "\n",
+    "        within {\n",
+    "            // compute initial propagate values\n",
+    "            ApplyToEachA(CNOT, Zipped(xs, ys));\n",
+    "        } apply {\n",
+    "            if n > 1 {\n",
+    "                ComputeCarries(Rest(ys), Rest(zs));\n",
+    "            }\n",
+    "\n",
+    "            // compute sum into carries\n",
+    "            for k in 0..n - 1 {\n",
+    "                CNOT(ys[k], zs[k]);\n",
+    "            }\n",
+    "        }\n",
+    "    } else { // without carry-out\n",
+    "        LookAheadAdd(Most(xs), Most(ys), zs);\n",
+    "        CNOT(Tail(xs), Tail(zs));\n",
+    "        CNOT(Tail(ys), Tail(zs));\n",
+    "    }\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Finally, we are creating an entry point for resource estimation, in which we are\n",
+    "calling the carry-lookahead adder for a given bitwidth, which can be provided as\n",
+    "an input argument."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%qsharp\n",
+    "operation EstimateLookAheadAdd(bitwidth : Int) : Unit {\n",
+    "    use xs = Qubit[bitwidth];\n",
+    "    use ys = Qubit[bitwidth];\n",
+    "    use zs = Qubit[bitwidth + 1];\n",
+    "\n",
+    "    LookAheadAdd(xs, ys, zs);\n",
+    "}\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Profiling\n",
+    "\n",
+    "With these two adder implementations, we are now ready for some profiling.  The\n",
+    "profiling feature in Azure Quantum Resource Estimator will create a resource\n",
+    "estimation profile that will show how sub operations in the program (e.g.,\n",
+    "`FullAdder` inside `RippleCarryAdd`) contribute to the overall costs.\n",
+    "\n",
+    "There are two important concepts to review to best understand the outputs of the\n",
+    "profiling feature, a *call graph* and a *call tree*.  The *call graph* is static\n",
+    "representation of the quantum program which informs which operations call which\n",
+    "other operations.  For example, `RippleCarryAdder` calls `FullAdder`, but both\n",
+    "of these operations call `CNOT`.  The call graph contains a node for each\n",
+    "operation and a directed edge for each calling relation.  The call graph may\n",
+    "contain cycles, e.g., in the case of recursive operations.\n",
+    "\n",
+    "In contrast, a call tree is a dynamic representation of the program execution in\n",
+    "which there are no cycles and for each node there is a clear path from the root\n",
+    "node.  For example, distinguishes the calls to `CCNOT` from `GRounds` and\n",
+    "`CRounds` within the `ComputeCarries` operation in the carry-lookahead adder.\n",
+    "\n",
+    "But let's start looking at concrete examples, and create a resource estimator\n",
+    "instance together with a parameter object to configure the bitwidth argument and\n",
+    "enable the profiling feature."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "estimator = MicrosoftEstimator(workspace)\n",
+    "\n",
+    "params = estimator.make_params()\n",
+    "params.arguments[\"bitwidth\"] = 32\n",
+    "params.profiling.call_stack_depth = 10"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We enable the profiling feature by setting the `call_stack_depth` variable in\n",
+    "the `profiling` group to a number that indicates the maximum level up to which\n",
+    "sub operations are tracked.  More precisely, the entry point operation is at\n",
+    "level 0.  Any operation called from the entry point is at level 1, any operation\n",
+    "therein at 2, and so on.  The call stack depth is setting a maximum value to an\n",
+    "operation's level in the call stack for which we track resources in the profile.\n",
+    "\n",
+    "Next, we submit a resource estimation job for the ripple carry adder by\n",
+    "providing the Q# operation `EstimateRippleCarryAdd` and the job parameter\n",
+    "object.  We store there the results of the job in the variable `result_rca`,\n",
+    "where RCA is an abbreviation for ripple-carry adder."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "job = estimator.submit(EstimateRippleCarryAdd, input_params=params)\n",
+    "result_rca = job.get_results()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can inspect the call graph by calling the `call_graph` property on the result\n",
+    "object.  It displays the call graph with the node corresponding to the entry\n",
+    "point operation at the top and aligns other operations top-down according to\n",
+    "their level."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "result_rca.call_graph"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Note that the operation names are mangled, i.e., they contain additional\n",
+    "information.  In this case, this can be a prefix `SNIPPET` or `ENTRYPOINT`,\n",
+    "which is generated by `qsharp` Python library from the Q# that is integrated\n",
+    "into the notebook.  Some operations are prefixed by their namespace, and\n",
+    "operations have a suffix to indicate their variant (e.g., `body` for their\n",
+    "`body` implementation, and `adj` for their `adjoint` implementation).\n",
+    "\n",
+    "As described above, we can see that `RippleCarryAdd` calls both the `FullAdder`\n",
+    "and `CNOT`, and that the `FullAdder` also calls `CNOT` and `ApplyAnd`.  In fact,\n",
+    "the `CNOT` operation is called from 3 other operations.\n",
+    "\n",
+    "Next, let's also generate resource estimates for the carry-lookahead\n",
+    "implementation."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "job = estimator.submit(EstimateLookAheadAdd, input_params=params)\n",
+    "result_cla = job.get_results()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Again, we can look at its call graph."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "result_cla.call_graph"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This call graph is much larger since more operations are involved.  We can see\n",
+    "the `PRounds`, `GRounds`, and `CRounds` operations, and see that `GRounds` and\n",
+    "`CRounds` call `CCNOT`, but `PRounds` calls `ApplyAnd`.  We also see that the\n",
+    "`adjoint` variant of `PRounds` calls the `adjoint` variant of `ApplyAnd`.\n",
+    "\n",
+    "It's possible to obtain a more compact call graph (and resource estimation\n",
+    "profile) by inlining some functions, e.g., those that just call a different\n",
+    "function and have no other logic inside.  We can use the `inline_functions`\n",
+    "parameter for that:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "params.profiling.inline_functions = True\n",
+    "job = estimator.submit(EstimateLookAheadAdd, input_params=params)\n",
+    "result_cla_inline = job.get_results()\n",
+    "result_cla_inline.call_graph"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Although the call graph is now smaller, certain details—such as the rounds in\n",
+    "`ComputeCarries`—have been inlined and are no longer available for analysis.\n",
+    "Thus, the value for the `inline` parameter is typically chosen based on the\n",
+    "desired type of analysis. For the purposes of our remaining analysis, we will be\n",
+    "considering profiles for the adder operations without inlining."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next we are inspecting the resource estimation profiles of both results.  These\n",
+    "will provide the sub operations' contributions to runtime, logical depth,\n",
+    "physical qubits for the algorithm, and logical qubits for the algorithm.  Let's\n",
+    "first have an overview of the total counts for each result:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "breakdown_rca = result_rca['physicalCounts']['breakdown']\n",
+    "breakdown_cla = result_cla['physicalCounts']['breakdown']\n",
+    "\n",
+    "pandas.DataFrame(data = {\n",
+    "    \"Runtime\": [f\"{result_rca['physicalCounts']['runtime']} ns\", f\"{result_cla['physicalCounts']['runtime']} ns\"],\n",
+    "    \"Logical depth\": [breakdown_rca['logicalDepth'], breakdown_cla['logicalDepth']],\n",
+    "    \"Physical qubits\": [breakdown_rca['physicalQubitsForAlgorithm'], breakdown_cla['physicalQubitsForAlgorithm']],\n",
+    "    \"Logical qubits\": [breakdown_rca['algorithmicLogicalQubits'], breakdown_cla['algorithmicLogicalQubits']],\n",
+    "}, index = [\"Ripple-carry adder\", \"Carry-lookahead adder\"])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The resource estimation profile is generated as a JSON file that can be read\n",
+    "using the speedscope interactive online profile viewer.  In order to generate\n",
+    "and download the file call the `profile` property on a result object."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "result_rca.profile"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "After downloading and opening the profile, we first see the runtime profile. For\n",
+    "this ripple-carry adder, we can readily see that all the runtime cost is caused\n",
+    "by the And operation inside the full adder.  No other operation contributes cost\n",
+    "the overall runtime.  The top most box is the entry point operation and the root\n",
+    "note of the call tree.  In the next layer below are all operations that are\n",
+    "called from the root node and that contribute to the overall runtime.\n",
+    "\n",
+    "<center><img src=\"https://raw.githubusercontent.com/microsoft/Quantum/main/samples/azure-quantum/resource-estimation/profile_rca_runtime.png\" style=\"width: 910px\" /></center>\n",
+    "\n",
+    "On the top center is a button that displays _Runtime (1/4)_, which we can press\n",
+    "to look at profiles for the other metrics.  If we click on physical algorithmic\n",
+    "qubits, we get this profile:\n",
+    "\n",
+    "<center><img src=\"https://raw.githubusercontent.com/microsoft/Quantum/main/samples/azure-quantum/resource-estimation/profile_rca_qubits.png\" style=\"width: 910px\" /></center>\n",
+    "\n",
+    "For the number of qubits we account how many new qubits were allocated by some\n",
+    "operation and track the maximum number of allocated qubits.  In this sample,\n",
+    "`EstimateRippleCarryAdd` allocated all qubits for the adder and there are no\n",
+    "additional helper qubits, e.g., in the implementation of `FullAdder`.  The entry\n",
+    "point operation accounts for the additional qubits that are required for the\n",
+    "padding in the 2D layout on the surface code.\n",
+    "\n",
+    "Let's also generate the profile for the carry-lookahead adder."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "result_cla.profile"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The runtime profile for the carry-lookahead adder contains more operations:\n",
+    "\n",
+    "<center><img src=\"https://raw.githubusercontent.com/microsoft/Quantum/main/samples/azure-quantum/resource-estimation/profile_cla_runtime.png\" style=\"width: 910px\" /></center>\n",
+    "\n",
+    "We can see that the AND gates in the `LookAheadAdd` operation and the\n",
+    "`ComputeCarries` operation contribute to the overall runtime.  Inside the\n",
+    "`ComputeCarries` operation, we can analyze the contribution of the sub\n",
+    "operations for the different rounds, and, e.g., note that the `adjoint` variant\n",
+    "for the `PRounds` takes the fewest time, whereas all other 3 rounds are of\n",
+    "similar complexity.\n",
+    "\n",
+    "In the qubit profile, we see that some additional helper qubits are allocated in\n",
+    "the `ComputeCarries` operation to hold the auxiliary $p$ bits, that are computed\n",
+    "out-of-place and required in the computation of the `GRounds` and `CRounds`.\n",
+    "\n",
+    "<center><img src=\"https://raw.githubusercontent.com/microsoft/Quantum/main/samples/azure-quantum/resource-estimation/profile_cla_qubits.png\" style=\"width: 910px\" /></center>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Advanced analysis\n",
+    "\n",
+    "In this notebook's final section, we will be conducting advanced analysis on the\n",
+    "profiling data. Specifically, we will programmatically access the profiling data\n",
+    "and examine the impact of `PRounds`, `GRounds`, and `CRounds` on the\n",
+    "carry-lookahead adder's runtime for increasing bitwidths.\n",
+    "\n",
+    "To begin, we will establish parameters for a batching job with bitwidths ranging\n",
+    "from $2^1 = 2$ to $2^{10} = 1024$, with power-of-2 steps. We will also set the\n",
+    "call stack depth to 10, as demonstrated in the examples above. It's worth noting\n",
+    "that while the call stack depth parameter applies to all items in the batching\n",
+    "parameters, the bitwidth must be specified for each item individually."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "bitwidths = [2**k for k in range(1, 11)]\n",
+    "params = estimator.make_params(num_items=len(bitwidths))\n",
+    "\n",
+    "params.profiling.call_stack_depth = 10\n",
+    "for i, bitwidth in enumerate(bitwidths):\n",
+    "    params.items[i].arguments['bitwidth'] = bitwidth"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We are submitting the job and store its results in `results_all`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "job = estimator.submit(EstimateLookAheadAdd, input_params=params)\n",
+    "results_all = job.get_results()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The raw profiling data can be accessed in JSON format via the `'profile'` key in\n",
+    "a resource estimation result. For example, to access the profiling data for the\n",
+    "first item in `results_all`, we would use `results_all[0]['profile']`. The\n",
+    "format follows a custom scheme for speedscope and is documented in detail\n",
+    "[here](https://github.com/jlfwong/speedscope/blob/master/src/lib/file-format-spec.ts),\n",
+    "with the corresponding JSON schema available\n",
+    "[here](https://www.speedscope.app/file-format-schema.json).\n",
+    "   \n",
+    "Each node in the tree is assigned a frame with an index, and the profile\n",
+    "contains samples organized by calling order, with each sample assigned a weight\n",
+    "(e.g. runtime). To locate the frame associated with a given round name (e.g.\n",
+    "`PRounds`), the following Python function can be used: it finds the frame, then\n",
+    "identifies all samples that contain it, and sums up the corresponding weights."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def rounds_runtime(profile, round_name):\n",
+    "    # Find the frame index which name contains `round_name` and ends in \"body\".\n",
+    "    frame_index = [i for (i, frame) in enumerate(profile['shared']['frames']) if re.match(f'.*{round_name}.+body', frame['name'])][0]\n",
+    "\n",
+    "    # The runtime profile is the first profile\n",
+    "    runtime_profile = profile['profiles'][0]\n",
+    "\n",
+    "    # Get variables to the samples and weights field of the runtime profile\n",
+    "    samples = runtime_profile['samples']\n",
+    "    weights = runtime_profile['weights']\n",
+    "\n",
+    "    # Sum up all the weights that correspond to samples that contain the operation,\n",
+    "    # i.e., that contain the frame_index\n",
+    "    return sum(weight for (sample, weight) in zip(samples, weights) if frame_index in sample)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We now extract the profile for each job result item and use the `rounds_runtime`\n",
+    "function to obtain the runtime for each round, add it to a data frame together\n",
+    "with the total runtime and return a plot."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "entries = []\n",
+    "for idx in range(len(results_all)):\n",
+    "    bitwidth = bitwidths[idx]\n",
+    "    result = results_all[idx]\n",
+    "    profile = result['profile']\n",
+    "\n",
+    "    total_runtime = result['physicalCounts']['runtime']\n",
+    "    entries.append([\n",
+    "        rounds_runtime(profile, \"PRounds\"),\n",
+    "        rounds_runtime(profile, \"GRounds\"),\n",
+    "        rounds_runtime(profile, \"CRounds\"),\n",
+    "        total_runtime\n",
+    "    ])\n",
+    "\n",
+    "df = pandas.DataFrame(data=entries, index=bitwidths, columns=[\"PRounds\", \"GRounds\", \"CRounds\", \"Total\"])\n",
+    "ax = df.plot(logx=True, xticks=bitwidths, xlabel=\"Bitwidth\", ylabel=\"Runtime (ns)\")\n",
+    "# show all xticks\n",
+    "from matplotlib.text import Text\n",
+    "ax.set_xticklabels([Text(b, 0, b) for b in bitwidths]);"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Note how the total runtime grows much faster compared to the runtime of the\n",
+    "rounds.  The reason is that we need $O(n)$ AND gates in the preparation part of\n",
+    "`LookAheadAdd` but only $O(\\log(n))$ AND and CCNOT gates in the `ComputeCarries`\n",
+    "operation.\n",
+    "\n",
+    "Further note that logical depth of a the carry-lookahead adder is also\n",
+    "logarithmic in $n$, since on the logical level, all AND and CCNOT gates, in both\n",
+    "the preparation parts and in the rounds can be applied in parallel.  However,\n",
+    "when mapping to surface code operations using Parallel Synthesis Sequential\n",
+    "Pauli Computation (PSSPC), these operations are sequentialized (see Appendix D\n",
+    "in [Assessing requirements to scale to practical quantum\n",
+    "advantage](https://arxiv.org/pdf/2211.07629.pdf))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Next steps"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Great job! You now know how to use the profiling feature of the Azure Quantum\n",
+    "Resource Estimator to analyze how different parts of your program contribute to\n",
+    "overall resource estimates.\n",
+    "\n",
+    "To summarize, you can use the use the `profiling.call_stack_depth` and\n",
+    "`profiling.inline_functions` job parameters to enable profiles in resource\n",
+    "estimation jobs.  In the resource estimation results that you receive after\n",
+    "successful job submission, the `call_graph` and `profile` properties show call\n",
+    "graphs and resource estimation profiles, respectively.\n",
+    "\n",
+    "We encourage you to further explore this feature and try out the following\n",
+    "ideas:\n",
+    "   \n",
+    "* Experiment with changing the `call_stack_depth` parameter\n",
+    "* Investigate call graphs and profiles of recursive programs\n",
+    "* Generate profiles from programs in other notebooks\n",
+    "* Perform an advanced profile analysis to compare the number of helper qubits to\n",
+    "  the number of input and output qubits in the carry-lookahead implementation"
+   ]
+  }
+ ],
+ "metadata": {},
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/samples/azure-quantum/resource-estimation/profile_cla_qubits.png b/samples/azure-quantum/resource-estimation/profile_cla_qubits.png
new file mode 100644
index 000000000000..1efb277058bb
Binary files /dev/null and b/samples/azure-quantum/resource-estimation/profile_cla_qubits.png differ
diff --git a/samples/azure-quantum/resource-estimation/profile_cla_runtime.png b/samples/azure-quantum/resource-estimation/profile_cla_runtime.png
new file mode 100644
index 000000000000..2b9d89052670
Binary files /dev/null and b/samples/azure-quantum/resource-estimation/profile_cla_runtime.png differ
diff --git a/samples/azure-quantum/resource-estimation/profile_rca_qubits.png b/samples/azure-quantum/resource-estimation/profile_rca_qubits.png
new file mode 100644
index 000000000000..ff7f7c078882
Binary files /dev/null and b/samples/azure-quantum/resource-estimation/profile_rca_qubits.png differ
diff --git a/samples/azure-quantum/resource-estimation/profile_rca_runtime.png b/samples/azure-quantum/resource-estimation/profile_rca_runtime.png
new file mode 100644
index 000000000000..e5ac29a59100
Binary files /dev/null and b/samples/azure-quantum/resource-estimation/profile_rca_runtime.png differ