
Commit

apply comments
l-bat committed May 28, 2024
1 parent 22abac4 commit a93e89c
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions notebooks/stable-video-diffusion/stable-video-diffusion.ipynb
@@ -22,7 +22,7 @@
"- [Prepare Inference Pipeline](#Prepare-Inference-Pipeline)\n",
"- [Run Video Generation](#Run-Video-Generation)\n",
" - [Select Inference Device](#Select-Inference-Device)\n",
" - [Quantization](#Quantization)\n",
"- [Quantization](#Quantization)\n",
" - [Prepare calibration dataset](#Prepare-calibration-dataset)\n",
" - [Run Hybrid Model Quantization](#Run-Hybrid-Model-Quantization)\n",
" - [Run Weight Compression](#Run-Weight-Compression)\n",
@@ -1094,7 +1094,7 @@
"\n",
"[NNCF](https://github.com/openvinotoolkit/nncf/) enables post-training quantization by adding quantization layers into model graph and then using a subset of the training dataset to initialize the parameters of these additional quantization layers. Quantized operations are executed in `INT8` instead of `FP32`/`FP16` making model inference faster.\n",
"\n",
"According to `OVStableVideoDiffusionPipeline` structure, the diffusion model takes up significant portion of the overall pipeline execution time. Now we will show you how to optimize the UNet part using [NNCF](https://github.com/openvinotoolkit/nncf/) to reduce computation cost and speed up the pipeline. Quantizing the rest of the pipeline does not significantly improve inference performance but can lead to a substantial degradation of accuracy. That's why we use weight compression for the `vae encoder and decoder` to reduce the memory footprint.\n",
"According to `OVStableVideoDiffusionPipeline` structure, the diffusion model takes up significant portion of the overall pipeline execution time. Now we will show you how to optimize the UNet part using [NNCF](https://github.com/openvinotoolkit/nncf/) to reduce computation cost and speed up the pipeline. Quantizing the rest of the pipeline does not significantly improve inference performance but can lead to a substantial degradation of accuracy. That's why we use only weight compression for the `vae encoder` and `vae decoder` to reduce the memory footprint.\n",
"\n",
"For the UNet model we apply quantization in hybrid mode which means that we quantize: (1) weights of MatMul and Embedding layers and (2) activations of other layers. The steps are the following:\n",
"\n",
@@ -1239,7 +1239,7 @@
"id": "bfdee6ad",
"metadata": {
"test_replace": {
"subset_size = 200": "subset_size = 4"
"subset_size = 200": "subset_size = 4"
}
},
"outputs": [],
@@ -1355,7 +1355,7 @@
"### Run Weight Compression\n",
"[back to top ⬆️](#Table-of-contents:)\n",
"\n",
"Quantizing of the `vae encoder and decoder` does not significantly improve inference performance but can lead to a substantial degradation of accuracy. The weight compression will be applied to footprint reduction."
"Quantizing of the `vae encoder` and `vae decoder` does not significantly improve inference performance but can lead to a substantial degradation of accuracy. Only weight compression will be applied for footprint reduction."
]
},
{
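For reference, a minimal sketch of the weight-compression step mentioned in the last hunk is shown below. It is not copied from the notebook; `VAE_ENCODER_PATH`, `VAE_DECODER_PATH`, and the corresponding `*_INT8_PATH` targets are stand-in names for the model files the notebook manages.

```python
import nncf
import openvino as ov

core = ov.Core()

# Compress weights of the VAE encoder and decoder only (stand-in paths).
for fp16_path, int8_path in [
    (VAE_ENCODER_PATH, VAE_ENCODER_INT8_PATH),
    (VAE_DECODER_PATH, VAE_DECODER_INT8_PATH),
]:
    model = core.read_model(fp16_path)
    # Weight-only compression: activations are untouched, so accuracy is largely
    # preserved while the memory footprint of the weights shrinks.
    compressed = nncf.compress_weights(model)
    ov.save_model(compressed, int8_path)
```

Because only the weights are rewritten and activation precision is unchanged, this step reduces the memory footprint without the accuracy drop that full quantization of these models would cause.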
