From a93e89c92f710ff0f9644ebcea8a4d2aa58b178f Mon Sep 17 00:00:00 2001
From: Liubov Talamanova
Date: Tue, 28 May 2024 14:26:30 +0100
Subject: [PATCH] apply comments

---
 .../stable-video-diffusion/stable-video-diffusion.ipynb | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/notebooks/stable-video-diffusion/stable-video-diffusion.ipynb b/notebooks/stable-video-diffusion/stable-video-diffusion.ipynb
index a91f46b53a1..b4a52a18ec1 100644
--- a/notebooks/stable-video-diffusion/stable-video-diffusion.ipynb
+++ b/notebooks/stable-video-diffusion/stable-video-diffusion.ipynb
@@ -22,7 +22,7 @@
     "- [Prepare Inference Pipeline](#Prepare-Inference-Pipeline)\n",
     "- [Run Video Generation](#Run-Video-Generation)\n",
     "    - [Select Inference Device](#Select-Inference-Device)\n",
-    "    - [Quantization](#Quantization)\n",
+    "- [Quantization](#Quantization)\n",
     "    - [Prepare calibration dataset](#Prepare-calibration-dataset)\n",
     "    - [Run Hybrid Model Quantization](#Run-Hybrid-Model-Quantization)\n",
     "    - [Run Weight Compression](#Run-Weight-Compression)\n",
@@ -1094,7 +1094,7 @@
     "\n",
     "[NNCF](https://github.com/openvinotoolkit/nncf/) enables post-training quantization by adding quantization layers into model graph and then using a subset of the training dataset to initialize the parameters of these additional quantization layers. Quantized operations are executed in `INT8` instead of `FP32`/`FP16` making model inference faster.\n",
     "\n",
-    "According to `OVStableVideoDiffusionPipeline` structure, the diffusion model takes up significant portion of the overall pipeline execution time. Now we will show you how to optimize the UNet part using [NNCF](https://github.com/openvinotoolkit/nncf/) to reduce computation cost and speed up the pipeline. Quantizing the rest of the pipeline does not significantly improve inference performance but can lead to a substantial degradation of accuracy. That's why we use weight compression for the `vae encoder and decoder` to reduce the memory footprint.\n",
+    "According to the `OVStableVideoDiffusionPipeline` structure, the diffusion model takes up a significant portion of the overall pipeline execution time. Now we will show you how to optimize the UNet part using [NNCF](https://github.com/openvinotoolkit/nncf/) to reduce the computation cost and speed up the pipeline. Quantizing the rest of the pipeline does not significantly improve inference performance but can lead to a substantial degradation of accuracy. That's why we use only weight compression for the `vae encoder` and `vae decoder` to reduce the memory footprint.\n",
     "\n",
     "For the UNet model we apply quantization in hybrid mode which means that we quantize: (1) weights of MatMul and Embedding layers and (2) activations of other layers. The steps are the following:\n",
     "\n",
@@ -1239,7 +1239,7 @@
    "id": "bfdee6ad",
    "metadata": {
     "test_replace": {
-     "subset_size = 200": "subset_size = 4"
+     "subset_size = 200": "subset_size = 4"
     }
    },
    "outputs": [],
@@ -1355,7 +1355,7 @@
     "### Run Weight Compression\n",
     "[back to top ⬆️](#Table-of-contents:)\n",
     "\n",
-    "Quantizing of the `vae encoder and decoder` does not significantly improve inference performance but can lead to a substantial degradation of accuracy. The weight compression will be applied to footprint reduction."
+    "Quantizing the `vae encoder` and `vae decoder` does not significantly improve inference performance but can lead to a substantial degradation of accuracy. Only weight compression will be applied to reduce the memory footprint."
    ]
   },
   {
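
For context on the hybrid quantization and weight compression described in the changed markdown cells, below is a minimal NNCF sketch (not part of the patch). It assumes OpenVINO IR files at hypothetical paths ("unet/openvino_model.xml", "vae_encoder/openvino_model.xml", "vae_decoder/openvino_model.xml") and a pre-collected list `unet_calibration_data` of UNet input dictionaries gathered by running the pipeline; the exact paths and calibration code in the notebook may differ.

# Illustrative sketch only: hybrid quantization of the UNet plus weight-only
# compression of the VAE encoder/decoder with NNCF post-training APIs.
import nncf
import openvino as ov

core = ov.Core()

# Hybrid mode for the UNet: first compress weights (MatMul/Embedding layers),
# then quantize activations of the remaining layers while skipping Convolution weights.
unet = core.read_model("unet/openvino_model.xml")            # assumed IR path
compressed_unet = nncf.compress_weights(unet)
quantized_unet = nncf.quantize(
    model=compressed_unet,
    calibration_dataset=nncf.Dataset(unet_calibration_data),  # assumed pre-collected inputs
    subset_size=200,
    model_type=nncf.ModelType.TRANSFORMER,
    ignored_scope=nncf.IgnoredScope(types=["Convolution"]),
)
ov.save_model(quantized_unet, "unet_int8.xml")

# VAE encoder/decoder: weight-only compression to shrink the memory footprint,
# since fully quantizing these parts mainly degrades accuracy.
for src, dst in [("vae_encoder/openvino_model.xml", "vae_encoder_c.xml"),
                 ("vae_decoder/openvino_model.xml", "vae_decoder_c.xml")]:
    ov.save_model(nncf.compress_weights(core.read_model(src)), dst)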