
Commit

apply comments
l-bat committed May 28, 2024
1 parent 22abac4 commit a93e89c
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions notebooks/stable-video-diffusion/stable-video-diffusion.ipynb
@@ -22,7 +22,7 @@
"- [Prepare Inference Pipeline](#Prepare-Inference-Pipeline)\n",
"- [Run Video Generation](#Run-Video-Generation)\n",
" - [Select Inference Device](#Select-Inference-Device)\n",
" - [Quantization](#Quantization)\n",
"- [Quantization](#Quantization)\n",
" - [Prepare calibration dataset](#Prepare-calibration-dataset)\n",
" - [Run Hybrid Model Quantization](#Run-Hybrid-Model-Quantization)\n",
" - [Run Weight Compression](#Run-Weight-Compression)\n",
@@ -1094,7 +1094,7 @@
"\n",
"[NNCF](https://github.com/openvinotoolkit/nncf/) enables post-training quantization by adding quantization layers into model graph and then using a subset of the training dataset to initialize the parameters of these additional quantization layers. Quantized operations are executed in `INT8` instead of `FP32`/`FP16` making model inference faster.\n",
"\n",
"According to `OVStableVideoDiffusionPipeline` structure, the diffusion model takes up significant portion of the overall pipeline execution time. Now we will show you how to optimize the UNet part using [NNCF](https://github.com/openvinotoolkit/nncf/) to reduce computation cost and speed up the pipeline. Quantizing the rest of the pipeline does not significantly improve inference performance but can lead to a substantial degradation of accuracy. That's why we use weight compression for the `vae encoder and decoder` to reduce the memory footprint.\n",
"According to `OVStableVideoDiffusionPipeline` structure, the diffusion model takes up significant portion of the overall pipeline execution time. Now we will show you how to optimize the UNet part using [NNCF](https://github.com/openvinotoolkit/nncf/) to reduce computation cost and speed up the pipeline. Quantizing the rest of the pipeline does not significantly improve inference performance but can lead to a substantial degradation of accuracy. That's why we use only weight compression for the `vae encoder` and `vae decoder` to reduce the memory footprint.\n",
"\n",
"For the UNet model we apply quantization in hybrid mode which means that we quantize: (1) weights of MatMul and Embedding layers and (2) activations of other layers. The steps are the following:\n",
"\n",
@@ -1239,7 +1239,7 @@
"id": "bfdee6ad",
"metadata": {
"test_replace": {
"subset_size = 200": "subset_size = 4"
"subset_size = 200": "subset_size = 4"
}
},
"outputs": [],
@@ -1355,7 +1355,7 @@
"### Run Weight Compression\n",
"[back to top ⬆️](#Table-of-contents:)\n",
"\n",
"Quantizing of the `vae encoder and decoder` does not significantly improve inference performance but can lead to a substantial degradation of accuracy. The weight compression will be applied to footprint reduction."
"Quantizing of the `vae encoder` and `vae decoder` does not significantly improve inference performance but can lead to a substantial degradation of accuracy. Only weight compression will be applied for footprint reduction."
]
},
{
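For reference, a minimal sketch of the weight-compression step mentioned in the last hunk is shown below. It is not copied from the notebook; `VAE_ENCODER_PATH`, `VAE_DECODER_PATH`, and the corresponding `*_INT8_PATH` targets are stand-in names for the model files the notebook manages.

```python
import nncf
import openvino as ov

core = ov.Core()

# Compress weights of the VAE encoder and decoder only (stand-in paths).
for fp16_path, int8_path in [
    (VAE_ENCODER_PATH, VAE_ENCODER_INT8_PATH),
    (VAE_DECODER_PATH, VAE_DECODER_INT8_PATH),
]:
    model = core.read_model(fp16_path)
    # Weight-only compression: activations are untouched, so accuracy is largely
    # preserved while the memory footprint of the weights shrinks.
    compressed = nncf.compress_weights(model)
    ov.save_model(compressed, int8_path)
```

Because only the weights are rewritten and activation precision is unchanged, this step reduces the memory footprint without the accuracy drop that full quantization of these models would cause.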
