StableCascade image generation (#1979)
CVS-139009
aleksandr-mokrov authored May 1, 2024
1 parent 863dc2d commit b3ef8e9
Showing 7 changed files with 859 additions and 0 deletions.
1 change: 1 addition & 0 deletions .ci/ignore_treon_docker.txt
@@ -58,3 +58,4 @@ llm-rag-langchain
stable-video-diffusion
llm-agent-langchain
hello-npu
stable-cascade-image-generation
1 change: 1 addition & 0 deletions .ci/ignore_treon_linux.txt
@@ -58,3 +58,4 @@ llm-rag-langchain
stable-video-diffusion
llm-agent-langchain
hello-npu
stable-cascade-image-generation
1 change: 1 addition & 0 deletions .ci/ignore_treon_mac.txt
@@ -59,3 +59,4 @@ llm-rag-langchain
stable-video-diffusion
llm-agent-langchain
hello-npu
stable-cascade-image-generation
1 change: 1 addition & 0 deletions .ci/ignore_treon_win.txt
@@ -55,3 +55,4 @@ llm-rag-langchain
stable-video-diffusion
llm-agent-langchain
hello-npu
stable-cascade-image-generation
1 change: 1 addition & 0 deletions .ci/spellcheck/.pyspelling.wordlist.txt
@@ -705,6 +705,7 @@ SRT
SSD
SSDLite
sst
StableCascade
StableDiffusionInpaintPipeline
StableDiffusionPipeline
StableDiffusionImg
27 changes: 27 additions & 0 deletions notebooks/stable-cascade-image-generation/README.md
@@ -0,0 +1,27 @@
# Image generation with StableCascade and OpenVINO

<img src="https://huggingface.co/stabilityai/stable-cascade/resolve/main/figures/collage_1.jpg" />

[Stable Cascade](https://huggingface.co/stabilityai/stable-cascade) is built upon the [Würstchen](https://openreview.net/forum?id=gU58d5QeGv) architecture. Its main difference from other models such as Stable Diffusion is that it works in a much smaller latent space. Why is this important? The smaller the latent space, the faster inference runs and the cheaper training becomes. How small is the latent space? Stable Diffusion uses a compression factor of 8, so a 1024x1024 image is encoded to 128x128. Stable Cascade achieves a compression factor of 42, so a 1024x1024 image can be encoded to 24x24 while maintaining crisp reconstructions. The text-conditional model is then trained in this highly compressed latent space.
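
The figures above can be checked with a little arithmetic. The snippet below is only an illustrative sketch: the helper `latent_side` is a hypothetical name, not part of any library, and it counts spatial positions only, ignoring the number of latent channels each model uses.

```python
def latent_side(image_side: int, compression_factor: int) -> int:
    """Approximate side length of the latent grid for a square image.

    Hypothetical helper for illustration; integer division drops the
    fractional part (1024 / 42 is about 24.4).
    """
    return image_side // compression_factor


sd_side = latent_side(1024, 8)        # Stable Diffusion, factor 8
cascade_side = latent_side(1024, 42)  # Stable Cascade, factor 42

# Ratio of spatial positions in the two latent grids (channels ignored).
ratio = (sd_side ** 2) / (cascade_side ** 2)

print(f"Stable Diffusion latent: {sd_side}x{sd_side}")
print(f"Stable Cascade latent:   {cascade_side}x{cascade_side}")
print(f"Spatial positions ratio: {ratio:.1f}x")
```

Under these assumptions, the Stable Cascade latent grid has roughly 28 times fewer spatial positions than the Stable Diffusion one for the same 1024x1024 image.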

The notebook provides a simple interface for interacting with the model through text instructions: the user provides a prompt and the model generates an image. An additional part demonstrates how to use weight compression with [NNCF](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/weight-compression.html#compress-model-weights) to speed up the pipeline and reduce memory consumption.

>**Note**: This demonstration can require about 50GB of RAM for conversion and running.

## Notebook contents
This tutorial consists of the following steps:
- Prerequisites
- Load the original model
- Infer the original model
- Convert the model to OpenVINO IR
- Prior pipeline
- Decoder pipeline
- Compiling models
- Building the pipeline
- Inference
- Interactive inference

## Installation instructions
This is a self-contained example that relies solely on its own code.<br>
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the [Installation Guide](../../README.md).

Large diffs are not rendered by default.
