This repository is the official implementation of the NeurIPS 2024 paper: "BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models"
Keywords: Diffusion Model, Exact Inversion, ODE Solver
Fangyikang Wang1, Hubery Yin2, Yuejiang Dong3, Huminhao Zhu1, Chao Zhang1, Hanbin Zhao1, Hui Qian1, Chen Li2
1Zhejiang University 2WeChat, Tencent Inc. 3Tsinghua University
Schematic description of DDIM (left) and BELM (right). DDIM uses $\mathbf{x}_i$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_i,i)$ to calculate $\mathbf{x}_{i-1}$ based on a linear relation between $\mathbf{x}_i$, $\mathbf{x}_{i-1}$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_i,i)$ (represented by the blue line). However, DDIM inversion uses $\mathbf{x}_{i-1}$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i-1},i-1)$ to calculate $\mathbf{x}_{i}$ based on a different linear relation represented by the red line. This mismatch leads to the inexact inversion of DDIM. In contrast, BELM seeks to establish a linear relation between $\mathbf{x}_{i-1}$, $\mathbf{x}_i$, $\mathbf{x}_{i+1}$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i}, i)$ (represented by the green line). BELM and its inversion are derived from this unitary relation, which facilitates the exact inversion. Specifically, BELM uses the linear combination of $\mathbf{x}_i$, $\mathbf{x}_{i+1}$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i},i)$ to calculate $\mathbf{x}_{i-1}$, and the BELM inversion uses the linear combination of $\mathbf{x}_{i-1}$, $\mathbf{x}_i$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i},i)$ to calculate $\mathbf{x}_{i+1}$. The bidirectional explicit constraint means this linear relation does not include the derivatives at the bidirectional endpoints, that is, $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i-1},i-1)$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i+1},i+1)$.
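The contrast described in the caption can be illustrated with a small scalar toy, offered only as a sketch and not as the code in this repository. Here `eps_toy` is a hypothetical stand-in for the noise network, the two-point scheme is an Euler-style analogue of DDIM rather than DDIM itself, and the default coefficients `a1=0, a2=1, b1=-2` are what the proposition's formulae give for equal step sizes.

```python
import math

def eps_toy(x, i):
    """Hypothetical stand-in for eps_theta(x_i, i); any deterministic function works."""
    return math.tanh(x) + 0.1 * i

def two_point_round_trip(x_start, n, h):
    """Euler-style two-point sampler followed by its mismatched inversion."""
    x = x_start
    for i in range(n, 0, -1):      # sampling: x_{i-1} from x_i and eps(x_i, i)
        x = x - h * eps_toy(x, i)
    for i in range(1, n + 1):      # inversion: x_i from x_{i-1} and eps(x_{i-1}, i-1)
        x = x + h * eps_toy(x, i - 1)
    return abs(x - x_start)

def three_point_round_trip(x_start, n, h, a1=0.0, a2=1.0, b1=-2.0):
    """Three-point sampler whose inversion solves the same linear relation."""
    xs = [0.0] * (n + 1)
    xs[n] = x_start
    # Assumption: the second starting point is bootstrapped with one Euler step.
    xs[n - 1] = x_start - h * eps_toy(x_start, n)
    for i in range(n - 1, 0, -1):  # sampling: x_{i-1} from x_i, x_{i+1}, eps(x_i, i)
        xs[i - 1] = a2 * xs[i + 1] + a1 * xs[i] + b1 * h * eps_toy(xs[i], i)
    zs = [0.0] * (n + 1)
    zs[0], zs[1] = xs[0], xs[1]
    for i in range(1, n):          # inversion: x_{i+1} from x_{i-1}, x_i, eps(x_i, i)
        zs[i + 1] = (zs[i - 1] - a1 * zs[i] - b1 * h * eps_toy(zs[i], i)) / a2
    return abs(zs[n] - xs[n])

if __name__ == "__main__":
    print("two-point round-trip error:  ", two_point_round_trip(1.0, 20, 0.05))
    print("three-point round-trip error:", three_point_round_trip(1.0, 20, 0.05))
```

In the three-point case both directions evaluate the noise function at the same point $\mathbf{x}_i$, so the round trip is exact up to floating-point rounding, whereas the two-point inversion evaluates it at a different point and accumulates error.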
The general k-step BELM:
2-step BELM:
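As a hedged reading rather than a quotation of the paper, the 2-step update can be read off the LTE expression in the proposition, assuming the LTE is the residual of the update in the rescaled variables $\bar{\mathbf{x}}$ and $\bar{\sigma}$:

$$
\bar{\mathbf{x}}_{i-1} = a_{i,2}\,\bar{\mathbf{x}}_{i+1} + a_{i,1}\,\bar{\mathbf{x}}_{i} + b_{i,1}\, h_i\, \bar{\boldsymbol{\varepsilon}}_\theta(\bar{\mathbf{x}}_i,\bar{\sigma}_i)
$$

Solving the same relation for $\bar{\mathbf{x}}_{i+1}$ (assuming $a_{i,2} \neq 0$) gives the corresponding inversion step:

$$
\bar{\mathbf{x}}_{i+1} = \frac{1}{a_{i,2}}\left(\bar{\mathbf{x}}_{i-1} - a_{i,1}\,\bar{\mathbf{x}}_{i} - b_{i,1}\, h_i\, \bar{\boldsymbol{\varepsilon}}_\theta(\bar{\mathbf{x}}_i,\bar{\sigma}_i)\right)
$$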
Proposition. The local truncation error (LTE) $\tau_i$ of the BELM diffusion sampler, given by $\tau_i = \bar{\mathbf{x}}(t_{i-1}) - a_{i,2}\bar{\mathbf{x}}(t_{i+1}) -a_{i,1}\bar{\mathbf{x}}(t_{i}) - b_{i,1} h_i\bar{\boldsymbol{\varepsilon}}_\theta(\bar{\mathbf{x}}(t_i),\bar{\sigma}_i)$, is of order $\mathcal{O}\left({(h_{i}+h_{i+1})}^3\right)$ when the coefficients are chosen as $a_{i,1} = \frac{h_{i+1}^2 - h_i^2}{h_{i+1}^2}$, $a_{i,2}=\frac{h_i^2}{h_{i+1}^2}$, $b_{i,1}=- \frac{h_i+h_{i+1}}{h_{i+1}}$.
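The coefficient formulae can be sanity-checked with exact rational arithmetic. The sketch below assumes the step sizes are $h_i = \bar{\sigma}_i - \bar{\sigma}_{i-1}$ and $h_{i+1} = \bar{\sigma}_{i+1} - \bar{\sigma}_i$ (a sign convention inferred from the formulae, not stated in this README) and Taylor-expands the LTE around $t_i$; the function names are illustrative only.

```python
from fractions import Fraction

def proposition_coefficients(h_i, h_ip1):
    """Coefficients a_{i,1}, a_{i,2}, b_{i,1} as given in the proposition."""
    h_i, h_ip1 = Fraction(h_i), Fraction(h_ip1)
    a1 = (h_ip1**2 - h_i**2) / h_ip1**2
    a2 = h_i**2 / h_ip1**2
    b1 = -(h_i + h_ip1) / h_ip1
    return a1, a2, b1

def taylor_residuals(h_i, h_ip1):
    """Residuals of the order-0 to order-3 Taylor terms of the LTE around t_i."""
    h_i, h_ip1 = Fraction(h_i), Fraction(h_ip1)
    a1, a2, b1 = proposition_coefficients(h_i, h_ip1)
    d_minus = -h_i   # assumed offset of sigma_{i-1} from sigma_i
    d_plus = h_ip1   # assumed offset of sigma_{i+1} from sigma_i
    r0 = 1 - a1 - a2                       # multiplies x(t_i)
    r1 = d_minus - a2 * d_plus - b1 * h_i  # multiplies the first derivative
    r2 = d_minus**2 - a2 * d_plus**2       # multiplies the second derivative (times 2)
    r3 = d_minus**3 - a2 * d_plus**3       # multiplies the third derivative (times 6)
    return r0, r1, r2, r3

if __name__ == "__main__":
    print(taylor_residuals(Fraction(3, 10), Fraction(1, 2)))
```

Under these assumptions the first three residuals vanish for any positive step sizes, while the cubic one equals $-h_i^2(h_i + h_{i+1})$, which is consistent with the stated third-order LTE.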
where
The Optimal-BELM (O-BELM) sampler:
The inversion of O-BELM diffusion sampler writes:
- Python 3.8.12
- CUDA 11.7
- NVIDIA A100 40GB PCIe
- Torch 2.0.0
- Torchvision 0.14.0
Please follow the diffusers installation instructions to install diffusers.
First, switch to the repository root directory.
python3 ./scripts/cifar10.py --test_num 10 --batch_size 32 --num_inference_steps 100 --sampler_type belm --save_dir YOUR/SAVE/DIR --model_id xxx/ddpm_ema_cifar10
python3 ./scripts/celeba.py --test_num 10 --batch_size 32 --num_inference_steps 100 --sampler_type belm --save_dir YOUR/SAVE/DIR --model_id xxx/ddpm_ema_cifar10
python3 ./scripts/interpolate.py --test_num 10 --batch_size 1 --num_inference_steps 100 --save_dir YOUR/SAVE/DIR --model_id xx
python3 ./scripts/reconstruction.py --test_num 10 --num_inference_steps 100 --directory WHERE/YOUR/IMAGES/ARE --sampler_type belm
python3 ./scripts/image_editing.py --num_inference_steps 200 --freeze_step 50 --guidance 2.0 --sampler_type belm --save_dir YOUR/SAVE/DIR --model_id xxxxx/stable-diffusion-v1-5 --ori_im_path images/imagenet_dog_1.jpg --ori_prompt 'A dog' --res_prompt 'A Dalmatian'
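To compare reconstructions at several step counts, the reconstruction command above could be generated in a loop. The helper below is a hypothetical convenience, not part of this repository; it only builds command strings from the flags shown above, and the step counts in the example are arbitrary.

```python
import shlex

def reconstruction_commands(image_dir, step_counts, sampler_type="belm", test_num=10):
    """Build one reconstruction.py command line per step count (nothing is executed)."""
    commands = []
    for steps in step_counts:
        argv = [
            "python3", "./scripts/reconstruction.py",
            "--test_num", str(test_num),
            "--num_inference_steps", str(steps),
            "--directory", image_dir,
            "--sampler_type", sampler_type,
        ]
        commands.append(shlex.join(argv))  # shell-safe quoting of each argument
    return commands

if __name__ == "__main__":
    for cmd in reconstruction_commands("WHERE/YOUR/IMAGES/ARE", [20, 50, 100]):
        print(cmd)
```

The printed lines can be pasted into a shell from the repository root, or passed to `subprocess.run` after splitting them with `shlex.split`.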
This project is licensed under the MIT License - see the LICENSE file for details.
If our work assists your research, feel free to give us a star ⭐ or cite us using:
@article{wang2024belm,
title={BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models},
author={Wang, Fangyikang and Yin, Hubery and Dong, Yuejiang and Zhu, Huminhao and Zhang, Chao and Zhao, Hanbin and Qian, Hui and Li, Chen},
journal={arXiv preprint arXiv:2410.07273},
year={2024}
}