Minor corrections + remove a small part.
RomainDeleat-besson committed Oct 10, 2024
1 parent b47d1d4 commit e22fe96
Showing 1 changed file with 9 additions and 5 deletions.
14 changes: 9 additions & 5 deletions collections/_posts/2024-10-07-Diffusion_Autoencoders.md
@@ -23,13 +23,13 @@ pdf: "https://openaccess.thecvf.com/content/CVPR2022/html/Preechakul_Diffusion_A

* Autoencoders are useful for learning representations. In contrast, DPMs can also transform an input image into a latent variable, yet that latent lacks key properties such as semantics and disentanglement.

* Their proposed approach uses a learnable encoder to capture high-level semantics and a DPM for decoding and modeling variations.
* The proposed approach uses a learnable encoder to capture high-level semantics and a DPM for decoding and modeling variations.

* Unlike other DPMs, DDIM introduces a non-Markovian forward process while preserving DPM training objectives, enabling deterministic encoding.
* Unlike other DPMs, DDIM introduces a non-Markovian reverse process while preserving DPM training objectives.

* Conditioning the DDIM on semantic information makes denoising easier and faster and produces a linear, decodable, semantically meaningful representation.

* To generate synthetic data, the authors used another DPM for the semantic subcode distribution.
* To generate unconditional synthetic data, the authors used another DPM for the semantic subcode distribution.


# Methods
@@ -81,6 +81,8 @@ The conditional DDIM decoder proposed by the authors takes as input the pair $$
to generate the output image.
This decoder models $$ p_{\theta}(x_{t-1} | x_t, z_{\text{sem}}) $$ to approximate the inference distribution $$ q(x_{t-1} | x_t, x_0) $$ using the following reverse generative process:

* Decoder:

$$ p_\theta(x_{0:T} | z_{\text{sem}}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} | x_t, z_{\text{sem}}) $$

$$
@@ -93,6 +95,8 @@

$$ f_\theta(x_t, t, z_{\text{sem}}) = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \sqrt{1 - \alpha_t} \epsilon_\theta(x_t, t, z_{\text{sem}}) \right) $$
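
To make the deterministic decoding concrete, here is a minimal sketch of a single conditional DDIM reverse step (the $$ \eta = 0 $$ case, so no noise is injected). It assumes an `eps_model(x_t, t, z_sem)` that returns $$ \epsilon_\theta $$ and a precomputed table `alphas_cumprod` holding the $$ \alpha_t $$ used above; it is an illustration, not the authors' implementation.

```python
import torch

def ddim_step(x_t, t, t_prev, z_sem, eps_model, alphas_cumprod):
    """One deterministic DDIM reverse step conditioned on z_sem.

    Assumptions: t and t_prev are integer timesteps,
    eps_model(x_t, t, z_sem) -> predicted noise eps_theta,
    alphas_cumprod[t] -> alpha_t with x_t = sqrt(alpha_t) x_0 + sqrt(1 - alpha_t) eps.
    """
    a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
    eps = eps_model(x_t, t, z_sem)
    # f_theta: predicted x_0, as in the equation above
    x0_pred = (x_t - torch.sqrt(1.0 - a_t) * eps) / torch.sqrt(a_t)
    # deterministic (eta = 0) move from t to t_prev
    return torch.sqrt(a_prev) * x0_pred + torch.sqrt(1.0 - a_prev) * eps
```

Iterating this step from $$ x_T \sim \mathcal{N}(0, I) $$ (or from an encoded $$ x_T $$) down to $$ t = 0 $$ realizes $$ p_\theta(x_{0:T} | z_{\text{sem}}) $$.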

* Training objective:

$$ L_{\text{simple}} = \sum_{t=1}^{T} \mathbb{E}_{x_0, \epsilon_t} \left[ \left\| \epsilon_\theta(x_t, t, z_{\text{sem}}) - \epsilon_t \right\|_2^2 \right] $$

$$ \text{where } \epsilon_t \in \mathbb{R}^{3 \times h \times w} \sim \mathcal{N}(0, I), \quad x_t = \sqrt{\alpha_t} x_0 + \sqrt{1 - \alpha_t} \epsilon_t $$
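
The sum over $$ t $$ is typically estimated by drawing timesteps uniformly at random, as in standard DPM training. Below is a minimal sketch of one evaluation of this objective; `sem_encoder` (producing $$ z_{\text{sem}} $$ from $$ x_0 $$), `eps_model`, and `alphas_cumprod` are assumed interfaces, and a per-batch mean stands in for the expectation.

```python
import torch

def l_simple(eps_model, sem_encoder, x0, alphas_cumprod):
    """Monte-Carlo estimate of L_simple for one batch (sketch only).

    Assumptions: sem_encoder(x0) -> z_sem, eps_model(x_t, t, z_sem) -> eps_theta,
    alphas_cumprod -> 1-D tensor of alpha_t values, x0 -> (B, C, H, W) images.
    """
    b = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x0.device)
    a_t = alphas_cumprod[t].view(b, 1, 1, 1)
    eps = torch.randn_like(x0)                               # eps_t ~ N(0, I)
    x_t = torch.sqrt(a_t) * x0 + torch.sqrt(1.0 - a_t) * eps
    z_sem = sem_encoder(x0)                                  # high-level semantics
    return ((eps_model(x_t, t, z_sem) - eps) ** 2).mean()
```
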
@@ -108,13 +112,13 @@ where $$ z_s \in \mathbb{R}^c = \text{Affine}(z_{\text{sem}}) $$
and $$ (t_s, t_b) \in \mathbb{R}^{2 \times c} = \text{MLP}(\psi(t)) $$ is the output of a multilayer perceptron with a sinusoidal encoding function $$ \psi $$.
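
This conditioning corresponds to the paper's adaptive group normalization, $$ \text{AdaGN}(h, t, z_{\text{sem}}) = z_s \left( t_s \, \text{GroupNorm}(h) + t_b \right) $$, applied inside each UNet block. The sketch below is only illustrative: the group count, MLP width, and the exact sinusoidal encoding $$ \psi $$ are assumptions, not the authors' hyper-parameters.

```python
import math
import torch
import torch.nn as nn

class AdaGN(nn.Module):
    """Adaptive group norm: out = z_s * (t_s * GroupNorm(h) + t_b).

    Assumes channels is divisible by the group count; psi is a standard
    half-sine / half-cosine timestep encoding.
    """
    def __init__(self, channels, z_dim, t_dim=128, groups=32):
        super().__init__()
        self.t_dim = t_dim
        self.norm = nn.GroupNorm(groups, channels)
        self.affine_z = nn.Linear(z_dim, channels)            # z_s = Affine(z_sem)
        self.mlp_t = nn.Sequential(nn.Linear(t_dim, t_dim), nn.SiLU(),
                                   nn.Linear(t_dim, 2 * channels))  # (t_s, t_b) = MLP(psi(t))

    def psi(self, t):
        # sinusoidal timestep encoding psi(t)
        half = self.t_dim // 2
        freqs = torch.exp(-math.log(10000.0) * torch.arange(half, device=t.device) / half)
        angles = t.float()[:, None] * freqs[None, :]
        return torch.cat([torch.sin(angles), torch.cos(angles)], dim=1)

    def forward(self, h, t, z_sem):
        z_s = self.affine_z(z_sem)[:, :, None, None]
        t_s, t_b = self.mlp_t(self.psi(t)).chunk(2, dim=1)
        return z_s * (t_s[:, :, None, None] * self.norm(h) + t_b[:, :, None, None])
```

For example, `AdaGN(channels=64, z_dim=512)(h, t, z_sem)` modulates a feature map `h` of shape `(B, 64, H, W)` with both the timestep and the semantic code.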


## Stochastic encoder
<!-- ## Stochastic encoder
Using their conditional DDIM, they can encode an input image $$ x_0 $$ into the stochastic subcode $$ x_T $$ by reversing the deterministic generative process with the following equation:
$$ \mathbf{x}_{t+1} = \sqrt{\alpha_{t+1}} f_\theta(\mathbf{x}_t, t, z_{\text{sem}}) + \sqrt{1 - \alpha_{t+1}} \epsilon_\theta(\mathbf{x}_t, t, z_{\text{sem}}) $$
> Note that the stochastic encoder is not used during training and is used to compute $$ x_T $$ for tasks that require exact reconstruction or inversion, such as real-image manipulation.
> Note that the stochastic encoder is not used during training and is used to compute $$ x_T $$ for tasks that require exact reconstruction or inversion, such as real-image manipulation. -->

<div style="text-align:center"><img src="/collections/images/DiffusionAutoencoders/Fig2.jpg" width=1500></div>

