From e22fe9637f911a93dece75f446c5ad8a14158579 Mon Sep 17 00:00:00 2001
From: RomainDeleat
Date: Thu, 10 Oct 2024 11:12:35 +0200
Subject: [PATCH] Minor corrections + remove a small part.

---
 .../_posts/2024-10-07-Diffusion_Autoencoders.md | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/collections/_posts/2024-10-07-Diffusion_Autoencoders.md b/collections/_posts/2024-10-07-Diffusion_Autoencoders.md
index 324a13e8..a48cb203 100755
--- a/collections/_posts/2024-10-07-Diffusion_Autoencoders.md
+++ b/collections/_posts/2024-10-07-Diffusion_Autoencoders.md
@@ -23,13 +23,13 @@ pdf: "https://openaccess.thecvf.com/content/CVPR2022/html/Preechakul_Diffusion_A
 * Autoencoders are useful for learning representations. On the contrary, DPM models can transform an input image into a latent variable yet they lack key features like semantics and disentanglement.
 
-* Their proposed approach uses a learnable encoder to capture high-level semantics and a DPM for decoding and modeling variations.
+* The proposed approach uses a learnable encoder to capture high-level semantics and a DPM for decoding and modeling variations.
 
-* Unlike other DPMs, DDIM introduces a non-Markovian forward process while preserving DPM training objectives, enabling deterministic encoding.
+* Unlike other DPMs, DDIM introduces a non-Markovian reverse process while preserving DPM training objectives.
 
 * Conditioning DDIM on semantic information improves denoising efficiency and produces a linear, decodable, semantically meaningful representation. Also, due to the conditioning, the denoising becomes easier and faster.
 
-* To generate synthetic data, the authors used another DPM for the semantic subcode distribution.
+* To generate unconditional synthetic data, the authors used another DPM for the semantic subcode distribution.
 
 # Methods
 
@@ -81,6 +81,8 @@ The conditional DDIM decoder proposed by the authors takes as input the pair $$
 to generate the output image. This decoder models $$ p_{\theta}(x_{t-1} | x_t, z_{\text{sem}}) $$
 to approximate the inference distribution $$ q(x_{t-1} | x_t, x_0) $$ using the following reverse generative process:
 
+* Decoder:
+
 $$ p_\theta(x_{0:T} | z_{\text{sem}}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} | x_t, z_{\text{sem}}) $$
 
 $$
@@ -93,6 +95,8 @@ $$
 $$ f_\theta(x_t, t, z_{\text{sem}}) = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \sqrt{1 - \alpha_t} \epsilon_\theta(x_t, t, z_{\text{sem}}) \right) $$
 
+* Training objective:
+
 $$ L_{\text{simple}} = \sum_{t=1}^{T} \mathbb{E}_{x_0, \epsilon_t} \left[ \left\| \epsilon_\theta(x_t, t, z_{\text{sem}}) - \epsilon_t \right\|_2^2 \right] $$
 $$ \text{where } \epsilon_t \in \mathbb{R}^{3 \times h \times w} \sim \mathcal{N}(0, I), \quad x_t = \sqrt{\alpha_t} x_0 + \sqrt{1 - \alpha_t} \epsilon_t $$
 
@@ -108,13 +112,13 @@ where
 $$ z_s \in \mathbb{R}^c = \text{Affine}(z_{\text{sem}}) $$ and
 $$ (t_s, t_b) \in \mathbb{R}^{2 \times c} = \text{MLP}(\psi(t)) $$ is the output of
 a multilayer perceptron with a sinusoidal encoding function $$ \psi $$.
 
-## Stochastic encoder
+