diff --git a/homeworks/hw2/hw2.ipynb b/homeworks/hw2/hw2.ipynb
index a16630d..df2d90a 100644
--- a/homeworks/hw2/hw2.ipynb
+++ b/homeworks/hw2/hw2.ipynb
@@ -465,7 +465,7 @@
     "\n",
     "Implement a hierarchical VAE that follows the following structure.\n",
     "* $z1$ is a 2x2x12 latent vector where p(z1) is the unit Gaussian.\n",
-    " * Learn the approximate posterior $q_\\theta(z|x) = N(z; \\mu_\\theta(x), \\Sigma_\\theta(x))$, where $\\mu_\\theta(x)$ is the mean vector, and $\\Sigma_\\theta(x)$ is a diagonal covariance matrix. I.e., same as a normal VAE, but use a matrix latent rather than a vector. Each dimension is independent.\n",
+    " * Learn the approximate posterior $q_\\theta(z1|x) = N(z1; \\mu_\\theta(x), \\Sigma_\\theta(x))$, where $\\mu_\\theta(x)$ is the mean vector, and $\\Sigma_\\theta(x)$ is a diagonal covariance matrix. I.e., same as a normal VAE, but use a matrix latent rather than a vector. Each dimension is independent.\n",
     "* $z2$ is a 2x2x12 latent vector.\n",
     " * $p_\\theta(z2|z1)$ is learned, and implemented as a neural network that parameterizes mean (and log std, optionally).\n",
     " * $q_\\theta(z2|z1,x)$ is also learned. Implement this as a Residual Normal [see NVAE] over the prior $p_\\theta(z2|z1)$.\n",
@@ -473,7 +473,7 @@
     "\n",
     "Some helpful hints:\n",
     "* Two KL losses should be calculated. The first should match $q_\\theta(z|x)$ to the unit Gaussian. The second should match $q_\\theta(z2|z1,x)$ and $p_\\theta(z2|z1)$, and be taken with respect to $q$.\n",
-    "* When calculating the second KL term, utilize the analytic form for the residual normal. When $q_\\theta(z2|z1,x) = N(z2; \\mu_\\theta(z1) + \\Delta \\mu_\\theta(z1,x), \\Sigma_\\theta(z1)) * \\Delta \\Sigma_\\theta(z1,x))$, use the following form: `kl_z2 = -z2_residual_logstd - 0.5 + (torch.exp(2 * z2_residual_logstd) + z2_residual_mu ** 2) * 0.5`\n",
+    "* When calculating the second KL term, utilize the analytic form for the residual normal. When $q_\\theta(z2|z1,x) = N(z2; \\mu_\\theta(z1) + \\Delta \\mu_\\theta(z1,x), \\Sigma_\\theta(z1) * \\Delta \\Sigma_\\theta(z1,x))$, use the following form: `kl_z2 = -z2_residual_logstd - 0.5 + (torch.exp(2 * z2_residual_logstd) + z2_residual_mu ** 2) * 0.5`\n",
     "* When calculating KL, remember to sum over the dimensions of the latent variable before taking the mean over batch.\n",
     "* For the prior $p_\\theta(z2|z1)$, fix standard deviation to be 1. Learn only the mean. This will help with stability in training.\n",
     "\n",
diff --git a/homeworks/hw2/hw2_latex/main.tex b/homeworks/hw2/hw2_latex/main.tex
index 876d867..b6a9d83 100644
--- a/homeworks/hw2/hw2_latex/main.tex
+++ b/homeworks/hw2/hw2_latex/main.tex
@@ -318,7 +318,7 @@
 
 \newpage
 
-Final VQ-VAE Test Loss: \textcolor{red}{FILL}, PixelCNN Prior Test Los: \textcolor{red}{FILL} (Dataset 2)
+Final VQ-VAE Test Loss: \textcolor{red}{FILL}, Transformer Prior Test Loss: \textcolor{red}{FILL} (Dataset 2)
 \begin{figure}[H]
 \centering
 \begin{subfigure}[b]{0.475\textwidth}
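
Note: for reference, a minimal PyTorch sketch of the two KL terms the notebook hints above describe. The helpers `encoder`, `prior_net`, and `delta_net` and the exact tensor shapes are assumptions for illustration, not part of the assignment's starter code.

import torch

def hvae_kl_terms(x, encoder, prior_net, delta_net):
    # q(z1|x): diagonal Gaussian over the 2x2x12 latent, matched to N(0, I).
    z1_mu, z1_logstd = encoder(x)  # each assumed shape (B, 2, 2, 12)
    z1 = z1_mu + torch.exp(z1_logstd) * torch.randn_like(z1_mu)
    kl_z1 = -z1_logstd - 0.5 + (torch.exp(2 * z1_logstd) + z1_mu ** 2) * 0.5

    # p(z2|z1): learned mean only, std fixed to 1 (per the stability hint).
    z2_prior_mu = prior_net(z1)

    # q(z2|z1,x) as a Residual Normal over the prior:
    # mean = z2_prior_mu + delta_mu, std = 1 * exp(delta_logstd).
    z2_residual_mu, z2_residual_logstd = delta_net(z1, x)
    z2 = (z2_prior_mu + z2_residual_mu
          + torch.exp(z2_residual_logstd) * torch.randn_like(z2_residual_mu))

    # Analytic KL(q(z2|z1,x) || p(z2|z1)) in the form given by the hint.
    kl_z2 = (-z2_residual_logstd - 0.5
             + (torch.exp(2 * z2_residual_logstd) + z2_residual_mu ** 2) * 0.5)

    # Sum over the latent dimensions first, then mean over the batch.
    kl_z1 = kl_z1.flatten(start_dim=1).sum(dim=1).mean()
    kl_z2 = kl_z2.flatten(start_dim=1).sum(dim=1).mean()
    return z1, z2, kl_z1, kl_z2

With the prior std fixed at 1, the residual parameterization makes the second KL depend only on the deltas, which is why the analytic form in the hint contains no prior terms.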