diff --git a/homeworks/hw2/hw2.ipynb b/homeworks/hw2/hw2.ipynb
index a16630d..df2d90a 100644
--- a/homeworks/hw2/hw2.ipynb
+++ b/homeworks/hw2/hw2.ipynb
@@ -465,7 +465,7 @@
     "\n",
     "Implement a hierarchical VAE that follows the following structure.\n",
     "* $z1$ is a 2x2x12 latent vector where p(z1) is the unit Gaussian.\n",
-    " * Learn the approximate posterior $q_\\theta(z|x) = N(z; \\mu_\\theta(x), \\Sigma_\\theta(x))$, where $\\mu_\\theta(x)$ is the mean vector, and $\\Sigma_\\theta(x)$ is a diagonal covariance matrix. I.e., same as a normal VAE, but use a matrix latent rather than a vector. Each dimension is independent.\n",
+    " * Learn the approximate posterior $q_\\theta(z1|x) = N(z1; \\mu_\\theta(x), \\Sigma_\\theta(x))$, where $\\mu_\\theta(x)$ is the mean vector, and $\\Sigma_\\theta(x)$ is a diagonal covariance matrix. I.e., same as a normal VAE, but use a matrix latent rather than a vector. Each dimension is independent.\n",
     "* $z2$ is a 2x2x12 latent vector.\n",
     " * $p_\\theta(z2|z1)$ is learned, and implemented as a neural network that parameterizes mean (and log std, optionally).\n",
     " * $q_\\theta(z2|z1,x)$ is also learned. Implement this as a Residual Normal [see NVAE] over the prior $p_\\theta(z2|z1)$.\n",
@@ -473,7 +473,7 @@
     "\n",
     "Some helpful hints:\n",
     "* Two KL losses should be calculated. The first should match $q_\\theta(z|x)$ to the unit Gaussian. The second should match $q_\\theta(z2|z1,x)$ and $p_\\theta(z2|z1)$, and be taken with respect to $q$.\n",
-    "* When calculating the second KL term, utilize the analytic form for the residual normal. When $q_\\theta(z2|z1,x) = N(z2; \\mu_\\theta(z1) + \\Delta \\mu_\\theta(z1,x), \\Sigma_\\theta(z1)) * \\Delta \\Sigma_\\theta(z1,x))$, use the following form: `kl_z2 = -z2_residual_logstd - 0.5 + (torch.exp(2 * z2_residual_logstd) + z2_residual_mu ** 2) * 0.5`\n",
+    "* When calculating the second KL term, utilize the analytic form for the residual normal. When $q_\\theta(z2|z1,x) = N(z2; \\mu_\\theta(z1) + \\Delta \\mu_\\theta(z1,x), \\Sigma_\\theta(z1) * \\Delta \\Sigma_\\theta(z1,x))$, use the following form: `kl_z2 = -z2_residual_logstd - 0.5 + (torch.exp(2 * z2_residual_logstd) + z2_residual_mu ** 2) * 0.5`\n",
     "* When calculating KL, remember to sum over the dimensions of the latent variable before taking the mean over batch.\n",
     "* For the prior $p_\\theta(z2|z1)$, fix standard deviation to be 1. Learn only the mean. This will help with stability in training.\n",
     "\n",
diff --git a/homeworks/hw2/hw2_latex/main.tex b/homeworks/hw2/hw2_latex/main.tex
index 876d867..b6a9d83 100644
--- a/homeworks/hw2/hw2_latex/main.tex
+++ b/homeworks/hw2/hw2_latex/main.tex
@@ -318,7 +318,7 @@
 
 \newpage
 
-Final VQ-VAE Test Loss: \textcolor{red}{FILL}, PixelCNN Prior Test Los: \textcolor{red}{FILL} (Dataset 2)
+Final VQ-VAE Test Loss: \textcolor{red}{FILL}, Transformer Prior Test Loss: \textcolor{red}{FILL} (Dataset 2)
 \begin{figure}[H]
 \centering
 \begin{subfigure}[b]{0.475\textwidth}
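
Note: for reference, a minimal PyTorch sketch of the two KL terms the notebook hints above describe. The helpers `encoder`, `prior_net`, and `delta_net` and the exact tensor shapes are assumptions for illustration, not part of the assignment's starter code.

import torch

def hvae_kl_terms(x, encoder, prior_net, delta_net):
    # q(z1|x): diagonal Gaussian over the 2x2x12 latent, matched to N(0, I).
    z1_mu, z1_logstd = encoder(x)  # each assumed shape (B, 2, 2, 12)
    z1 = z1_mu + torch.exp(z1_logstd) * torch.randn_like(z1_mu)
    kl_z1 = -z1_logstd - 0.5 + (torch.exp(2 * z1_logstd) + z1_mu ** 2) * 0.5

    # p(z2|z1): learned mean only, std fixed to 1 (per the stability hint).
    z2_prior_mu = prior_net(z1)

    # q(z2|z1,x) as a Residual Normal over the prior:
    # mean = z2_prior_mu + delta_mu, std = 1 * exp(delta_logstd).
    z2_residual_mu, z2_residual_logstd = delta_net(z1, x)
    z2 = (z2_prior_mu + z2_residual_mu
          + torch.exp(z2_residual_logstd) * torch.randn_like(z2_residual_mu))

    # Analytic KL(q(z2|z1,x) || p(z2|z1)) in the form given by the hint.
    kl_z2 = (-z2_residual_logstd - 0.5
             + (torch.exp(2 * z2_residual_logstd) + z2_residual_mu ** 2) * 0.5)

    # Sum over the latent dimensions first, then mean over the batch.
    kl_z1 = kl_z1.flatten(start_dim=1).sum(dim=1).mean()
    kl_z2 = kl_z2.flatten(start_dim=1).sum(dim=1).mean()
    return z1, z2, kl_z1, kl_z2

With the prior std fixed at 1, the residual parameterization makes the second KL depend only on the deltas, which is why the analytic form in the hint contains no prior terms.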