Commit

minor eds to lda/qda
trevorcampbell committed Oct 14, 2024
1 parent c54b9e4 commit ea1b16d
Showing 7 changed files with 402 additions and 406 deletions.
@@ -1,7 +1,8 @@
{
"hash": "783e52d7187096a8e0bcf81cd68978d3",
"hash": "8a84974e72316aa8240202b718f65637",
"result": {
"markdown": "---\nlecture: \"15 LDA and QDA\"\nformat: revealjs\nmetadata-files: \n - _metadata.yml\n---\n---\n---\n\n## {{< meta lecture >}} {.large background-image=\"gfx/smooths.svg\" background-opacity=\"0.3\"}\n\n[Stat 406]{.secondary}\n\n[{{< meta author >}}]{.secondary}\n\nLast modified -- 09 October 2023\n\n\n\n$$\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n$$\n\n\n\n\n\n## Last time\n\n\nWe showed that with two classes, the [Bayes' classifier]{.secondary} is\n\n$$g_*(X) = \\begin{cases}\n1 & \\textrm{ if } \\frac{p_1(X)}{p_0(X)} > \\frac{1-\\pi}{\\pi} \\\\\n0 & \\textrm{ otherwise}\n\\end{cases}$$\n\nwhere $p_1(X) = Pr(X \\given Y=1)$, $p_0(X) = Pr(X \\given Y=0)$ and $\\pi = Pr(Y=1)$\n\n. . .\n\nFor more than two classes.\n\n$$g_*(X) = \n\\argmax_k \\frac{\\pi_k p_k(X)}{\\sum_k \\pi_k p_k(X)}$$\n\nwhere $p_k(X) = Pr(X \\given Y=k)$ and $\\pi_k = P(Y=k)$\n\n\n## Estimating these\n \nLet's make some assumptions:\n\n1. $Pr(X\\given Y=k) = \\mbox{N}(\\mu_k,\\Sigma_k)$\n2. $\\Sigma_k = \\Sigma_{k'} = \\Sigma$\n\n. . .\n\nThis leads to [Linear Discriminant Analysis]{.secondary} (LDA), one of the oldest classifiers\n\n\n\n## LDA\n\n\n1. Split your training data into $K$ subsets based on $y_i=k$.\n2. In each subset, estimate the mean of $X$: $\\widehat\\mu_k = \\overline{X}_k$\n3. Estimate the pooled variance: $$\\widehat\\Sigma = \\frac{1}{n-K} \\sum_{k \\in \\mathcal{K}} \\sum_{i \\in k} (x_i - \\overline{X}_k) (x_i - \\overline{X}_k)^{\\top}$$\n4. 
Estimate the class proportion: $\\widehat\\pi_k = n_k/n$\n\n## LDA\n\nAssume just $K = 2$ so $k \\in \\{0,\\ 1\\}$\n\nWe predict $\\widehat{y} = 1$ if\n\n$$\\widehat{p_1}(x) / \\widehat{p_0}(x) > \\widehat{\\pi_0} / \\widehat{\\pi_1}$$ \n\nPlug in the density estimates:\n\n$$\\widehat{p_k}(x) = N(x - \\widehat{\\mu}_k,\\ \\widehat\\Sigma)$$\n\n\n## LDA\n\n\nNow we take $\\log$ and simplify $(K=2)$:\n\n$$\n\\begin{aligned}\n&\\Rightarrow \\log(\\widehat{p_1}(x)\\times\\widehat{\\pi_1}) - \\log(\\widehat{p_0}(x)\\times\\widehat{\\pi_0})\n= \\cdots = \\cdots\\\\\n&= \\underbrace{\\left(x^\\top\\widehat\\Sigma^{-1}\\overline X_1-\\frac{1}{2}\\overline X_1^\\top \\widehat\\Sigma^{-1}\\overline X_1 + \\log \\widehat\\pi_1\\right)}_{\\delta_1(x)} - \\underbrace{\\left(x^\\top\\widehat\\Sigma^{-1}\\overline X_0-\\frac{1}{2}\\overline X_0^\\top \\widehat\\Sigma^{-1}\\overline X_0 + \\log \\widehat\\pi_0\\right)}_{\\delta_0(x)}\\\\\n&= \\delta_1(x) - \\delta_0(x)\n\\end{aligned}\n$$\n\n\n[If $\\delta_1(x) > \\delta_0(x)$, we set $\\widehat g(x)=1$]{.secondary}\n\n## One dimensional intuition\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nset.seed(406406406)\nn <- 100\npi <- .6\nmu0 <- -1\nmu1 <- 2\nsigma <- 2\ntib <- tibble(\n y = rbinom(n, 1, pi),\n x = rnorm(n, mu0, sigma) * (y == 0) + rnorm(n, mu1, sigma) * (y == 1)\n)\n```\n:::\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code code-fold=\"true\"}\ngg <- ggplot(tib, aes(x, y)) +\n geom_point(colour = blue) +\n stat_function(fun = ~ 6 * (1 - pi) * dnorm(.x, mu0, sigma), colour = orange) +\n stat_function(fun = ~ 6 * pi * dnorm(.x, mu1, sigma), colour = orange) +\n annotate(\"label\",\n x = c(-3, 4.5), y = c(.5, 2 / 3),\n label = c(\"(1-pi)*p[0](x)\", \"pi*p[1](x)\"), parse = TRUE\n )\ngg\n```\n\n::: {.cell-output-display}\n![](15-LDA-and-QDA_files/figure-revealjs/unnamed-chunk-2-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n\n## What is linear?\n\nLook closely at the equation for $\\delta_1(x)$:\n\n$$\\delta_1(x)=x^\\top\\widehat\\Sigma^{-1}\\overline X_1-\\frac{1}{2}\\overline X_1^\\top \\widehat\\Sigma^{-1}\\overline X_1 + \\log \\widehat\\pi_1$$\n\nWe can write this as $\\delta_1(x) = x^\\top a_1 + b_1$ with $a_1 = \\widehat\\Sigma^{-1}\\overline X_1$ and $b_1=-\\frac{1}{2}\\overline X_1^\\top \\widehat\\Sigma^{-1}\\overline X_1 + \\log \\widehat\\pi_1$.\n\nWe can do the same for $\\delta_0(x)$ (in terms of $a_0$ and $b_0$)\n\nTherefore, \n\n$$\\delta_1(x)-\\delta_0(x) = x^\\top(a_1-a_0) + (b_1-b_0)$$\n\nThis is how we discriminate between the classes.\n\nWe just calculate $(a_1 - a_0)$ (a vector in $\\R^p$), and $b_1 - b_0$ (a scalar)\n\n\n## Baby example\n\n::: flex\n::: w-50\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(mvtnorm)\nlibrary(MASS)\ngenerate_lda_2d <- function(\n n, p = c(.5, .5), \n mu = matrix(c(0, 0, 1, 1), 2),\n Sigma = diag(2)) {\n X <- rmvnorm(n, sigma = Sigma)\n tibble(\n y = which(rmultinom(n, 1, p) == 1, TRUE)[,1],\n x1 = X[, 1] + mu[1, y],\n x2 = X[, 2] + mu[2, y]\n )\n}\ndat1 <- generate_lda_2d(100, Sigma = .5 * diag(2))\nlda_fit <- lda(y ~ ., dat1)\n```\n:::\n\n\n:::\n::: w-50\n\n\n::: {.cell layout-align=\"center\" dvi='300'}\n::: {.cell-output-display}\n![](15-LDA-and-QDA_files/figure-revealjs/plot-d1-1.png){fig-align='center'}\n:::\n:::\n\n\n:::\n\n:::\n\n\n## Multiple classes\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmoreclasses <- generate_lda_2d(150, c(.2, .3, .5), matrix(c(0, 0, 1, 1, 1, 0), 2), .5 * diag(2))\nseparateclasses <- 
generate_lda_2d(150, c(.2, .3, .5), matrix(c(-1, -1, 2, 2, 2, -1), 2), .1 * diag(2))\n```\n:::\n\n::: {.cell layout-align=\"center\" dvi='300'}\n::: {.cell-output-display}\n![](15-LDA-and-QDA_files/figure-revealjs/3class-plot-1.png){fig-align='center'}\n:::\n:::\n\n\n\n\n## QDA\n\nJust like LDA, but $\\Sigma_k$ is separate for each class.\n\nProduces [Quadratic]{.secondary} decision boundary.\n\nEverything else is the same.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nqda_fit <- qda(y ~ ., dat1)\nqda_3fit <- qda(y ~ ., moreclasses)\n```\n:::\n\n::: {.cell layout-align=\"center\" dvi='300'}\n::: {.cell-output-display}\n![](15-LDA-and-QDA_files/figure-revealjs/qda-vs-lda-2class-1.png){fig-align='center'}\n:::\n:::\n\n\n\n## 3 class comparison\n\n\n::: {.cell layout-align=\"center\" dvi='300'}\n::: {.cell-output-display}\n![](15-LDA-and-QDA_files/figure-revealjs/3class-comparison-1.png){fig-align='center'}\n:::\n:::\n\n\n\n## Notes\n\n* LDA is a linear classifier. It is not a linear smoother.\n - It is derived from Bayes rule.\n - Assume each class-conditional density in Gaussian\n - It assumes the classes have different mean vectors, but the same (common) covariance matrix.\n - It estimates densities and probabilities and \"plugs in\" \n\n* QDA is not a linear classifier. It depends on quadratic functions of the data.\n - It is derived from Bayes rule.\n - Assume each class-conditional density in Gaussian\n - It assumes the classes have different mean vectors and different covariance matrices.\n - It estimates densities and probabilities and \"plugs in\" \n \n##\n\n[It is hard (maybe impossible) to come up with reasonable classifiers that are linear smoothers. Many \"look\" like a linear smoother, but then apply a nonlinear transformation.]{.hand}\n\n## Naïve Bayes\n\nAssume that $Pr(X | Y = k) = Pr(X_1 | Y = k)\\cdots Pr(X_p | Y = k)$.\n\nThat is, conditional on the class, the feature distribution is independent.\n\n. . .\n\nIf we further assume that $Pr(X_j | Y = k)$ is Gaussian,\n\nThis is the same as QDA but with $\\Sigma_k$ Diagonal.\n\n. . .\n\nDon't have to assume Gaussian. Could do lots of stuff. \n\n\n# Next time...\n\nAnother linear classifier and transformations\n",
"engine": "knitr",
"markdown": "---\nlecture: \"15 LDA and QDA\"\nformat: revealjs\nmetadata-files: \n - _metadata.yml\n---\n\n\n\n## {{< meta lecture >}} {.large background-image=\"gfx/smooths.svg\" background-opacity=\"0.3\"}\n\n[Stat 406]{.secondary}\n\n[{{< meta author >}}]{.secondary}\n\nLast modified -- 14 October 2024\n\n\n\n\n\n\n\n$$\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n\\newcommand{\\U}{\\mathbf{U}}\n\\newcommand{\\D}{\\mathbf{D}}\n\\newcommand{\\V}{\\mathbf{V}}\n$$\n\n\n\n\n\n## Last time\n\n\nWe showed that with two classes, the [Bayes' classifier]{.secondary} is\n\n$$g_*(x) = \\begin{cases}\n1 & \\textrm{ if } \\frac{p_1(x)}{p_0(x)} > \\frac{1-\\pi}{\\pi} \\\\\n0 & \\textrm{ otherwise}\n\\end{cases}$$\n\nwhere $p_1(x) = \\Pr(X=x \\given Y=1)$, $p_0(x) = \\Pr(X=x \\given Y=0)$ and $\\pi = \\Pr(Y=1)$\n\n. . .\n\nFor more than two classes:\n\n$$g_*(x) = \n\\argmax_k \\frac{\\pi_k p_k(x)}{\\sum_k \\pi_k p_k(x)}$$\n\nwhere $p_k(x) = \\Pr(X=x \\given Y=k)$ and $\\pi_k = P(Y=k)$\n\n\n## Estimating these\n \nLet's make some assumptions:\n\n1. $\\Pr(X=x\\given Y=k) = \\mbox{N}(x; \\mu_k,\\Sigma_k)$\n2. $\\Sigma_k = \\Sigma_{k'} = \\Sigma$\n\nThis leads to [Linear Discriminant Analysis]{.secondary} (LDA), one of the oldest classifiers\n\n## LDA\n\n\n1. Split your training data into $K$ subsets based on $y_i=k$.\n2. In each subset, estimate the mean of $X$: $\\widehat\\mu_k = \\overline{X}_k$\n3. Estimate the pooled variance: $$\\widehat\\Sigma = \\frac{1}{n-K} \\sum_{k \\in \\mathcal{K}} \\sum_{i \\in k} (x_i - \\overline{X}_k) (x_i - \\overline{X}_k)^{\\top}$$\n4. 
Estimate the class proportion: $\\widehat\\pi_k = n_k/n$\n\n## LDA\n\nAssume just $K = 2$ so $k \\in \\{0,\\ 1\\}$\n\nWe predict $\\widehat{y} = 1$ if\n\n$$\\widehat{p_1}(x) / \\widehat{p_0}(x) > \\widehat{\\pi_0} / \\widehat{\\pi_1}$$ \n\nPlug in the density estimates:\n\n$$\\widehat{p_k}(x) = N(x - \\widehat{\\mu}_k,\\ \\widehat\\Sigma)$$\n\n\n## LDA\n\n\nNow we take $\\log$ and simplify $(K=2)$:\n\n$$\n\\begin{aligned}\n&\\Rightarrow \\log(\\widehat{p_1}(x)\\times\\widehat{\\pi_1}) - \\log(\\widehat{p_0}(x)\\times\\widehat{\\pi_0})\n= \\cdots = \\cdots\\\\\n&= \\underbrace{\\left(x^\\top\\widehat\\Sigma^{-1}\\overline X_1-\\frac{1}{2}\\overline X_1^\\top \\widehat\\Sigma^{-1}\\overline X_1 + \\log \\widehat\\pi_1\\right)}_{\\delta_1(x)} - \\underbrace{\\left(x^\\top\\widehat\\Sigma^{-1}\\overline X_0-\\frac{1}{2}\\overline X_0^\\top \\widehat\\Sigma^{-1}\\overline X_0 + \\log \\widehat\\pi_0\\right)}_{\\delta_0(x)}\\\\\n&= \\delta_1(x) - \\delta_0(x)\n\\end{aligned}\n$$\n\n\n[If $\\delta_1(x) > \\delta_0(x)$, we set $\\widehat g(x)=1$]{.secondary}\n\n## One dimensional intuition\n\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nset.seed(406406406)\nn <- 100\npi <- .6\nmu0 <- -1\nmu1 <- 2\nsigma <- 2\ntib <- tibble(\n y = rbinom(n, 1, pi),\n x = rnorm(n, mu0, sigma) * (y == 0) + rnorm(n, mu1, sigma) * (y == 1)\n)\n```\n:::\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code code-fold=\"true\"}\ngg <- ggplot(tib, aes(x, y)) +\n geom_point(colour = blue) +\n stat_function(fun = ~ 6 * (1 - pi) * dnorm(.x, mu0, sigma), colour = orange) +\n stat_function(fun = ~ 6 * pi * dnorm(.x, mu1, sigma), colour = orange) +\n annotate(\"label\",\n x = c(-3, 4.5), y = c(.5, 2 / 3),\n label = c(\"(1-pi)*p[0](x)\", \"pi*p[1](x)\"), parse = TRUE\n )\ngg\n```\n\n::: {.cell-output-display}\n![](15-LDA-and-QDA_files/figure-revealjs/unnamed-chunk-2-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n\n\n\n## What is linear?\n\nLook closely at the equation for $\\delta_1(x)$:\n\n$$\\delta_1(x)=x^\\top\\widehat\\Sigma^{-1}\\overline X_1-\\frac{1}{2}\\overline X_1^\\top \\widehat\\Sigma^{-1}\\overline X_1 + \\log \\widehat\\pi_1$$\n\nWe can write this as $\\delta_1(x) = x^\\top a_1 + b_1$ with $a_1 = \\widehat\\Sigma^{-1}\\overline X_1$ and $b_1=-\\frac{1}{2}\\overline X_1^\\top \\widehat\\Sigma^{-1}\\overline X_1 + \\log \\widehat\\pi_1$.\n\nWe can do the same for $\\delta_0(x)$ (in terms of $a_0$ and $b_0$)\n\nTherefore, \n\n$$\\delta_1(x)-\\delta_0(x) = x^\\top(a_1-a_0) + (b_1-b_0)$$\n\nThis is how we discriminate between the classes.\n\nWe just calculate $(a_1 - a_0)$ (a vector in $\\R^p$), and $b_1 - b_0$ (a scalar)\n\n\n## Baby example\n\n::: flex\n::: w-50\n\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(mvtnorm)\nlibrary(MASS)\ngenerate_lda_2d <- function(\n n, p = c(.5, .5), \n mu = matrix(c(0, 0, 1, 1), 2),\n Sigma = diag(2)) {\n X <- rmvnorm(n, sigma = Sigma)\n tibble(\n y = which(rmultinom(n, 1, p) == 1, TRUE)[,1],\n x1 = X[, 1] + mu[1, y],\n x2 = X[, 2] + mu[2, y]\n )\n}\ndat1 <- generate_lda_2d(100, Sigma = .5 * diag(2))\nlda_fit <- lda(y ~ ., dat1)\n```\n:::\n\n\n\n\n:::\n::: w-50\n\n\n\n\n::: {.cell layout-align=\"center\" dvi='300'}\n::: {.cell-output-display}\n![](15-LDA-and-QDA_files/figure-revealjs/plot-d1-1.png){fig-align='center'}\n:::\n:::\n\n\n\n\n:::\n\n:::\n\n\n## Multiple classes\n\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmoreclasses <- generate_lda_2d(150, c(.2, .3, .5), matrix(c(0, 0, 1, 1, 1, 0), 2), .5 * 
diag(2))\nseparateclasses <- generate_lda_2d(150, c(.2, .3, .5), matrix(c(-1, -1, 2, 2, 2, -1), 2), .1 * diag(2))\n```\n:::\n\n::: {.cell layout-align=\"center\" dvi='300'}\n::: {.cell-output-display}\n![](15-LDA-and-QDA_files/figure-revealjs/3class-plot-1.png){fig-align='center'}\n:::\n:::\n\n\n\n\n\n\n## QDA\n\nJust like LDA, but $\\Sigma_k$ is separate for each class.\n\nProduces [Quadratic]{.secondary} decision boundary.\n\nEverything else is the same.\n\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nqda_fit <- qda(y ~ ., dat1)\nqda_3fit <- qda(y ~ ., moreclasses)\n```\n:::\n\n::: {.cell layout-align=\"center\" dvi='300'}\n::: {.cell-output-display}\n![](15-LDA-and-QDA_files/figure-revealjs/qda-vs-lda-2class-1.png){fig-align='center'}\n:::\n:::\n\n\n\n\n\n## 3 class comparison\n\n\n\n\n::: {.cell layout-align=\"center\" dvi='300'}\n::: {.cell-output-display}\n![](15-LDA-and-QDA_files/figure-revealjs/3class-comparison-1.png){fig-align='center'}\n:::\n:::\n\n\n\n\n\n## Notes\n\n* LDA is a linear classifier. It is not a linear smoother.\n - It is derived from Bayes rule.\n - Assume each class-conditional density in Gaussian\n - It assumes the classes have different mean vectors, but the same (common) covariance matrix.\n - It estimates densities and probabilities and \"plugs in\" \n\n* QDA is not a linear classifier. It depends on quadratic functions of the data.\n - It is derived from Bayes rule.\n - Assume each class-conditional density in Gaussian\n - It assumes the classes have different mean vectors and different covariance matrices.\n - It estimates densities and probabilities and \"plugs in\" \n \n##\n\n[It is hard (maybe impossible) to come up with reasonable classifiers that are linear smoothers. Many \"look\" like a linear smoother, but then apply a nonlinear transformation.]{.hand}\n\n## Naïve Bayes\n\nAssume that $\\Pr(X=x | Y = k) = \\Pr(X_1=x_1 | Y = k)\\cdots \\Pr(X_p=x_p | Y = k)$.\n\nThat is, conditional on the class, the feature distribution is independent.\n\n. . .\n\nIf we further assume that $\\Pr(X_j=x_j | Y = k)$ is Gaussian,\n\nThis is the same as QDA but with $\\Sigma_k$ Diagonal.\n\n. . .\n\nDon't have to assume Gaussian. Could do lots of stuff. \n\n\n# Next time...\n\nAnother linear classifier and transformations\n",
"supporting": [
"15-LDA-and-QDA_files"
],
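
The slides in this diff derive the LDA discriminant $\delta_k(x) = x^\top\widehat\Sigma^{-1}\overline X_k - \frac{1}{2}\overline X_k^\top\widehat\Sigma^{-1}\overline X_k + \log\widehat\pi_k$ and then fit the classifier with `MASS::lda()`. Below is a minimal sketch of that plug-in computation, assuming the two-class `dat1` tibble produced by `generate_lda_2d()` in the lecture code (columns `x1`, `x2`, `y` with classes 1 and 2); the helper names (`lda_delta`, `yhat_manual`) are illustrative, not part of the lecture code.

```r
## A sketch of the LDA plug-in rule, assuming dat1 from generate_lda_2d() above
library(MASS)

X <- as.matrix(dat1[, c("x1", "x2")])
y <- dat1$y                              # classes 1 and 2
n <- nrow(X); K <- 2

## class means and estimated class proportions
xbar <- lapply(1:K, function(k) colMeans(X[y == k, , drop = FALSE]))
pih  <- sapply(1:K, function(k) mean(y == k))

## pooled covariance with the (n - K) denominator from the slides
S <- Reduce(`+`, lapply(1:K, function(k) {
  Xc <- scale(X[y == k, , drop = FALSE], center = xbar[[k]], scale = FALSE)
  crossprod(Xc)
})) / (n - K)
Sinv <- solve(S)

## delta_k(x) = x' Sinv xbar_k - (1/2) xbar_k' Sinv xbar_k + log pi_k, for every row of X
lda_delta <- function(k) {
  drop(X %*% Sinv %*% xbar[[k]]) -
    0.5 * drop(t(xbar[[k]]) %*% Sinv %*% xbar[[k]]) + log(pih[k])
}

## classify by the larger discriminant and compare with MASS::lda()
yhat_manual <- ifelse(lda_delta(2) > lda_delta(1), 2L, 1L)
yhat_mass   <- as.integer(predict(lda(y ~ ., dat1))$class)
table(yhat_manual, yhat_mass)            # should agree on (essentially) every point
```

For QDA each class keeps its own $\widehat\Sigma_k$, so the quadratic term no longer cancels and the discriminant becomes $\delta_k(x) = -\tfrac{1}{2}\log|\widehat\Sigma_k| - \tfrac{1}{2}(x - \overline X_k)^\top\widehat\Sigma_k^{-1}(x - \overline X_k) + \log\widehat\pi_k$, which is what `qda(y ~ ., dat1)` fits; constraining each $\widehat\Sigma_k$ to be diagonal gives the Gaussian naïve Bayes classifier from the final slides.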
