diff --git a/_freeze/schedule/slides/15-LDA-and-QDA/execute-results/html.json b/_freeze/schedule/slides/15-LDA-and-QDA/execute-results/html.json index a76695d..586c034 100644 --- a/_freeze/schedule/slides/15-LDA-and-QDA/execute-results/html.json +++ b/_freeze/schedule/slides/15-LDA-and-QDA/execute-results/html.json @@ -1,7 +1,8 @@ { - "hash": "783e52d7187096a8e0bcf81cd68978d3", + "hash": "8a84974e72316aa8240202b718f65637", "result": { - "markdown": "---\nlecture: \"15 LDA and QDA\"\nformat: revealjs\nmetadata-files: \n - _metadata.yml\n---\n---\n---\n\n## {{< meta lecture >}} {.large background-image=\"gfx/smooths.svg\" background-opacity=\"0.3\"}\n\n[Stat 406]{.secondary}\n\n[{{< meta author >}}]{.secondary}\n\nLast modified -- 09 October 2023\n\n\n\n$$\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n$$\n\n\n\n\n\n## Last time\n\n\nWe showed that with two classes, the [Bayes' classifier]{.secondary} is\n\n$$g_*(X) = \\begin{cases}\n1 & \\textrm{ if } \\frac{p_1(X)}{p_0(X)} > \\frac{1-\\pi}{\\pi} \\\\\n0 & \\textrm{ otherwise}\n\\end{cases}$$\n\nwhere $p_1(X) = Pr(X \\given Y=1)$, $p_0(X) = Pr(X \\given Y=0)$ and $\\pi = Pr(Y=1)$\n\n. . .\n\nFor more than two classes.\n\n$$g_*(X) = \n\\argmax_k \\frac{\\pi_k p_k(X)}{\\sum_k \\pi_k p_k(X)}$$\n\nwhere $p_k(X) = Pr(X \\given Y=k)$ and $\\pi_k = P(Y=k)$\n\n\n## Estimating these\n \nLet's make some assumptions:\n\n1. $Pr(X\\given Y=k) = \\mbox{N}(\\mu_k,\\Sigma_k)$\n2. $\\Sigma_k = \\Sigma_{k'} = \\Sigma$\n\n. . .\n\nThis leads to [Linear Discriminant Analysis]{.secondary} (LDA), one of the oldest classifiers\n\n\n\n## LDA\n\n\n1. Split your training data into $K$ subsets based on $y_i=k$.\n2. In each subset, estimate the mean of $X$: $\\widehat\\mu_k = \\overline{X}_k$\n3. Estimate the pooled variance: $$\\widehat\\Sigma = \\frac{1}{n-K} \\sum_{k \\in \\mathcal{K}} \\sum_{i \\in k} (x_i - \\overline{X}_k) (x_i - \\overline{X}_k)^{\\top}$$\n4. 
Estimate the class proportion: $\\widehat\\pi_k = n_k/n$\n\n## LDA\n\nAssume just $K = 2$ so $k \\in \\{0,\\ 1\\}$\n\nWe predict $\\widehat{y} = 1$ if\n\n$$\\widehat{p_1}(x) / \\widehat{p_0}(x) > \\widehat{\\pi_0} / \\widehat{\\pi_1}$$ \n\nPlug in the density estimates:\n\n$$\\widehat{p_k}(x) = N(x - \\widehat{\\mu}_k,\\ \\widehat\\Sigma)$$\n\n\n## LDA\n\n\nNow we take $\\log$ and simplify $(K=2)$:\n\n$$\n\\begin{aligned}\n&\\Rightarrow \\log(\\widehat{p_1}(x)\\times\\widehat{\\pi_1}) - \\log(\\widehat{p_0}(x)\\times\\widehat{\\pi_0})\n= \\cdots = \\cdots\\\\\n&= \\underbrace{\\left(x^\\top\\widehat\\Sigma^{-1}\\overline X_1-\\frac{1}{2}\\overline X_1^\\top \\widehat\\Sigma^{-1}\\overline X_1 + \\log \\widehat\\pi_1\\right)}_{\\delta_1(x)} - \\underbrace{\\left(x^\\top\\widehat\\Sigma^{-1}\\overline X_0-\\frac{1}{2}\\overline X_0^\\top \\widehat\\Sigma^{-1}\\overline X_0 + \\log \\widehat\\pi_0\\right)}_{\\delta_0(x)}\\\\\n&= \\delta_1(x) - \\delta_0(x)\n\\end{aligned}\n$$\n\n\n[If $\\delta_1(x) > \\delta_0(x)$, we set $\\widehat g(x)=1$]{.secondary}\n\n## One dimensional intuition\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nset.seed(406406406)\nn <- 100\npi <- .6\nmu0 <- -1\nmu1 <- 2\nsigma <- 2\ntib <- tibble(\n y = rbinom(n, 1, pi),\n x = rnorm(n, mu0, sigma) * (y == 0) + rnorm(n, mu1, sigma) * (y == 1)\n)\n```\n:::\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code code-fold=\"true\"}\ngg <- ggplot(tib, aes(x, y)) +\n geom_point(colour = blue) +\n stat_function(fun = ~ 6 * (1 - pi) * dnorm(.x, mu0, sigma), colour = orange) +\n stat_function(fun = ~ 6 * pi * dnorm(.x, mu1, sigma), colour = orange) +\n annotate(\"label\",\n x = c(-3, 4.5), y = c(.5, 2 / 3),\n label = c(\"(1-pi)*p[0](x)\", \"pi*p[1](x)\"), parse = TRUE\n )\ngg\n```\n\n::: {.cell-output-display}\n![](15-LDA-and-QDA_files/figure-revealjs/unnamed-chunk-2-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n\n## What is linear?\n\nLook closely at the equation for $\\delta_1(x)$:\n\n$$\\delta_1(x)=x^\\top\\widehat\\Sigma^{-1}\\overline X_1-\\frac{1}{2}\\overline X_1^\\top \\widehat\\Sigma^{-1}\\overline X_1 + \\log \\widehat\\pi_1$$\n\nWe can write this as $\\delta_1(x) = x^\\top a_1 + b_1$ with $a_1 = \\widehat\\Sigma^{-1}\\overline X_1$ and $b_1=-\\frac{1}{2}\\overline X_1^\\top \\widehat\\Sigma^{-1}\\overline X_1 + \\log \\widehat\\pi_1$.\n\nWe can do the same for $\\delta_0(x)$ (in terms of $a_0$ and $b_0$)\n\nTherefore, \n\n$$\\delta_1(x)-\\delta_0(x) = x^\\top(a_1-a_0) + (b_1-b_0)$$\n\nThis is how we discriminate between the classes.\n\nWe just calculate $(a_1 - a_0)$ (a vector in $\\R^p$), and $b_1 - b_0$ (a scalar)\n\n\n## Baby example\n\n::: flex\n::: w-50\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(mvtnorm)\nlibrary(MASS)\ngenerate_lda_2d <- function(\n n, p = c(.5, .5), \n mu = matrix(c(0, 0, 1, 1), 2),\n Sigma = diag(2)) {\n X <- rmvnorm(n, sigma = Sigma)\n tibble(\n y = which(rmultinom(n, 1, p) == 1, TRUE)[,1],\n x1 = X[, 1] + mu[1, y],\n x2 = X[, 2] + mu[2, y]\n )\n}\ndat1 <- generate_lda_2d(100, Sigma = .5 * diag(2))\nlda_fit <- lda(y ~ ., dat1)\n```\n:::\n\n\n:::\n::: w-50\n\n\n::: {.cell layout-align=\"center\" dvi='300'}\n::: {.cell-output-display}\n![](15-LDA-and-QDA_files/figure-revealjs/plot-d1-1.png){fig-align='center'}\n:::\n:::\n\n\n:::\n\n:::\n\n\n## Multiple classes\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmoreclasses <- generate_lda_2d(150, c(.2, .3, .5), matrix(c(0, 0, 1, 1, 1, 0), 2), .5 * diag(2))\nseparateclasses <- 
generate_lda_2d(150, c(.2, .3, .5), matrix(c(-1, -1, 2, 2, 2, -1), 2), .1 * diag(2))\n```\n:::\n\n::: {.cell layout-align=\"center\" dvi='300'}\n::: {.cell-output-display}\n![](15-LDA-and-QDA_files/figure-revealjs/3class-plot-1.png){fig-align='center'}\n:::\n:::\n\n\n\n\n## QDA\n\nJust like LDA, but $\\Sigma_k$ is separate for each class.\n\nProduces [Quadratic]{.secondary} decision boundary.\n\nEverything else is the same.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nqda_fit <- qda(y ~ ., dat1)\nqda_3fit <- qda(y ~ ., moreclasses)\n```\n:::\n\n::: {.cell layout-align=\"center\" dvi='300'}\n::: {.cell-output-display}\n![](15-LDA-and-QDA_files/figure-revealjs/qda-vs-lda-2class-1.png){fig-align='center'}\n:::\n:::\n\n\n\n## 3 class comparison\n\n\n::: {.cell layout-align=\"center\" dvi='300'}\n::: {.cell-output-display}\n![](15-LDA-and-QDA_files/figure-revealjs/3class-comparison-1.png){fig-align='center'}\n:::\n:::\n\n\n\n## Notes\n\n* LDA is a linear classifier. It is not a linear smoother.\n - It is derived from Bayes rule.\n - Assume each class-conditional density in Gaussian\n - It assumes the classes have different mean vectors, but the same (common) covariance matrix.\n - It estimates densities and probabilities and \"plugs in\" \n\n* QDA is not a linear classifier. It depends on quadratic functions of the data.\n - It is derived from Bayes rule.\n - Assume each class-conditional density in Gaussian\n - It assumes the classes have different mean vectors and different covariance matrices.\n - It estimates densities and probabilities and \"plugs in\" \n \n##\n\n[It is hard (maybe impossible) to come up with reasonable classifiers that are linear smoothers. Many \"look\" like a linear smoother, but then apply a nonlinear transformation.]{.hand}\n\n## Naïve Bayes\n\nAssume that $Pr(X | Y = k) = Pr(X_1 | Y = k)\\cdots Pr(X_p | Y = k)$.\n\nThat is, conditional on the class, the feature distribution is independent.\n\n. . .\n\nIf we further assume that $Pr(X_j | Y = k)$ is Gaussian,\n\nThis is the same as QDA but with $\\Sigma_k$ Diagonal.\n\n. . .\n\nDon't have to assume Gaussian. Could do lots of stuff. 
\n\n\n# Next time...\n\nAnother linear classifier and transformations\n", + "engine": "knitr", + "markdown": "---\nlecture: \"15 LDA and QDA\"\nformat: revealjs\nmetadata-files: \n - _metadata.yml\n---\n\n\n\n## {{< meta lecture >}} {.large background-image=\"gfx/smooths.svg\" background-opacity=\"0.3\"}\n\n[Stat 406]{.secondary}\n\n[{{< meta author >}}]{.secondary}\n\nLast modified -- 14 October 2024\n\n\n\n\n\n\n\n$$\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n\\newcommand{\\U}{\\mathbf{U}}\n\\newcommand{\\D}{\\mathbf{D}}\n\\newcommand{\\V}{\\mathbf{V}}\n$$\n\n\n\n\n\n## Last time\n\n\nWe showed that with two classes, the [Bayes' classifier]{.secondary} is\n\n$$g_*(x) = \\begin{cases}\n1 & \\textrm{ if } \\frac{p_1(x)}{p_0(x)} > \\frac{1-\\pi}{\\pi} \\\\\n0 & \\textrm{ otherwise}\n\\end{cases}$$\n\nwhere $p_1(x) = \\Pr(X=x \\given Y=1)$, $p_0(x) = \\Pr(X=x \\given Y=0)$ and $\\pi = \\Pr(Y=1)$\n\n. . .\n\nFor more than two classes:\n\n$$g_*(x) = \n\\argmax_k \\frac{\\pi_k p_k(x)}{\\sum_k \\pi_k p_k(x)}$$\n\nwhere $p_k(x) = \\Pr(X=x \\given Y=k)$ and $\\pi_k = P(Y=k)$\n\n\n## Estimating these\n \nLet's make some assumptions:\n\n1. $\\Pr(X=x\\given Y=k) = \\mbox{N}(x; \\mu_k,\\Sigma_k)$\n2. $\\Sigma_k = \\Sigma_{k'} = \\Sigma$\n\nThis leads to [Linear Discriminant Analysis]{.secondary} (LDA), one of the oldest classifiers\n\n## LDA\n\n\n1. Split your training data into $K$ subsets based on $y_i=k$.\n2. In each subset, estimate the mean of $X$: $\\widehat\\mu_k = \\overline{X}_k$\n3. Estimate the pooled variance: $$\\widehat\\Sigma = \\frac{1}{n-K} \\sum_{k \\in \\mathcal{K}} \\sum_{i \\in k} (x_i - \\overline{X}_k) (x_i - \\overline{X}_k)^{\\top}$$\n4. 
Estimate the class proportion: $\\widehat\\pi_k = n_k/n$\n\n## LDA\n\nAssume just $K = 2$ so $k \\in \\{0,\\ 1\\}$\n\nWe predict $\\widehat{y} = 1$ if\n\n$$\\widehat{p_1}(x) / \\widehat{p_0}(x) > \\widehat{\\pi_0} / \\widehat{\\pi_1}$$ \n\nPlug in the density estimates:\n\n$$\\widehat{p_k}(x) = N(x - \\widehat{\\mu}_k,\\ \\widehat\\Sigma)$$\n\n\n## LDA\n\n\nNow we take $\\log$ and simplify $(K=2)$:\n\n$$\n\\begin{aligned}\n&\\Rightarrow \\log(\\widehat{p_1}(x)\\times\\widehat{\\pi_1}) - \\log(\\widehat{p_0}(x)\\times\\widehat{\\pi_0})\n= \\cdots = \\cdots\\\\\n&= \\underbrace{\\left(x^\\top\\widehat\\Sigma^{-1}\\overline X_1-\\frac{1}{2}\\overline X_1^\\top \\widehat\\Sigma^{-1}\\overline X_1 + \\log \\widehat\\pi_1\\right)}_{\\delta_1(x)} - \\underbrace{\\left(x^\\top\\widehat\\Sigma^{-1}\\overline X_0-\\frac{1}{2}\\overline X_0^\\top \\widehat\\Sigma^{-1}\\overline X_0 + \\log \\widehat\\pi_0\\right)}_{\\delta_0(x)}\\\\\n&= \\delta_1(x) - \\delta_0(x)\n\\end{aligned}\n$$\n\n\n[If $\\delta_1(x) > \\delta_0(x)$, we set $\\widehat g(x)=1$]{.secondary}\n\n## One dimensional intuition\n\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nset.seed(406406406)\nn <- 100\npi <- .6\nmu0 <- -1\nmu1 <- 2\nsigma <- 2\ntib <- tibble(\n y = rbinom(n, 1, pi),\n x = rnorm(n, mu0, sigma) * (y == 0) + rnorm(n, mu1, sigma) * (y == 1)\n)\n```\n:::\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code code-fold=\"true\"}\ngg <- ggplot(tib, aes(x, y)) +\n geom_point(colour = blue) +\n stat_function(fun = ~ 6 * (1 - pi) * dnorm(.x, mu0, sigma), colour = orange) +\n stat_function(fun = ~ 6 * pi * dnorm(.x, mu1, sigma), colour = orange) +\n annotate(\"label\",\n x = c(-3, 4.5), y = c(.5, 2 / 3),\n label = c(\"(1-pi)*p[0](x)\", \"pi*p[1](x)\"), parse = TRUE\n )\ngg\n```\n\n::: {.cell-output-display}\n![](15-LDA-and-QDA_files/figure-revealjs/unnamed-chunk-2-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n\n\n\n## What is linear?\n\nLook closely at the equation for $\\delta_1(x)$:\n\n$$\\delta_1(x)=x^\\top\\widehat\\Sigma^{-1}\\overline X_1-\\frac{1}{2}\\overline X_1^\\top \\widehat\\Sigma^{-1}\\overline X_1 + \\log \\widehat\\pi_1$$\n\nWe can write this as $\\delta_1(x) = x^\\top a_1 + b_1$ with $a_1 = \\widehat\\Sigma^{-1}\\overline X_1$ and $b_1=-\\frac{1}{2}\\overline X_1^\\top \\widehat\\Sigma^{-1}\\overline X_1 + \\log \\widehat\\pi_1$.\n\nWe can do the same for $\\delta_0(x)$ (in terms of $a_0$ and $b_0$)\n\nTherefore, \n\n$$\\delta_1(x)-\\delta_0(x) = x^\\top(a_1-a_0) + (b_1-b_0)$$\n\nThis is how we discriminate between the classes.\n\nWe just calculate $(a_1 - a_0)$ (a vector in $\\R^p$), and $b_1 - b_0$ (a scalar)\n\n\n## Baby example\n\n::: flex\n::: w-50\n\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(mvtnorm)\nlibrary(MASS)\ngenerate_lda_2d <- function(\n n, p = c(.5, .5), \n mu = matrix(c(0, 0, 1, 1), 2),\n Sigma = diag(2)) {\n X <- rmvnorm(n, sigma = Sigma)\n tibble(\n y = which(rmultinom(n, 1, p) == 1, TRUE)[,1],\n x1 = X[, 1] + mu[1, y],\n x2 = X[, 2] + mu[2, y]\n )\n}\ndat1 <- generate_lda_2d(100, Sigma = .5 * diag(2))\nlda_fit <- lda(y ~ ., dat1)\n```\n:::\n\n\n\n\n:::\n::: w-50\n\n\n\n\n::: {.cell layout-align=\"center\" dvi='300'}\n::: {.cell-output-display}\n![](15-LDA-and-QDA_files/figure-revealjs/plot-d1-1.png){fig-align='center'}\n:::\n:::\n\n\n\n\n:::\n\n:::\n\n\n## Multiple classes\n\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmoreclasses <- generate_lda_2d(150, c(.2, .3, .5), matrix(c(0, 0, 1, 1, 1, 0), 2), .5 * 
diag(2))\nseparateclasses <- generate_lda_2d(150, c(.2, .3, .5), matrix(c(-1, -1, 2, 2, 2, -1), 2), .1 * diag(2))\n```\n:::\n\n::: {.cell layout-align=\"center\" dvi='300'}\n::: {.cell-output-display}\n![](15-LDA-and-QDA_files/figure-revealjs/3class-plot-1.png){fig-align='center'}\n:::\n:::\n\n\n\n\n\n\n## QDA\n\nJust like LDA, but $\\Sigma_k$ is separate for each class.\n\nProduces a [Quadratic]{.secondary} decision boundary.\n\nEverything else is the same.\n\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nqda_fit <- qda(y ~ ., dat1)\nqda_3fit <- qda(y ~ ., moreclasses)\n```\n:::\n\n::: {.cell layout-align=\"center\" dvi='300'}\n::: {.cell-output-display}\n![](15-LDA-and-QDA_files/figure-revealjs/qda-vs-lda-2class-1.png){fig-align='center'}\n:::\n:::\n\n\n\n\n\n## 3 class comparison\n\n\n\n\n::: {.cell layout-align=\"center\" dvi='300'}\n::: {.cell-output-display}\n![](15-LDA-and-QDA_files/figure-revealjs/3class-comparison-1.png){fig-align='center'}\n:::\n:::\n\n\n\n\n\n## Notes\n\n* LDA is a linear classifier. It is not a linear smoother.\n - It is derived from Bayes' rule.\n - It assumes each class-conditional density is Gaussian.\n - It assumes the classes have different mean vectors, but the same (common) covariance matrix.\n - It estimates densities and probabilities and \"plugs in\".\n\n* QDA is not a linear classifier. It depends on quadratic functions of the data.\n - It is derived from Bayes' rule.\n - It assumes each class-conditional density is Gaussian.\n - It assumes the classes have different mean vectors and different covariance matrices.\n - It estimates densities and probabilities and \"plugs in\".\n \n##\n\n[It is hard (maybe impossible) to come up with reasonable classifiers that are linear smoothers. Many \"look\" like a linear smoother, but then apply a nonlinear transformation.]{.hand}\n\n## Naïve Bayes\n\nAssume that $\\Pr(X=x | Y = k) = \\Pr(X_1=x_1 | Y = k)\\cdots \\Pr(X_p=x_p | Y = k)$.\n\nThat is, conditional on the class, the feature distribution is independent.\n\n. . .\n\nIf we further assume that $\\Pr(X_j=x_j | Y = k)$ is Gaussian,\n\nThis is the same as QDA but with $\\Sigma_k$ Diagonal.\n\n. . .\n\nDon't have to assume Gaussian. Could do lots of stuff. 
\n\n\n# Next time...\n\nAnother linear classifier and transformations\n", "supporting": [ "15-LDA-and-QDA_files" ], diff --git a/_freeze/schedule/slides/15-LDA-and-QDA/figure-revealjs/3class-comparison-1.png b/_freeze/schedule/slides/15-LDA-and-QDA/figure-revealjs/3class-comparison-1.png index 8faa795..6cc5357 100644 Binary files a/_freeze/schedule/slides/15-LDA-and-QDA/figure-revealjs/3class-comparison-1.png and b/_freeze/schedule/slides/15-LDA-and-QDA/figure-revealjs/3class-comparison-1.png differ diff --git a/_freeze/schedule/slides/15-LDA-and-QDA/figure-revealjs/3class-plot-1.png b/_freeze/schedule/slides/15-LDA-and-QDA/figure-revealjs/3class-plot-1.png index 600c9b3..4e32616 100644 Binary files a/_freeze/schedule/slides/15-LDA-and-QDA/figure-revealjs/3class-plot-1.png and b/_freeze/schedule/slides/15-LDA-and-QDA/figure-revealjs/3class-plot-1.png differ diff --git a/_freeze/schedule/slides/15-LDA-and-QDA/figure-revealjs/plot-d1-1.png b/_freeze/schedule/slides/15-LDA-and-QDA/figure-revealjs/plot-d1-1.png index c0f11e5..fdb9c3a 100644 Binary files a/_freeze/schedule/slides/15-LDA-and-QDA/figure-revealjs/plot-d1-1.png and b/_freeze/schedule/slides/15-LDA-and-QDA/figure-revealjs/plot-d1-1.png differ diff --git a/_freeze/schedule/slides/15-LDA-and-QDA/figure-revealjs/qda-vs-lda-2class-1.png b/_freeze/schedule/slides/15-LDA-and-QDA/figure-revealjs/qda-vs-lda-2class-1.png index ef5f3b8..e9a58ed 100644 Binary files a/_freeze/schedule/slides/15-LDA-and-QDA/figure-revealjs/qda-vs-lda-2class-1.png and b/_freeze/schedule/slides/15-LDA-and-QDA/figure-revealjs/qda-vs-lda-2class-1.png differ diff --git a/_freeze/schedule/slides/15-LDA-and-QDA/figure-revealjs/unnamed-chunk-2-1.svg b/_freeze/schedule/slides/15-LDA-and-QDA/figure-revealjs/unnamed-chunk-2-1.svg index 2082e33..3a9684a 100644 --- a/_freeze/schedule/slides/15-LDA-and-QDA/figure-revealjs/unnamed-chunk-2-1.svg +++ b/_freeze/schedule/slides/15-LDA-and-QDA/figure-revealjs/unnamed-chunk-2-1.svg @@ -1,418 +1,417 @@ [SVG markup hunk for the regenerated "One dimensional intuition" figure omitted: no recoverable text content]
diff --git a/schedule/slides/15-LDA-and-QDA.qmd b/schedule/slides/15-LDA-and-QDA.qmd index 6ba57cd..79feba6 100644 --- a/schedule/slides/15-LDA-and-QDA.qmd +++ b/schedule/slides/15-LDA-and-QDA.qmd @@ -13,36 +13,32 @@ metadata-files: We showed that with two classes, the [Bayes' classifier]{.secondary} is -$$g_*(X) = \begin{cases} -1 & \textrm{ if } \frac{p_1(X)}{p_0(X)} > \frac{1-\pi}{\pi} \\ +$$g_*(x) = \begin{cases} +1 & \textrm{ if } \frac{p_1(x)}{p_0(x)} > \frac{1-\pi}{\pi} \\ 0 & \textrm{ otherwise} \end{cases}$$ -where $p_1(X) = Pr(X \given Y=1)$, $p_0(X) = Pr(X \given Y=0)$ and $\pi = Pr(Y=1)$ +where $p_1(x) = \Pr(X=x \given Y=1)$, $p_0(x) = \Pr(X=x \given Y=0)$ and $\pi = \Pr(Y=1)$ . . . -For more than two classes. +For more than two classes: -$$g_*(X) = -\argmax_k \frac{\pi_k p_k(X)}{\sum_k \pi_k p_k(X)}$$ +$$g_*(x) = +\argmax_k \frac{\pi_k p_k(x)}{\sum_k \pi_k p_k(x)}$$ -where $p_k(X) = Pr(X \given Y=k)$ and $\pi_k = P(Y=k)$ +where $p_k(x) = \Pr(X=x \given Y=k)$ and $\pi_k = P(Y=k)$ ## Estimating these Let's make some assumptions: -1. $Pr(X\given Y=k) = \mbox{N}(\mu_k,\Sigma_k)$ +1. $\Pr(X=x\given Y=k) = \mbox{N}(x; \mu_k,\Sigma_k)$ 2. $\Sigma_k = \Sigma_{k'} = \Sigma$ -. . . - This leads to [Linear Discriminant Analysis]{.secondary} (LDA), one of the oldest classifiers - - ## LDA @@ -279,13 +275,13 @@ plot_grid(g1, gq1) ## Naïve Bayes -Assume that $Pr(X | Y = k) = Pr(X_1 | Y = k)\cdots Pr(X_p | Y = k)$. +Assume that $\Pr(X=x | Y = k) = \Pr(X_1=x_1 | Y = k)\cdots \Pr(X_p=x_p | Y = k)$. That is, conditional on the class, the feature distribution is independent. . . . -If we further assume that $Pr(X_j | Y = k)$ is Gaussian, +If we further assume that $\Pr(X_j=x_j | Y = k)$ is Gaussian, This is the same as QDA but with $\Sigma_k$ Diagonal.