From c54b9e42b30a7aa201d2c45fdb33f873b52077a4 Mon Sep 17 00:00:00 2001 From: Trevor Campbell Date: Mon, 14 Oct 2024 14:40:19 -0700 Subject: [PATCH] edits to classification intro --- .../execute-results/html.json | 5 +- .../figure-revealjs/unnamed-chunk-1-1.svg | 688 +++++----- .../figure-revealjs/unnamed-chunk-2-1.svg | 698 +++++----- .../figure-revealjs/unnamed-chunk-3-1.svg | 1150 +++++++++-------- .../site_libs/revealjs/dist/theme/quarto.css | 2 +- schedule/slides/14-classification-intro.qmd | 181 +-- 6 files changed, 1366 insertions(+), 1358 deletions(-) diff --git a/_freeze/schedule/slides/14-classification-intro/execute-results/html.json b/_freeze/schedule/slides/14-classification-intro/execute-results/html.json index c3e6b01..bfad2d5 100644 --- a/_freeze/schedule/slides/14-classification-intro/execute-results/html.json +++ b/_freeze/schedule/slides/14-classification-intro/execute-results/html.json @@ -1,7 +1,8 @@ { - "hash": "8828d518ad4c4391935b78421ba9265f", + "hash": "9bcb73fe51987917f8d5a597637c43d5", "result": { - "markdown": "---\nlecture: \"14 Classification\"\nformat: revealjs\nmetadata-files: \n - _metadata.yml\n---\n---\n---\n\n## {{< meta lecture >}} {.large background-image=\"gfx/smooths.svg\" background-opacity=\"0.3\"}\n\n[Stat 406]{.secondary}\n\n[{{< meta author >}}]{.secondary}\n\nLast modified -- 09 October 2023\n\n\n\n$$\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ 
}\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n$$\n\n\n\n\n\n## An Overview of Classification\n\n\n\n* A person arrives at an emergency room with a set of symptoms that\ncould be 1 of 3 possible conditions. Which one is it?\n\n* An online banking service must be able to determine whether each\ntransaction is fraudulent or not, using a customer's location, past\ntransaction history, etc.\n\n* Given a set of individuals sequenced DNA, can we determine whether\nvarious mutations are associated with different phenotypes?\n\n. . .\n\nThese problems are [not]{.secondary} regression\nproblems. 
They are [classification]{.secondary} problems.\n\n\n## The Set-up\n\nIt begins just like regression: suppose we have observations\n$$\\{(x_1,y_1),\\ldots,(x_n,y_n)\\}$$\n\nAgain, we want to estimate a function that maps $X$ to $Y$ to\npredict as yet observed data.\n\n(This function is known as a [classifier]{.secondary})\n\n\nThe same constraints apply:\n\n* We want a classifier that predicts test data, not just the training\ndata.\n\n* Often, this comes with the introduction of some bias to get lower\nvariance and better predictions.\n\n\n## How do we measure quality?\n\nBefore in regression, we have $y_i \\in \\mathbb{R}$ and use squared error loss to measure accuracy: $(y - \\hat{y})^2$.\n\nInstead, let $y \\in \\mathcal{K} = \\{1,\\ldots, K\\}$\n\n(This is arbitrary, sometimes other numbers, such as $\\{-1,1\\}$ will be\nused)\n\nWe can always take \"factors\": $\\{\\textrm{cat},\\textrm{dog}\\}$ and convert to integers, which is what we assume.\n\n\nWe again make predictions $\\hat{y}=k$ based on the data\n\n\n* We get zero loss if we predict the right class\n* We lose $\\ell(k,k')$ on $(k\\neq k')$ for incorrect predictions\n\n\n## How do we measure quality?\n\nSuppose you have a fever of 39º C. You get a rapid test on campus.\n\n| Loss | Test + | Test - |\n|:---: | :---: | :---: |\n| Are + | 0 | Infect others |\n| Are - | Isolation | 0 |\n\n## How do we measure quality?\n\nSuppose you have a fever of 39º C. You get a rapid test on campus.\n\n| Loss | Test + | Test - |\n|:---: | :---: | :---: |\n| Are + | 0 | 1 |\n| Are - | 1 | 0 |\n\n\n## How do we measure quality?\n\n> We're going to use $g(x)$ to be our classifier. 
It takes values in $\\mathcal{K}$.\n\n\n## How do we measure quality?\n\nAgain, we appeal to risk\n$$R_n(g) = E [\\ell(Y,g(X))]$$ If we use the law of\ntotal probability, this can be written\n$$R_n(g) = E_X \\sum_{y=1}^K \\ell(y,\\; g(X)) Pr(Y = y \\given X)$$\nWe minimize this over a class of options $\\mathcal{G}$, to produce\n$$g_*(X) = \\argmin_{g\\in\\mathcal{G}} E_X \\sum_{y=1}^K \\ell(y,g(X)) Pr(Y = y \\given X)$$\n\n## How do we measure quality?\n\n$g_*$ is named the [Bayes' classifier]{.secondary} for loss $\\ell$ in class $\\mathcal{G}$. \n\n$R_n(g_*)$ is the called the [Bayes' limit]{.secondary} or [Bayes' Risk]{.secondary}. \n\n[It's the best we could hope to do in terms of]{.hand} $\\ell$ [if we knew the distribution of the data.]{.hand}\n\n. . .\n\nBut we don't, so we'll try to do our best to estimate $g_*$.\n\n\n## Best classifier overall\n\n(for now, we limit to 2 classes)\n\nOnce we make a specific choice for $\\ell$, we can find $g_*$ exactly (pretending we know the distribution)\n\n\nBecause $Y$ takes only a few values, [zero-one]{.secondary}\nloss is natural (but not the only option)\n$$\\ell(y,\\ g(x)) = \\begin{cases}0 & y=g(x)\\\\1 & y\\neq g(x) \\end{cases} \\Longrightarrow R_n(g) = \\Expect{\\ell(Y,\\ g(X))} = Pr(g(X) \\neq Y),$$\n\n## Best classifier overall\n\n| Loss | Test + | Test - |\n|:---: | :---: | :---: |\n| Are + | 0 | 1 |\n| Are - | 1 | 0 |\n\n## Best classifier overall\n\nThis means we want to \nclassify a new observation $(x_0,y_0)$ such that\n$g(x_0) = y_0$ as often as possible\n\n\nUnder this loss, we have\n$$\n\\begin{aligned}\ng_*(X) &= \\argmin_{g} Pr(g(X) \\neq Y) \\\\\n&= \\argmin_{g} \\left[ 1 - Pr(Y = g(x) | X=x)\\right] \\\\\n&= \\argmax_{g} Pr(Y = g(x) | X=x )\n\\end{aligned}\n$$\n\n\n## Estimating $g_*$\n\n\n\n### Classifier approach 1 (empirical risk minimization):\n\n1. Choose some class of classifiers $\\mathcal{G}$. \n\n2. 
Find $\\argmin_{g\\in\\mathcal{G}} \\sum_{i = 1}^n I(g(x_i) \\neq y_i)$\n\n\n## Bayes' Classifier and class densities (2 classes)\n\nUsing **Bayes' theorem**, and recalling that $f_*(X) = E[Y \\given X]$\n\n$$\\begin{aligned}\nf_*(X) & = E[Y \\given X] = Pr(Y = 1 \\given X) \\\\ \n&= \\frac{Pr(X\\given Y=1) Pr(Y=1)}{Pr(X)}\\\\\n& =\\frac{Pr(X\\given Y = 1) Pr(Y = 1)}{\\sum_{k \\in \\{0,1\\}} Pr(X\\given Y = k) Pr(Y = k)} \\\\ & = \\frac{p_1(X) \\pi}{ p_1(X)\\pi + p_0(X)(1-\\pi)}\\end{aligned}$$\n\n* We call $p_k(X)$ the [class (conditional) densities]{.secondary}\n\n* $\\pi$ is the [marginal probability]{.secondary} $P(Y=1)$\n\n## Bayes' Classifier and class densities (2 classes)\n\nThe Bayes' Classifier (best classifier for 0-1 loss) can be rewritten \n\n$$g_*(X) = \\begin{cases}\n1 & \\textrm{ if } \\frac{p_1(X)}{p_0(X)} > \\frac{1-\\pi}{\\pi} \\\\\n0 & \\textrm{ otherwise}\n\\end{cases}$$\n\n\n### Approach 2: estimate everything in the expression above.\n\n* We need to estimate $p_1$, $p_2$, $\\pi$, $1-\\pi$\n* Easily extended to more than two classes\n\n\n## An alternative easy classifier\n\n\nZero-One loss was natural, but try something else\n\n\nLet's try using [squared error loss]{.secondary} instead:\n$\\ell(y,\\ f(x)) = (y - f(x))^2$\n\n\nThen, the Bayes' Classifier (the function that minimizes the Bayes Risk) is\n$$g_*(x) = f_*(x) = E[ Y \\given X = x] = Pr(Y = 1 \\given X)$$ \n(recall that $f_* \\in [0,1]$ is _still_ the regression function)\n\nIn this case, our \"class\" will actually just be a probability. But this isn't a class, so it's a bit unsatisfying.\n\nHow do we get a class prediction?\n\n. . .\n\nDiscretize the probability:\n\n$$g(x) = \\begin{cases}0 & f_*(x) < 1/2\\\\1 & \\textrm{else}\\end{cases}$$\n\n## Estimating $g_*$\n\n### Approach 3:\n\n1. Estimate $f_*$ using any method we've learned so far. \n2. Predict 0 if $\\hat{f}(x)$ is less than 1/2, else predict 1.\n\n\n\n\n## Claim: Classification is easier than regression\n\n\n1. 
Let $\\hat{f}$ be any estimate of $f_*$\n\n2. Let $\\widehat{g} (x) = \\begin{cases}0 & \\hat f(x) < 1/2\\\\1 & else\\end{cases}$\n\n[Proof by picture.]{.hand}\n\n## Claim: Classification is easier than regression\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code code-fold=\"true\"}\nset.seed(12345)\nx <- 1:99 / 100\ny <- rbinom(99, 1, \n .25 + .5 * (x > .3 & x < .5) + \n .6 * (x > .7))\ndmat <- as.matrix(dist(x))\nksm <- function(sigma) {\n gg <- dnorm(dmat, sd = sigma) \n sweep(gg, 1, rowSums(gg), '/') %*% y\n}\nfstar <- ksm(.04)\ngg <- tibble(x = x, fstar = fstar, y = y) %>%\n ggplot(aes(x)) +\n geom_point(aes(y = y), color = blue) +\n geom_line(aes(y = fstar), color = orange, size = 2) +\n coord_cartesian(ylim = c(0,1), xlim = c(0,1)) +\n annotate(\"label\", x = .75, y = .65, label = \"f_star\", size = 5)\ngg\n```\n\n::: {.cell-output-display}\n![](14-classification-intro_files/figure-revealjs/unnamed-chunk-1-1.svg){fig-align='center'}\n:::\n:::\n\n\n## Claim: Classification is easier than regression\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code code-fold=\"true\"}\ngg + geom_hline(yintercept = .5, color = green)\n```\n\n::: {.cell-output-display}\n![](14-classification-intro_files/figure-revealjs/unnamed-chunk-2-1.svg){fig-align='center'}\n:::\n:::\n\n\n## Claim: Classification is easier than regression\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code code-fold=\"true\"}\ntib <- tibble(x = x, fstar = fstar, y = y)\nggplot(tib) +\n geom_vline(data = filter(tib, fstar > 0.5), aes(xintercept = x), alpha = .5, color = green) +\n annotate(\"label\", x = .75, y = .65, label = \"f_star\", size = 5) + \n geom_point(aes(x = x, y = y), color = blue) +\n geom_line(aes(x = x, y = fstar), color = orange, size = 2) +\n coord_cartesian(ylim = c(0,1), xlim = c(0,1))\n```\n\n::: {.cell-output-display}\n![](14-classification-intro_files/figure-revealjs/unnamed-chunk-3-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n\n## How to find a 
classifier\n\n[Why did we go through that math?]{.hand}\n\nEach of these approaches suggests a way to find a classifier\n\n* [Empirical risk minimization:]{.secondary} Choose a set\nof classifiers $\\mathcal{G}$ and find $g \\in \\mathcal{G}$ that minimizes\nsome estimate of $R_n(g)$\n \n> (This can be quite challenging as, unlike in regression, the\ntraining error is nonconvex)\n\n* [Density estimation:]{.secondary} Estimate $\\pi$ and $p_k$\n\n* [Regression:]{.secondary} Find an\nestimate $\\hat{f}$ of $f^*$ and compare the predicted value to 1/2\n\n\n\n\n##\n\nEasiest classifier when $y\\in \\{0,\\ 1\\}$:\n\n(stupidest version of the third case...)\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nghat <- round(predict(lm(y ~ ., data = trainingdata)))\n```\n:::\n\n\nThink about why this may not be very good. (At least 2 reasons I can think of.)\n\n\n# Next time:\n\nEstimating the densities\n", + "engine": "knitr", + "markdown": "---\nlecture: \"14 Classification\"\nformat: revealjs\nmetadata-files: \n - _metadata.yml\n---\n\n\n\n## {{< meta lecture >}} {.large background-image=\"gfx/smooths.svg\" background-opacity=\"0.3\"}\n\n[Stat 406]{.secondary}\n\n[{{< meta author >}}]{.secondary}\n\nLast modified -- 14 October 2024\n\n\n\n\n\n\n\n$$\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 
\\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n\\newcommand{\\U}{\\mathbf{U}}\n\\newcommand{\\D}{\\mathbf{D}}\n\\newcommand{\\V}{\\mathbf{V}}\n$$\n\n\n\n\n\n## An Overview of Classification\n\n\n\n* A person arrives at an emergency room with a set of symptoms that\ncould be 1 of 3 possible conditions. Which one is it?\n\n* An online banking service must be able to determine whether each\ntransaction is fraudulent or not, using a customer's location, past\ntransaction history, etc.\n\n* Given a set of individuals sequenced DNA, can we determine whether\nvarious mutations are associated with different phenotypes?\n\n. . .\n\nThese problems are [not]{.secondary} regression\nproblems. They are [classification]{.secondary} problems.\n\n. . .\n\nClassification involves a **categorical response variable** (no notion of \"order\"/\"distance\").\n\n\n## Setup\n\nIt begins just like regression: suppose we have observations\n$$\\{(x_1,y_1),\\ldots,(x_n,y_n)\\}$$\n\nAgain, we want to estimate a function that maps $X$ to $Y$ to\npredict as yet observed data.\n\n(This function is known as a [classifier]{.secondary})\n\n\nThe same constraints apply:\n\n* We want a classifier that predicts test data, not just the training\ndata.\n\n* Often, this comes with the introduction of some bias to get lower\nvariance and better predictions.\n\n\n## How do we measure quality?\n\nBefore in regression, we have $y_i \\in \\mathbb{R}$ and use $(y - \\hat{y})^2$ loss to measure accuracy.\n\nInstead, let $y \\in \\mathcal{K} = \\{1,\\ldots, K\\}$\n\n(This is arbitrary, sometimes other numbers, such as $\\{-1,1\\}$ will be\nused)\n\nWe will usually convert categories/\"factors\" (e.g. 
$\\{\\textrm{cat},\\textrm{dog}\\}$) to integers.\n\n\nWe again make predictions $\\hat{y}=k$ based on the data\n\n\n* We get zero loss if we predict the right class\n* We lose $\\ell(k,k')$ on $(k\\neq k')$ for incorrect predictions\n\n## How do we measure quality?\n\nExample: You're trying to build a fun widget to classify images of cats and dogs.\n\n| Loss | Predict Dog | Predict Cat |\n|:---: | :---: | :---: |\n| Actual Dog | 0 | ? |\n| Actual Cat | ? | 0 |\n\n. . .\n\nUse the zero-one loss (1 if wrong, 0 if right). *Type of error doesn't matter.*\n\n| Loss | Predict Dog | Predict Cat |\n|:---: | :---: | :---: |\n| Actual Dog | 0 | 1 |\n| Actual Cat | 1 | 0 |\n\n## How do we measure quality?\n\nExample: Suppose you have a fever of 39º C. You get a rapid test on campus.\n\n| Loss | Test + | Test - |\n|:---: | :---: | :---: |\n| Are + | 0 | ? (Infect others) |\n| Are - | ? (Isolation) | 0 |\n\n. . .\n\nUse a weighted loss; *type of error matters!*\n\n\n| Loss | Test + | Test - |\n|:---: | :---: | :---: |\n| Are + | 0 | (LARGE) |\n| Are - | 1 | 0 |\n\n\nNote that one class is \"important\": we sometimes call that one *positive*. Errors are *false positive* and *false negative*.\n\nIn practice, you have to design your loss (just like before) to reflect what you care about.\n\n\n## How do we measure quality?\n\nWe're going to use $g(x)$ to be our classifier. It takes values in $\\mathcal{K}$.\n\nConsider the risk\n$$R_n(g) = E [\\ell(Y,g(X))]$$ If we use the law of\ntotal probability, this can be written\n$$R_n(g) = E\\left[\\sum_{y=1}^K \\ell(y,\\; g(X)) Pr(Y = y \\given X)\\right]$$\nWe minimize this over a class of options $\\mathcal{G}$, to produce\n$$g_*(X) = \\argmin_{g\\in\\mathcal{G}} E\\left[\\sum_{y=1}^K \\ell(y,g(X)) Pr(Y = y \\given X)\\right]$$\n\n## How do we measure quality?\n\n$g_*$ is named the [Bayes' classifier]{.secondary} for loss $\\ell$ in class $\\mathcal{G}$. 
\n\n$R_n(g_*)$ is the called the [Bayes' limit]{.secondary} or [Bayes' Risk]{.secondary}. \n\nIt's the best we could hope to do *even if we knew the distribution of the data* (recall irreducible error!)\n\nBut we don't, so we'll try to do our best to estimate $g_*$.\n\n\n## Best classifier overall\n\n\nSuppose we actually *know* the distribution of everything, and we've picked $\\ell$ to be the [zero-one loss]{.secondary}\n\n$$\\ell(y,\\ g(x)) = \\begin{cases}0 & y=g(x)\\\\1 & y\\neq g(x) \\end{cases}$$\n\n| Loss | Test + | Test - |\n|:---: | :---: | :---: |\n| Are + | 0 | 1 |\n| Are - | 1 | 0 |\n\nThen \n\n$$R_n(g) = \\Expect{\\ell(Y,\\ g(X))} = Pr(g(X) \\neq Y)$$\n\n## Best classifier overall\n\nWant to classify a new observation $(X,Y)$ such that\n$g(X) = Y$ with as high probability as possible. Under zero-one loss, we have\n\n$$g_* = \\argmin_{g} Pr(g(X) \\neq Y) = \\argmin_g 1- \\Pr(g(X) = Y) = \\argmax_g \\Pr(g(X) = Y)$$\n\n. . .\n\n$$\n\\begin{aligned}\ng_* &= \\argmax_{g} E[\\Pr(g(X) = Y | X)]\\\\\n &= \\argmax_{g} E\\left[\\sum_{k\\in\\mathcal{K}}1[g(X) = k]\\Pr(Y=k | X)\\right]\n\\end{aligned}\n$$\n\n. . .\n\nFor each $x$, only one $k$ can satisfy $g(x) = k$. So for each $x$,\n\n$$\ng_*(x) = \\argmax_{k\\in\\mathcal{K}} \\Pr(Y = k | X = x).\n$$\n\n## Estimating $g_*$ Approach 1: Empirical risk minimization\n\n1. Choose some class of classifiers $\\mathcal{G}$. \n\n2. 
Find $\\argmin_{g\\in\\mathcal{G}} \\sum_{i = 1}^n I(g(x_i) \\neq y_i)$\n\n\n## Estimating $g_*$ Approach 2: Class densities\n\nConsider 2 classes $\\{0,1\\}$: using **Bayes' theorem** (and being loose with notation),\n\n$$\\begin{aligned}\n\\Pr(Y=1 \\given X=x) &= \\frac{\\Pr(X=x\\given Y=1) \\Pr(Y=1)}{\\Pr(X=x)}\\\\\n&=\\frac{\\Pr(X=x\\given Y = 1) \\Pr(Y = 1)}{\\sum_{k \\in \\{0,1\\}} \\Pr(X=x\\given Y = k) \\Pr(Y = k)} \\\\ \n&= \\frac{p_1(x) \\pi}{ p_1(x)\\pi + p_0(x)(1-\\pi)}\\end{aligned}$$\n\n* We call $p_k(x)$ the [class (conditional) densities]{.secondary}\n\n* $\\pi$ is the [marginal probability]{.secondary} $P(Y=1)$\n\n* Similar formula for $\\Pr(Y=0\\given X=x) = p_0(x)(1-\\pi)/(\\dots)$\n\n## Estimating $g_*$ Approach 2: Class densities\n\nRecall $g_*(x) = \\argmax_k \\Pr(Y=k|x)$; so we classify 1 if\n\n$$\\frac{p_1(x) \\pi}{ p_1(x)\\pi + p_0(x)(1-\\pi)} > \\frac{p_0(x) (1-\\pi)}{ p_1(x)\\pi + p_0(x)(1-\\pi)}$$\n\ni.e., the [Bayes' Classifier]{.secondary} (best classifier for 0-1 loss) can be rewritten \n\n$$g_*(X) = \\begin{cases}\n1 & \\textrm{ if } \\frac{p_1(X)}{p_0(X)} > \\frac{1-\\pi}{\\pi} \\\\\n0 & \\textrm{ otherwise}\n\\end{cases}$$\n\n\n### Estimate everything in the expression above.\n\n* We need to estimate $p_0$, $p_1$, $\\pi$, $1-\\pi$\n* Easily extended to more than two classes\n\n\n## Estimating $g_*$ Approach 3: Regression discretization\n\n\n0-1 loss natural, but discrete. Let's try using [squared error]{.secondary}: $\\ell(y,\\ f(x)) = (y - f(x))^2$\n\n**What will be the optimal classifier here?** (hint: think about regression)\n\n. . .\n\nThe \"Bayes' Classifier\" (sort of...minimizes risk) is just the regression function!\n$$f_*(x) = \\Pr(Y = 1 \\given X=x) = E[ Y \\given X = x] $$ \n\nIn this case, $0\\leq f_*(x)\\leq 1$ not discrete... How do we get a class prediction?\n\n. . .\n\n**Discretize the output**:\n\n$$g(x) = \\begin{cases}0 & f_*(x) < 1/2\\\\1 & \\textrm{else}\\end{cases}$$\n\n1. 
Estimate $\\hat f(x) = E[Y|X=x] = \\Pr(Y=1|X=x)$ using any method we've learned so far. \n2. Predict 0 if $\\hat{f}(x)$ is less than 1/2, else predict 1.\n\n## Claim: Classification is easier than regression\n\n\n1. Let $\\hat{f}$ be any estimate of $f_*$\n\n2. Let $\\widehat{g} (x) = \\begin{cases}0 & \\hat f(x) < 1/2\\\\1 & else\\end{cases}$\n\n[Proof by picture.]{.hand}\n\n## Claim: Classification is easier than regression\n\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code code-fold=\"true\"}\nset.seed(12345)\nx <- 1:99 / 100\ny <- rbinom(99, 1, \n .25 + .5 * (x > .3 & x < .5) + \n .6 * (x > .7))\ndmat <- as.matrix(dist(x))\nksm <- function(sigma) {\n gg <- dnorm(dmat, sd = sigma) \n sweep(gg, 1, rowSums(gg), '/') %*% y\n}\nfstar <- ksm(.04)\ngg <- tibble(x = x, fstar = fstar, y = y) %>%\n ggplot(aes(x)) +\n geom_point(aes(y = y), color = blue) +\n geom_line(aes(y = fstar), color = orange, size = 2) +\n coord_cartesian(ylim = c(0,1), xlim = c(0,1)) +\n annotate(\"label\", x = .75, y = .65, label = \"f_star\", size = 5)\ngg\n```\n\n::: {.cell-output-display}\n![](14-classification-intro_files/figure-revealjs/unnamed-chunk-1-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n\n## Claim: Classification is easier than regression\n\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code code-fold=\"true\"}\ngg + geom_hline(yintercept = .5, color = green)\n```\n\n::: {.cell-output-display}\n![](14-classification-intro_files/figure-revealjs/unnamed-chunk-2-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n\n## Claim: Classification is easier than regression\n\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code code-fold=\"true\"}\ntib <- tibble(x = x, fstar = fstar, y = y)\nggplot(tib) +\n geom_vline(data = filter(tib, fstar > 0.5), aes(xintercept = x), alpha = .5, color = green) +\n annotate(\"label\", x = .75, y = .65, label = \"f_star\", size = 5) + \n geom_point(aes(x = x, y = y), color = blue) +\n geom_line(aes(x = x, y = fstar), color = 
orange, size = 2) +\n coord_cartesian(ylim = c(0,1), xlim = c(0,1))\n```\n\n::: {.cell-output-display}\n![](14-classification-intro_files/figure-revealjs/unnamed-chunk-3-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n\n\n\n## How to find a classifier\n\n**Why did we go through that math?**\n\nEach of these approaches has strengths/drawbacks:\n\n* [Empirical risk minimization:]{.secondary} Minimize $R_n(g)$ in some family $\\mathcal{G}$\n \n> (This can be quite challenging as, unlike in regression, the training error is nonconvex)\n\n* [Density estimation:]{.secondary} Estimate $\\pi$ and $p_k$\n\n> (We have to estimate class densities to classify. Too roundabout?)\n\n* [Regression:]{.secondary} Find an estimate $\\hat{f}\\approx E[Y|X=x]$ and compare the predicted value to 1/2\n\n> (Unnatural, estimates whole regression function when we'll just discretize anyway)\n\n# Next time...\nEstimating the densities\n", "supporting": [ "14-classification-intro_files" ],
diff --git a/_freeze/schedule/slides/14-classification-intro/figure-revealjs/unnamed-chunk-1-1.svg b/_freeze/schedule/slides/14-classification-intro/figure-revealjs/unnamed-chunk-1-1.svg index b9af09d..8f20477 100644 --- a/_freeze/schedule/slides/14-classification-intro/figure-revealjs/unnamed-chunk-1-1.svg +++ b/_freeze/schedule/slides/14-classification-intro/figure-revealjs/unnamed-chunk-1-1.svg @@ -1,369 +1,371 @@ [SVG markup omitted]
diff --git a/_freeze/schedule/slides/14-classification-intro/figure-revealjs/unnamed-chunk-2-1.svg b/_freeze/schedule/slides/14-classification-intro/figure-revealjs/unnamed-chunk-2-1.svg index 72a1074..77852f1 100644 --- a/_freeze/schedule/slides/14-classification-intro/figure-revealjs/unnamed-chunk-2-1.svg +++ b/_freeze/schedule/slides/14-classification-intro/figure-revealjs/unnamed-chunk-2-1.svg @@ -1,375 +1,377 @@ [SVG markup omitted]
diff --git a/_freeze/schedule/slides/14-classification-intro/figure-revealjs/unnamed-chunk-3-1.svg b/_freeze/schedule/slides/14-classification-intro/figure-revealjs/unnamed-chunk-3-1.svg index 7f4e494..71c92ea 100644 --- a/_freeze/schedule/slides/14-classification-intro/figure-revealjs/unnamed-chunk-3-1.svg +++ b/_freeze/schedule/slides/14-classification-intro/figure-revealjs/unnamed-chunk-3-1.svg @@ -1,699 +1,701 @@ [SVG markup omitted]
diff --git a/_freeze/site_libs/revealjs/dist/theme/quarto.css b/_freeze/site_libs/revealjs/dist/theme/quarto.css index 2be515e..62c5b6b 100644 --- a/_freeze/site_libs/revealjs/dist/theme/quarto.css +++ b/_freeze/site_libs/revealjs/dist/theme/quarto.css @@ -5,4 +5,4 @@ * we also add `bright-[color]-` synonyms for the `-[color]-intense` classes since * that seems to be what ansi_up emits * 
-*/.ansi-black-fg{color:#3e424d}.ansi-black-bg{background-color:#3e424d}.ansi-black-intense-black,.ansi-bright-black-fg{color:#282c36}.ansi-black-intense-black,.ansi-bright-black-bg{background-color:#282c36}.ansi-red-fg{color:#e75c58}.ansi-red-bg{background-color:#e75c58}.ansi-red-intense-red,.ansi-bright-red-fg{color:#b22b31}.ansi-red-intense-red,.ansi-bright-red-bg{background-color:#b22b31}.ansi-green-fg{color:#00a250}.ansi-green-bg{background-color:#00a250}.ansi-green-intense-green,.ansi-bright-green-fg{color:#007427}.ansi-green-intense-green,.ansi-bright-green-bg{background-color:#007427}.ansi-yellow-fg{color:#ddb62b}.ansi-yellow-bg{background-color:#ddb62b}.ansi-yellow-intense-yellow,.ansi-bright-yellow-fg{color:#b27d12}.ansi-yellow-intense-yellow,.ansi-bright-yellow-bg{background-color:#b27d12}.ansi-blue-fg{color:#208ffb}.ansi-blue-bg{background-color:#208ffb}.ansi-blue-intense-blue,.ansi-bright-blue-fg{color:#0065ca}.ansi-blue-intense-blue,.ansi-bright-blue-bg{background-color:#0065ca}.ansi-magenta-fg{color:#d160c4}.ansi-magenta-bg{background-color:#d160c4}.ansi-magenta-intense-magenta,.ansi-bright-magenta-fg{color:#a03196}.ansi-magenta-intense-magenta,.ansi-bright-magenta-bg{background-color:#a03196}.ansi-cyan-fg{color:#60c6c8}.ansi-cyan-bg{background-color:#60c6c8}.ansi-cyan-intense-cyan,.ansi-bright-cyan-fg{color:#258f8f}.ansi-cyan-intense-cyan,.ansi-bright-cyan-bg{background-color:#258f8f}.ansi-white-fg{color:#c5c1b4}.ansi-white-bg{background-color:#c5c1b4}.ansi-white-intense-white,.ansi-bright-white-fg{color:#a1a6b2}.ansi-white-intense-white,.ansi-bright-white-bg{background-color:#a1a6b2}.ansi-default-inverse-fg{color:#fff}.ansi-default-inverse-bg{background-color:#000}.ansi-bold{font-weight:bold}.ansi-underline{text-decoration:underline}:root{--quarto-body-bg: #fefefe;--quarto-body-color: #222;--quarto-text-muted: #6f6f6f;--quarto-border-color: #bbbbbb;--quarto-border-width: 1px;--quarto-border-radius: 
4px}table.gt_table{color:var(--quarto-body-color);font-size:1em;width:100%;background-color:rgba(0,0,0,0);border-top-width:inherit;border-bottom-width:inherit;border-color:var(--quarto-border-color)}table.gt_table th.gt_column_spanner_outer{color:var(--quarto-body-color);background-color:rgba(0,0,0,0);border-top-width:inherit;border-bottom-width:inherit;border-color:var(--quarto-border-color)}table.gt_table th.gt_col_heading{color:var(--quarto-body-color);font-weight:bold;background-color:rgba(0,0,0,0)}table.gt_table thead.gt_col_headings{border-bottom:1px solid currentColor;border-top-width:inherit;border-top-color:var(--quarto-border-color)}table.gt_table thead.gt_col_headings:not(:first-child){border-top-width:1px;border-top-color:var(--quarto-border-color)}table.gt_table td.gt_row{border-bottom-width:1px;border-bottom-color:var(--quarto-border-color);border-top-width:0px}table.gt_table tbody.gt_table_body{border-top-width:1px;border-bottom-width:1px;border-bottom-color:var(--quarto-border-color);border-top-color:currentColor}div.columns{display:initial;gap:initial}div.column{display:inline-block;overflow-x:initial;vertical-align:top;width:50%}.code-annotation-tip-content{word-wrap:break-word}.code-annotation-container-hidden{display:none !important}dl.code-annotation-container-grid{display:grid;grid-template-columns:min-content auto}dl.code-annotation-container-grid dt{grid-column:1}dl.code-annotation-container-grid dd{grid-column:2}pre.sourceCode.code-annotation-code{padding-right:0}code.sourceCode .code-annotation-anchor{z-index:100;position:relative;float:right;background-color:rgba(0,0,0,0)}input[type=checkbox]{margin-right:.5ch}:root{--mermaid-bg-color: #fefefe;--mermaid-edge-color: #e98a15;--mermaid-node-fg-color: #222;--mermaid-fg-color: #222;--mermaid-fg-color--lighter: #3c3c3c;--mermaid-fg-color--lightest: #555555;--mermaid-font-family: Commissioner, Source Sans Pro, Helvetica, sans-serif;--mermaid-label-bg-color: #fefefe;--mermaid-label-fg-color: 
#2c365e;--mermaid-node-bg-color: rgba(44, 54, 94, 0.1);--mermaid-node-fg-color: #222}@media print{:root{font-size:11pt}#quarto-sidebar,#TOC,.nav-page{display:none}.page-columns .content{grid-column-start:page-start}.fixed-top{position:relative}.panel-caption,.figure-caption,figcaption{color:#666}}.code-copy-button{position:absolute;top:0;right:0;border:0;margin-top:5px;margin-right:5px;background-color:rgba(0,0,0,0);z-index:3}.code-copy-button:focus{outline:none}.code-copy-button-tooltip{font-size:.75em}pre.sourceCode:hover>.code-copy-button>.bi::before{display:inline-block;height:1rem;width:1rem;content:"";vertical-align:-0.125em;background-image:url('data:image/svg+xml,');background-repeat:no-repeat;background-size:1rem 1rem}pre.sourceCode:hover>.code-copy-button-checked>.bi::before{background-image:url('data:image/svg+xml,')}pre.sourceCode:hover>.code-copy-button:hover>.bi::before{background-image:url('data:image/svg+xml,')}pre.sourceCode:hover>.code-copy-button-checked:hover>.bi::before{background-image:url('data:image/svg+xml,')}.panel-tabset [role=tablist]{border-bottom:1px solid #bbb;list-style:none;margin:0;padding:0;width:100%}.panel-tabset [role=tablist] *{-webkit-box-sizing:border-box;box-sizing:border-box}@media(min-width: 30em){.panel-tabset [role=tablist] li{display:inline-block}}.panel-tabset [role=tab]{border:1px solid rgba(0,0,0,0);border-top-color:#bbb;display:block;padding:.5em 1em;text-decoration:none}@media(min-width: 30em){.panel-tabset [role=tab]{border-top-color:rgba(0,0,0,0);display:inline-block;margin-bottom:-1px}}.panel-tabset [role=tab][aria-selected=true]{background-color:#bbb}@media(min-width: 30em){.panel-tabset [role=tab][aria-selected=true]{background-color:rgba(0,0,0,0);border:1px solid #bbb;border-bottom-color:#fefefe}}@media(min-width: 30em){.panel-tabset [role=tab]:hover:not([aria-selected=true]){border:1px solid #bbb}}.code-with-filename 
.code-with-filename-file{margin-bottom:0;padding-bottom:2px;padding-top:2px;padding-left:.7em;border:var(--quarto-border-width) solid var(--quarto-border-color);border-radius:var(--quarto-border-radius);border-bottom:0;border-bottom-left-radius:0%;border-bottom-right-radius:0%}.code-with-filename div.sourceCode,.reveal .code-with-filename div.sourceCode{margin-top:0;border-top-left-radius:0%;border-top-right-radius:0%}.code-with-filename .code-with-filename-file pre{margin-bottom:0}.code-with-filename .code-with-filename-file{background-color:rgba(219,219,219,.8)}.quarto-dark .code-with-filename .code-with-filename-file{background-color:#555}.code-with-filename .code-with-filename-file strong{font-weight:400}a.external:after{content:"";background-image:url('data:image/svg+xml,');background-size:contain;background-repeat:no-repeat;background-position:center center;margin-left:.2em;padding-right:.75em}div.sourceCode code a.external:after{content:none}a.external:after:hover{cursor:pointer}.quarto-ext-icon{display:inline-block;font-size:.75em;padding-left:.3em}.reveal.center .slide aside,.reveal.center .slide div.aside{position:initial}section.has-light-background,section.has-light-background h1,section.has-light-background h2,section.has-light-background h3,section.has-light-background h4,section.has-light-background h5,section.has-light-background h6{color:#222}section.has-light-background a,section.has-light-background a:hover{color:#2a76dd}section.has-light-background code{color:#4758ab}section.has-dark-background,section.has-dark-background h1,section.has-dark-background h2,section.has-dark-background h3,section.has-dark-background h4,section.has-dark-background h5,section.has-dark-background h6{color:#fff}section.has-dark-background a,section.has-dark-background a:hover{color:#42affa}section.has-dark-background code{color:#ffa07a}#title-slide,div.reveal div.slides section.quarto-title-block{text-align:center}#title-slide .subtitle,div.reveal div.slides 
section.quarto-title-block .subtitle{margin-bottom:2.5rem}.reveal .slides{text-align:left}.reveal .title-slide h1{font-size:1.6em}.reveal[data-navigation-mode=linear] .title-slide h1{font-size:2.5em}.reveal div.sourceCode{border:1px solid #bbb;border-radius:4px}.reveal pre{width:100%;box-shadow:none;background-color:#fefefe;border:none;margin:0;font-size:.55em}.reveal .code-with-filename .code-with-filename-file pre{background-color:unset}.reveal code{color:var(--quarto-hl-fu-color);background-color:rgba(0,0,0,0);white-space:pre-wrap}.reveal pre.sourceCode code{background-color:#fefefe;padding:6px 9px;max-height:500px;white-space:pre}.reveal pre code{background-color:#fefefe;color:#222}.reveal .column-output-location{display:flex;align-items:stretch}.reveal .column-output-location .column:first-of-type div.sourceCode{height:100%;background-color:#fefefe}.reveal blockquote{display:block;position:relative;color:#6f6f6f;width:unset;margin:var(--r-block-margin) auto;padding:.625rem 1.75rem;border-left:.25rem solid #6f6f6f;font-style:normal;background:none;box-shadow:none}.reveal blockquote p:first-child,.reveal blockquote p:last-child{display:block}.reveal .slide aside,.reveal .slide div.aside{position:absolute;bottom:20px;font-size:0.7em;color:#6f6f6f}.reveal .slide sup{font-size:0.7em}.reveal .slide.scrollable aside,.reveal .slide.scrollable div.aside{position:relative;margin-top:1em}.reveal .slide aside .aside-footnotes{margin-bottom:0}.reveal .slide aside .aside-footnotes li:first-of-type{margin-top:0}.reveal .layout-sidebar{display:flex;width:100%;margin-top:.8em}.reveal .layout-sidebar .panel-sidebar{width:270px}.reveal .layout-sidebar-left .panel-sidebar{margin-right:calc(0.5em*2)}.reveal .layout-sidebar-right .panel-sidebar{margin-left:calc(0.5em*2)}.reveal .layout-sidebar .panel-fill,.reveal .layout-sidebar .panel-center,.reveal .layout-sidebar .panel-tabset{flex:1}.reveal .panel-input,.reveal 
.panel-sidebar{font-size:.5em;padding:.5em;border-style:solid;border-color:#bbb;border-width:1px;border-radius:4px;background-color:#f8f9fa}.reveal .panel-sidebar :first-child,.reveal .panel-fill :first-child{margin-top:0}.reveal .panel-sidebar :last-child,.reveal .panel-fill :last-child{margin-bottom:0}.panel-input>div,.panel-input>div>div{vertical-align:middle;padding-right:1em}.reveal p,.reveal .slides section,.reveal .slides section>section{line-height:1.3}.reveal.smaller .slides section,.reveal .slides section.smaller,.reveal .slides section .callout{font-size:0.7em}.reveal.smaller .slides section section{font-size:inherit}.reveal.smaller .slides h1,.reveal .slides section.smaller h1{font-size:calc(2.5em/0.7)}.reveal.smaller .slides h2,.reveal .slides section.smaller h2{font-size:calc(1.6em/0.7)}.reveal.smaller .slides h3,.reveal .slides section.smaller h3{font-size:calc(1.3em/0.7)}.reveal .columns>.column>:not(ul,ol){margin-left:.25em;margin-right:.25em}.reveal .columns>.column:first-child>:not(ul,ol){margin-right:.5em;margin-left:0}.reveal .columns>.column:last-child>:not(ul,ol){margin-right:0;margin-left:.5em}.reveal .slide-number{color:#eea143;background-color:#fefefe}.reveal .footer{color:#6f6f6f}.reveal .footer a{color:#e98a15}.reveal .footer.has-dark-background{color:#fff}.reveal .footer.has-dark-background a{color:#7bc6fa}.reveal .footer.has-light-background{color:#505050}.reveal .footer.has-light-background a{color:#6a9bdd}.reveal .slide-number{color:#6f6f6f}.reveal .slide-number.has-dark-background{color:#fff}.reveal .slide-number.has-light-background{color:#505050}.reveal .slide figure>figcaption,.reveal .slide img.stretch+p.caption,.reveal .slide img.r-stretch+p.caption{font-size:0.7em}@media screen and (min-width: 500px){.reveal .controls[data-controls-layout=edges] .navigate-left{left:.2em}.reveal .controls[data-controls-layout=edges] .navigate-right{right:.2em}.reveal .controls[data-controls-layout=edges] .navigate-up{top:.4em}.reveal 
.controls[data-controls-layout=edges] .navigate-down{bottom:2.3em}}.tippy-box[data-theme~=light-border]{background-color:#fefefe;color:#222;border-radius:4px;border:solid 1px #6f6f6f;font-size:.6em}.tippy-box[data-theme~=light-border] .tippy-arrow{color:#6f6f6f}.tippy-box[data-placement^=bottom]>.tippy-content{padding:7px 10px;z-index:1}.reveal .callout.callout-style-simple .callout-body,.reveal .callout.callout-style-default .callout-body,.reveal .callout.callout-style-simple div.callout-title,.reveal .callout.callout-style-default div.callout-title{font-size:inherit}.reveal .callout.callout-style-default .callout-icon::before,.reveal .callout.callout-style-simple .callout-icon::before{height:2rem;width:2rem;background-size:2rem 2rem}.reveal .callout.callout-titled .callout-title p{margin-top:.5em}.reveal .callout.callout-titled .callout-icon::before{margin-top:1rem}.reveal .callout.callout-titled .callout-body>.callout-content>:last-child{margin-bottom:1rem}.reveal .panel-tabset [role=tab]{padding:.25em .7em}.reveal .slide-menu-button .fa-bars::before{background-image:url('data:image/svg+xml,')}.reveal .slide-chalkboard-buttons .fa-easel2::before{background-image:url('data:image/svg+xml,')}.reveal .slide-chalkboard-buttons .fa-brush::before{background-image:url('data:image/svg+xml,')}/*! 
light */.reveal ol[type=a]{list-style-type:lower-alpha}.reveal ol[type=a s]{list-style-type:lower-alpha}.reveal ol[type=A s]{list-style-type:upper-alpha}.reveal ol[type=i]{list-style-type:lower-roman}.reveal ol[type=i s]{list-style-type:lower-roman}.reveal ol[type=I s]{list-style-type:upper-roman}.reveal ol[type="1"]{list-style-type:decimal}.reveal ul.task-list{list-style:none}.reveal ul.task-list li input[type=checkbox]{width:2em;height:2em;margin:0 1em .5em -1.6em;vertical-align:middle}div.cell-output-display div.pagedtable-wrapper table.table{font-size:.6em}.reveal .code-annotation-container-hidden{display:none}.reveal code.sourceCode button.code-annotation-anchor,.reveal code.sourceCode .code-annotation-anchor{font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",monospace;color:var(--quarto-hl-co-color);border:solid var(--quarto-hl-co-color) 1px;border-radius:50%;font-size:.7em;line-height:1.2em;margin-top:2px;user-select:none;-webkit-user-select:none;-moz-user-select:none;-ms-user-select:none;-o-user-select:none}.reveal code.sourceCode button.code-annotation-anchor{cursor:pointer}.reveal code.sourceCode a.code-annotation-anchor{text-align:center;vertical-align:middle;text-decoration:none;cursor:default;height:1.2em;width:1.2em}.reveal code.sourceCode.fragment a.code-annotation-anchor{left:auto}.reveal #code-annotation-line-highlight-gutter{width:100%;border-top:solid var(--quarto-hl-co-color) 1px;border-bottom:solid var(--quarto-hl-co-color) 1px;z-index:2}.reveal #code-annotation-line-highlight{margin-left:-8em;width:calc(100% + 4em);border-top:solid var(--quarto-hl-co-color) 1px;border-bottom:solid var(--quarto-hl-co-color) 1px;z-index:2;margin-bottom:-2px}.reveal code.sourceCode .code-annotation-anchor.code-annotation-active{background-color:var(--quarto-hl-normal-color, #aaaaaa);border:solid var(--quarto-hl-normal-color, #aaaaaa) 1px;color:#fefefe;font-weight:bolder}.reveal 
pre.code-annotation-code{padding-top:0;padding-bottom:0}.reveal pre.code-annotation-code code{z-index:3;padding-left:0px}.reveal dl.code-annotation-container-grid{margin-left:.1em}.reveal dl.code-annotation-container-grid dt{margin-top:.65rem;font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",monospace;border:solid #222 1px;border-radius:50%;height:1.3em;width:1.3em;line-height:1.3em;font-size:.5em;text-align:center;vertical-align:middle;text-decoration:none}.reveal dl.code-annotation-container-grid dd{margin-left:.25em}.reveal .scrollable ol li:first-child:nth-last-child(n+10),.reveal .scrollable ol li:first-child:nth-last-child(n+10)~li{margin-left:1em}html.print-pdf .reveal .slides .pdf-page:last-child{page-break-after:avoid}.reveal .quarto-title-block .quarto-title-authors{display:flex;justify-content:center}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author{padding-left:.5em;padding-right:.5em}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a,.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a:hover,.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a:visited,.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a:active{color:inherit;text-decoration:none}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-author-name{margin-bottom:.1rem}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-author-email{margin-top:0px;margin-bottom:.4em;font-size:.6em}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-author-orcid img{margin-bottom:4px}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-affiliation{font-size:.7em;margin-top:0px;margin-bottom:8px}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author 
.quarto-title-affiliation:first{margin-top:12px}ol{padding-left:.5em}ul{list-style:none}ul li::marker{color:#2c365e}dt{color:#e98a15}#title-slide{text-align:left}#title-slide .title{color:#2c365e}#title-slide .author{text-align:left;color:#e98a15;font-weight:bold}#title-slide .institute{padding-bottom:200px}.small{font-size:.75em}.smallest{font-size:.5em}.large{font-size:1.5em}.medium{font-size:.9em}.center-align{text-align:center}.hand{font-family:"Gochi Hand",cursive;font-size:125%}.hand-blue{font-family:"Gochi Hand",cursive;color:var(--primary);font-size:125%}/*# sourceMappingURL=f95d2bded9c28492b788fe14c3e9f347.css.map */ +*/.ansi-black-fg{color:#3e424d}.ansi-black-bg{background-color:#3e424d}.ansi-black-intense-black,.ansi-bright-black-fg{color:#282c36}.ansi-black-intense-black,.ansi-bright-black-bg{background-color:#282c36}.ansi-red-fg{color:#e75c58}.ansi-red-bg{background-color:#e75c58}.ansi-red-intense-red,.ansi-bright-red-fg{color:#b22b31}.ansi-red-intense-red,.ansi-bright-red-bg{background-color:#b22b31}.ansi-green-fg{color:#00a250}.ansi-green-bg{background-color:#00a250}.ansi-green-intense-green,.ansi-bright-green-fg{color:#007427}.ansi-green-intense-green,.ansi-bright-green-bg{background-color:#007427}.ansi-yellow-fg{color:#ddb62b}.ansi-yellow-bg{background-color:#ddb62b}.ansi-yellow-intense-yellow,.ansi-bright-yellow-fg{color:#b27d12}.ansi-yellow-intense-yellow,.ansi-bright-yellow-bg{background-color:#b27d12}.ansi-blue-fg{color:#208ffb}.ansi-blue-bg{background-color:#208ffb}.ansi-blue-intense-blue,.ansi-bright-blue-fg{color:#0065ca}.ansi-blue-intense-blue,.ansi-bright-blue-bg{background-color:#0065ca}.ansi-magenta-fg{color:#d160c4}.ansi-magenta-bg{background-color:#d160c4}.ansi-magenta-intense-magenta,.ansi-bright-magenta-fg{color:#a03196}.ansi-magenta-intense-magenta,.ansi-bright-magenta-bg{background-color:#a03196}.ansi-cyan-fg{color:#60c6c8}.ansi-cyan-bg{background-color:#60c6c8}.ansi-cyan-intense-cyan,.ansi-bright-cyan-fg{color:#258f8f}.ansi-cyan-in
tense-cyan,.ansi-bright-cyan-bg{background-color:#258f8f}.ansi-white-fg{color:#c5c1b4}.ansi-white-bg{background-color:#c5c1b4}.ansi-white-intense-white,.ansi-bright-white-fg{color:#a1a6b2}.ansi-white-intense-white,.ansi-bright-white-bg{background-color:#a1a6b2}.ansi-default-inverse-fg{color:#fff}.ansi-default-inverse-bg{background-color:#000}.ansi-bold{font-weight:bold}.ansi-underline{text-decoration:underline}:root{--quarto-body-bg: #fefefe;--quarto-body-color: #222;--quarto-text-muted: #6f6f6f;--quarto-border-color: #bbbbbb;--quarto-border-width: 1px;--quarto-border-radius: 4px}table.gt_table{color:var(--quarto-body-color);font-size:1em;width:100%;background-color:rgba(0,0,0,0);border-top-width:inherit;border-bottom-width:inherit;border-color:var(--quarto-border-color)}table.gt_table th.gt_column_spanner_outer{color:var(--quarto-body-color);background-color:rgba(0,0,0,0);border-top-width:inherit;border-bottom-width:inherit;border-color:var(--quarto-border-color)}table.gt_table th.gt_col_heading{color:var(--quarto-body-color);font-weight:bold;background-color:rgba(0,0,0,0)}table.gt_table thead.gt_col_headings{border-bottom:1px solid currentColor;border-top-width:inherit;border-top-color:var(--quarto-border-color)}table.gt_table thead.gt_col_headings:not(:first-child){border-top-width:1px;border-top-color:var(--quarto-border-color)}table.gt_table td.gt_row{border-bottom-width:1px;border-bottom-color:var(--quarto-border-color);border-top-width:0px}table.gt_table tbody.gt_table_body{border-top-width:1px;border-bottom-width:1px;border-bottom-color:var(--quarto-border-color);border-top-color:currentColor}div.columns{display:initial;gap:initial}div.column{display:inline-block;overflow-x:initial;vertical-align:top;width:50%}.code-annotation-tip-content{word-wrap:break-word}.code-annotation-container-hidden{display:none !important}dl.code-annotation-container-grid{display:grid;grid-template-columns:min-content auto}dl.code-annotation-container-grid 
dt{grid-column:1}dl.code-annotation-container-grid dd{grid-column:2}pre.sourceCode.code-annotation-code{padding-right:0}code.sourceCode .code-annotation-anchor{z-index:100;position:relative;float:right;background-color:rgba(0,0,0,0)}input[type=checkbox]{margin-right:.5ch}:root{--mermaid-bg-color: #fefefe;--mermaid-edge-color: #e98a15;--mermaid-node-fg-color: #222;--mermaid-fg-color: #222;--mermaid-fg-color--lighter: #3c3c3c;--mermaid-fg-color--lightest: #555555;--mermaid-font-family: Commissioner, Source Sans Pro, Helvetica, sans-serif;--mermaid-label-bg-color: #fefefe;--mermaid-label-fg-color: #2c365e;--mermaid-node-bg-color: rgba(44, 54, 94, 0.1);--mermaid-node-fg-color: #222}@media print{:root{font-size:11pt}#quarto-sidebar,#TOC,.nav-page{display:none}.page-columns .content{grid-column-start:page-start}.fixed-top{position:relative}.panel-caption,.figure-caption,figcaption{color:#666}}.code-copy-button{position:absolute;top:0;right:0;border:0;margin-top:5px;margin-right:5px;background-color:rgba(0,0,0,0);z-index:3}.code-copy-button:focus{outline:none}.code-copy-button-tooltip{font-size:.75em}pre.sourceCode:hover>.code-copy-button>.bi::before{display:inline-block;height:1rem;width:1rem;content:"";vertical-align:-0.125em;background-image:url('data:image/svg+xml,');background-repeat:no-repeat;background-size:1rem 1rem}pre.sourceCode:hover>.code-copy-button-checked>.bi::before{background-image:url('data:image/svg+xml,')}pre.sourceCode:hover>.code-copy-button:hover>.bi::before{background-image:url('data:image/svg+xml,')}pre.sourceCode:hover>.code-copy-button-checked:hover>.bi::before{background-image:url('data:image/svg+xml,')}.panel-tabset [role=tablist]{border-bottom:1px solid #bbb;list-style:none;margin:0;padding:0;width:100%}.panel-tabset [role=tablist] *{-webkit-box-sizing:border-box;box-sizing:border-box}@media(min-width: 30em){.panel-tabset [role=tablist] li{display:inline-block}}.panel-tabset [role=tab]{border:1px solid 
rgba(0,0,0,0);border-top-color:#bbb;display:block;padding:.5em 1em;text-decoration:none}@media(min-width: 30em){.panel-tabset [role=tab]{border-top-color:rgba(0,0,0,0);display:inline-block;margin-bottom:-1px}}.panel-tabset [role=tab][aria-selected=true]{background-color:#bbb}@media(min-width: 30em){.panel-tabset [role=tab][aria-selected=true]{background-color:rgba(0,0,0,0);border:1px solid #bbb;border-bottom-color:#fefefe}}@media(min-width: 30em){.panel-tabset [role=tab]:hover:not([aria-selected=true]){border:1px solid #bbb}}.code-with-filename .code-with-filename-file{margin-bottom:0;padding-bottom:2px;padding-top:2px;padding-left:.7em;border:var(--quarto-border-width) solid var(--quarto-border-color);border-radius:var(--quarto-border-radius);border-bottom:0;border-bottom-left-radius:0%;border-bottom-right-radius:0%}.code-with-filename div.sourceCode,.reveal .code-with-filename div.sourceCode{margin-top:0;border-top-left-radius:0%;border-top-right-radius:0%}.code-with-filename .code-with-filename-file pre{margin-bottom:0}.code-with-filename .code-with-filename-file{background-color:rgba(219,219,219,.8)}.quarto-dark .code-with-filename .code-with-filename-file{background-color:#555}.code-with-filename .code-with-filename-file strong{font-weight:400}a.external:after{content:"";background-image:url('data:image/svg+xml,');background-size:contain;background-repeat:no-repeat;background-position:center center;margin-left:.2em;padding-right:.75em}div.sourceCode code a.external:after{content:none}a.external:after:hover{cursor:pointer}.quarto-ext-icon{display:inline-block;font-size:.75em;padding-left:.3em}.reveal.center .slide aside,.reveal.center .slide div.aside{position:initial}section.has-light-background,section.has-light-background h1,section.has-light-background h2,section.has-light-background h3,section.has-light-background h4,section.has-light-background h5,section.has-light-background h6{color:#222}section.has-light-background a,section.has-light-background 
a:hover{color:#2a76dd}section.has-light-background code{color:#4758ab}section.has-dark-background,section.has-dark-background h1,section.has-dark-background h2,section.has-dark-background h3,section.has-dark-background h4,section.has-dark-background h5,section.has-dark-background h6{color:#fff}section.has-dark-background a,section.has-dark-background a:hover{color:#42affa}section.has-dark-background code{color:#ffa07a}#title-slide,div.reveal div.slides section.quarto-title-block{text-align:center}#title-slide .subtitle,div.reveal div.slides section.quarto-title-block .subtitle{margin-bottom:2.5rem}.reveal .slides{text-align:left}.reveal .title-slide h1{font-size:1.6em}.reveal[data-navigation-mode=linear] .title-slide h1{font-size:2.5em}.reveal div.sourceCode{border:1px solid #bbb;border-radius:4px}.reveal pre{width:100%;box-shadow:none;background-color:#fefefe;border:none;margin:0;font-size:.55em}.reveal code{color:var(--quarto-hl-fu-color);background-color:rgba(0,0,0,0);white-space:pre-wrap}.reveal pre.sourceCode code{background-color:#fefefe;padding:6px 9px;max-height:500px;white-space:pre}.reveal pre code{background-color:#fefefe;color:#222}.reveal .column-output-location{display:flex;align-items:stretch}.reveal .column-output-location .column:first-of-type div.sourceCode{height:100%;background-color:#fefefe}.reveal blockquote{display:block;position:relative;color:#6f6f6f;width:unset;margin:var(--r-block-margin) auto;padding:.625rem 1.75rem;border-left:.25rem solid #6f6f6f;font-style:normal;background:none;box-shadow:none}.reveal blockquote p:first-child,.reveal blockquote p:last-child{display:block}.reveal .slide aside,.reveal .slide div.aside{position:absolute;bottom:20px;font-size:0.7em;color:#6f6f6f}.reveal .slide sup{font-size:0.7em}.reveal .slide.scrollable aside,.reveal .slide.scrollable div.aside{position:relative;margin-top:1em}.reveal .slide aside .aside-footnotes{margin-bottom:0}.reveal .slide aside .aside-footnotes 
li:first-of-type{margin-top:0}.reveal .layout-sidebar{display:flex;width:100%;margin-top:.8em}.reveal .layout-sidebar .panel-sidebar{width:270px}.reveal .layout-sidebar-left .panel-sidebar{margin-right:calc(0.5em*2)}.reveal .layout-sidebar-right .panel-sidebar{margin-left:calc(0.5em*2)}.reveal .layout-sidebar .panel-fill,.reveal .layout-sidebar .panel-center,.reveal .layout-sidebar .panel-tabset{flex:1}.reveal .panel-input,.reveal .panel-sidebar{font-size:.5em;padding:.5em;border-style:solid;border-color:#bbb;border-width:1px;border-radius:4px;background-color:#f8f9fa}.reveal .panel-sidebar :first-child,.reveal .panel-fill :first-child{margin-top:0}.reveal .panel-sidebar :last-child,.reveal .panel-fill :last-child{margin-bottom:0}.panel-input>div,.panel-input>div>div{vertical-align:middle;padding-right:1em}.reveal p,.reveal .slides section,.reveal .slides section>section{line-height:1.3}.reveal.smaller .slides section,.reveal .slides section.smaller,.reveal .slides section .callout{font-size:0.7em}.reveal.smaller .slides section section{font-size:inherit}.reveal.smaller .slides h1,.reveal .slides section.smaller h1{font-size:calc(2.5em/0.7)}.reveal.smaller .slides h2,.reveal .slides section.smaller h2{font-size:calc(1.6em/0.7)}.reveal.smaller .slides h3,.reveal .slides section.smaller h3{font-size:calc(1.3em/0.7)}.reveal .columns>.column>:not(ul,ol){margin-left:.25em;margin-right:.25em}.reveal .columns>.column:first-child>:not(ul,ol){margin-right:.5em;margin-left:0}.reveal .columns>.column:last-child>:not(ul,ol){margin-right:0;margin-left:.5em}.reveal .slide-number{color:#eea143;background-color:#fefefe}.reveal .footer{color:#6f6f6f}.reveal .footer a{color:#e98a15}.reveal .footer.has-dark-background{color:#fff}.reveal .footer.has-dark-background a{color:#7bc6fa}.reveal .footer.has-light-background{color:#505050}.reveal .footer.has-light-background a{color:#6a9bdd}.reveal .slide-number{color:#6f6f6f}.reveal .slide-number.has-dark-background{color:#fff}.reveal 
.slide-number.has-light-background{color:#505050}.reveal .slide figure>figcaption,.reveal .slide img.stretch+p.caption,.reveal .slide img.r-stretch+p.caption{font-size:0.7em}@media screen and (min-width: 500px){.reveal .controls[data-controls-layout=edges] .navigate-left{left:.2em}.reveal .controls[data-controls-layout=edges] .navigate-right{right:.2em}.reveal .controls[data-controls-layout=edges] .navigate-up{top:.4em}.reveal .controls[data-controls-layout=edges] .navigate-down{bottom:2.3em}}.tippy-box[data-theme~=light-border]{background-color:#fefefe;color:#222;border-radius:4px;border:solid 1px #6f6f6f;font-size:.6em}.tippy-box[data-theme~=light-border] .tippy-arrow{color:#6f6f6f}.tippy-box[data-placement^=bottom]>.tippy-content{padding:7px 10px;z-index:1}.reveal .callout.callout-style-simple .callout-body,.reveal .callout.callout-style-default .callout-body,.reveal .callout.callout-style-simple div.callout-title,.reveal .callout.callout-style-default div.callout-title{font-size:inherit}.reveal .callout.callout-style-default .callout-icon::before,.reveal .callout.callout-style-simple .callout-icon::before{height:2rem;width:2rem;background-size:2rem 2rem}.reveal .callout.callout-titled .callout-title p{margin-top:.5em}.reveal .callout.callout-titled .callout-icon::before{margin-top:1rem}.reveal .callout.callout-titled .callout-body>.callout-content>:last-child{margin-bottom:1rem}.reveal .panel-tabset [role=tab]{padding:.25em .7em}.reveal .slide-menu-button .fa-bars::before{background-image:url('data:image/svg+xml,')}.reveal .slide-chalkboard-buttons .fa-easel2::before{background-image:url('data:image/svg+xml,')}.reveal .slide-chalkboard-buttons .fa-brush::before{background-image:url('data:image/svg+xml,')}/*! 
light */.reveal ol[type=a]{list-style-type:lower-alpha}.reveal ol[type=a s]{list-style-type:lower-alpha}.reveal ol[type=A s]{list-style-type:upper-alpha}.reveal ol[type=i]{list-style-type:lower-roman}.reveal ol[type=i s]{list-style-type:lower-roman}.reveal ol[type=I s]{list-style-type:upper-roman}.reveal ol[type="1"]{list-style-type:decimal}.reveal ul.task-list{list-style:none}.reveal ul.task-list li input[type=checkbox]{width:2em;height:2em;margin:0 1em .5em -1.6em;vertical-align:middle}div.cell-output-display div.pagedtable-wrapper table.table{font-size:.6em}.reveal .code-annotation-container-hidden{display:none}.reveal code.sourceCode button.code-annotation-anchor,.reveal code.sourceCode .code-annotation-anchor{font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",monospace;color:var(--quarto-hl-co-color);border:solid var(--quarto-hl-co-color) 1px;border-radius:50%;font-size:.7em;line-height:1.2em;margin-top:2px;user-select:none;-webkit-user-select:none;-moz-user-select:none;-ms-user-select:none;-o-user-select:none}.reveal code.sourceCode button.code-annotation-anchor{cursor:pointer}.reveal code.sourceCode a.code-annotation-anchor{text-align:center;vertical-align:middle;text-decoration:none;cursor:default;height:1.2em;width:1.2em}.reveal code.sourceCode.fragment a.code-annotation-anchor{left:auto}.reveal #code-annotation-line-highlight-gutter{width:100%;border-top:solid var(--quarto-hl-co-color) 1px;border-bottom:solid var(--quarto-hl-co-color) 1px;z-index:2}.reveal #code-annotation-line-highlight{margin-left:-8em;width:calc(100% + 4em);border-top:solid var(--quarto-hl-co-color) 1px;border-bottom:solid var(--quarto-hl-co-color) 1px;z-index:2;margin-bottom:-2px}.reveal code.sourceCode .code-annotation-anchor.code-annotation-active{background-color:var(--quarto-hl-normal-color, #aaaaaa);border:solid var(--quarto-hl-normal-color, #aaaaaa) 1px;color:#fefefe;font-weight:bolder}.reveal 
pre.code-annotation-code{padding-top:0;padding-bottom:0}.reveal pre.code-annotation-code code{z-index:3;padding-left:0px}.reveal dl.code-annotation-container-grid{margin-left:.1em}.reveal dl.code-annotation-container-grid dt{margin-top:.65rem;font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",monospace;border:solid #222 1px;border-radius:50%;height:1.3em;width:1.3em;line-height:1.3em;font-size:.5em;text-align:center;vertical-align:middle;text-decoration:none}.reveal dl.code-annotation-container-grid dd{margin-left:.25em}.reveal .scrollable ol li:first-child:nth-last-child(n+10),.reveal .scrollable ol li:first-child:nth-last-child(n+10)~li{margin-left:1em}html.print-pdf .reveal .slides .pdf-page:last-child{page-break-after:avoid}.reveal .quarto-title-block .quarto-title-authors{display:flex;justify-content:center}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author{padding-left:.5em;padding-right:.5em}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a,.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a:hover,.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a:visited,.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a:active{color:inherit;text-decoration:none}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-author-name{margin-bottom:.1rem}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-author-email{margin-top:0px;margin-bottom:.4em;font-size:.6em}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-author-orcid img{margin-bottom:4px}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-affiliation{font-size:.7em;margin-top:0px;margin-bottom:8px}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author 
.quarto-title-affiliation:first{margin-top:12px}ol{padding-left:.5em}ul{list-style:none}ul li::marker{color:#2c365e}dt{color:#e98a15}#title-slide{text-align:left}#title-slide .title{color:#2c365e}#title-slide .author{text-align:left;color:#e98a15;font-weight:bold}#title-slide .institute{padding-bottom:200px}.small{font-size:.75em}.smallest{font-size:.5em}.large{font-size:1.5em}.medium{font-size:.9em}.center-align{text-align:center}.hand{font-family:"Gochi Hand",cursive;font-size:125%}.hand-blue{font-family:"Gochi Hand",cursive;color:var(--primary);font-size:125%}/*# sourceMappingURL=f95d2bded9c28492b788fe14c3e9f347.css.map */ diff --git a/schedule/slides/14-classification-intro.qmd b/schedule/slides/14-classification-intro.qmd index 2d90805..47ce0ae 100644 --- a/schedule/slides/14-classification-intro.qmd +++ b/schedule/slides/14-classification-intro.qmd @@ -27,8 +27,12 @@ various mutations are associated with different phenotypes? These problems are [not]{.secondary} regression problems. They are [classification]{.secondary} problems. +. . . + +Classification involves a **categorical response variable** (no notion of "order"/"distance"). -## The Set-up + +## Setup It begins just like regression: suppose we have observations $$\{(x_1,y_1),\ldots,(x_n,y_n)\}$$ @@ -50,14 +54,14 @@ variance and better predictions. ## How do we measure quality? -Before in regression, we have $y_i \in \mathbb{R}$ and use squared error loss to measure accuracy: $(y - \hat{y})^2$. +Before in regression, we have $y_i \in \mathbb{R}$ and use $(y - \hat{y})^2$ loss to measure accuracy. Instead, let $y \in \mathcal{K} = \{1,\ldots, K\}$ (This is arbitrary, sometimes other numbers, such as $\{-1,1\}$ will be used) -We can always take "factors": $\{\textrm{cat},\textrm{dog}\}$ and convert to integers, which is what we assume. +We will usually convert categories/"factors" (e.g. $\{\textrm{cat},\textrm{dog}\}$) to integers. 
We again make predictions $\hat{y}=k$ based on the data @@ -66,39 +70,59 @@ We again make predictions $\hat{y}=k$ based on the data * We get zero loss if we predict the right class * We lose $\ell(k,k')$ on $(k\neq k')$ for incorrect predictions +## How do we measure quality? + +Example: You're trying to build a fun widget to classify images of cats and dogs. + +| Loss | Predict Dog | Predict Cat | +|:---: | :---: | :---: | +| Actual Dog | 0 | ? | +| Actual Cat | ? | 0 | + +. . . + +Use the zero-one loss (1 if wrong, 0 if right). *Type of error doesn't matter.* + +| Loss | Predict Dog | Predict Cat | +|:---: | :---: | :---: | +| Actual Dog | 0 | 1 | +| Actual Cat | 1 | 0 | ## How do we measure quality? -Suppose you have a fever of 39º C. You get a rapid test on campus. +Example: Suppose you have a fever of 39º C. You get a rapid test on campus. | Loss | Test + | Test - | |:---: | :---: | :---: | -| Are + | 0 | Infect others | -| Are - | Isolation | 0 | +| Are + | 0 | ? (Infect others) | +| Are - | ? (Isolation) | 0 | -## How do we measure quality? +. . . + +Use a weighted loss; *type of error matters!* -Suppose you have a fever of 39º C. You get a rapid test on campus. | Loss | Test + | Test - | |:---: | :---: | :---: | -| Are + | 0 | 1 | +| Are + | 0 | (LARGE) | | Are - | 1 | 0 | -## How do we measure quality? +Note that one class is "important": we sometimes call that one *positive*. Errors are *false positive* and *false negative*. -> We're going to use $g(x)$ to be our classifier. It takes values in $\mathcal{K}$. +In practice, you have to design your loss (just like before) to reflect what you care about. ## How do we measure quality? -Again, we appeal to risk +We're going to use $g(x)$ to be our classifier. It takes values in $\mathcal{K}$. 
+ +Consider the risk $$R_n(g) = E [\ell(Y,g(X))]$$ If we use the law of total probability, this can be written -$$R_n(g) = E_X \sum_{y=1}^K \ell(y,\; g(X)) Pr(Y = y \given X)$$ +$$R_n(g) = E\left[\sum_{y=1}^K \ell(y,\; g(X)) Pr(Y = y \given X)\right]$$ We minimize this over a class of options $\mathcal{G}$, to produce -$$g_*(X) = \argmin_{g\in\mathcal{G}} E_X \sum_{y=1}^K \ell(y,g(X)) Pr(Y = y \given X)$$ +$$g_*(X) = \argmin_{g\in\mathcal{G}} E\left[\sum_{y=1}^K \ell(y,g(X)) Pr(Y = y \given X)\right]$$ ## How do we measure quality? @@ -106,75 +130,80 @@ $g_*$ is named the [Bayes' classifier]{.secondary} for loss $\ell$ in class $\ma $R_n(g_*)$ is the called the [Bayes' limit]{.secondary} or [Bayes' Risk]{.secondary}. -[It's the best we could hope to do in terms of]{.hand} $\ell$ [if we knew the distribution of the data.]{.hand} - -. . . +It's the best we could hope to do *even if we knew the distribution of the data* (recall irreducible error!) But we don't, so we'll try to do our best to estimate $g_*$. 
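A small worked example of the conditional risk, using made-up numbers (the loss values 10 and 1 and the probability 0.2 are assumptions for illustration, not from the slides). Take two classes $\{+,-\}$ with $\ell(+,-) = 10$ for a false negative and $\ell(-,+) = 1$ for a false positive, and suppose $\Pr(Y = + \given X = x) = 0.2$ at some $x$. The inner sum in the risk, evaluated for each possible prediction, is

$$\begin{aligned}
\textrm{predict } - :\quad & \sum_{y} \ell(y, -)\Pr(Y = y \given X = x) = 10(0.2) + 0(0.8) = 2\\
\textrm{predict } + :\quad & \sum_{y} \ell(y, +)\Pr(Y = y \given X = x) = 0(0.2) + 1(0.8) = 0.8
\end{aligned}$$

So the risk-minimizing prediction at this $x$ is $+$, even though $-$ is the more probable class: a weighted loss can move the decision away from the most likely label.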
## Best classifier overall -(for now, we limit to 2 classes) -Once we make a specific choice for $\ell$, we can find $g_*$ exactly (pretending we know the distribution) +Suppose we actually *know* the distribution of everything, and we've picked $\ell$ to be the [zero-one loss]{.secondary} - -Because $Y$ takes only a few values, [zero-one]{.secondary} -loss is natural (but not the only option) -$$\ell(y,\ g(x)) = \begin{cases}0 & y=g(x)\\1 & y\neq g(x) \end{cases} \Longrightarrow R_n(g) = \Expect{\ell(Y,\ g(X))} = Pr(g(X) \neq Y),$$ - -## Best classifier overall +$$\ell(y,\ g(x)) = \begin{cases}0 & y=g(x)\\1 & y\neq g(x) \end{cases}$$ | Loss | Test + | Test - | |:---: | :---: | :---: | | Are + | 0 | 1 | | Are - | 1 | 0 | +Then + +$$R_n(g) = \Expect{\ell(Y,\ g(X))} = Pr(g(X) \neq Y)$$ + ## Best classifier overall -This means we want to -classify a new observation $(x_0,y_0)$ such that -$g(x_0) = y_0$ as often as possible +Want to classify a new observation $(X,Y)$ such that +$g(X) = Y$ with as high probability as possible. Under zero-one loss, we have +$$g_* = \argmin_{g} Pr(g(X) \neq Y) = \argmin_g 1- \Pr(g(X) = Y) = \argmax_g \Pr(g(X) = Y)$$ + +. . . -Under this loss, we have $$ \begin{aligned} -g_*(X) &= \argmin_{g} Pr(g(X) \neq Y) \\ -&= \argmin_{g} \left[ 1 - Pr(Y = g(x) | X=x)\right] \\ -&= \argmax_{g} Pr(Y = g(x) | X=x ) +g_* &= \argmax_{g} E[\Pr(g(X) = Y | X)]\\ + &= \argmax_{g} E\left[\sum_{k\in\mathcal{K}}1[g(X) = k]\Pr(Y=k | X)\right] \end{aligned} $$ +. . . -## Estimating $g_*$ - +For each $x$, only one $k$ can satisfy $g(x) = k$. So for each $x$, +$$ +g_*(x) = \argmax_{k\in\mathcal{K}} \Pr(Y = k | X = x). +$$ -### Classifier approach 1 (empirical risk minimization): +## Estimating $g_*$ Approach 1: Empirical risk minimization 1. Choose some class of classifiers $\mathcal{G}$. 2. 
Find $\argmin_{g\in\mathcal{G}} \sum_{i = 1}^n I(g(x_i) \neq y_i)$ -## Bayes' Classifier and class densities (2 classes) +## Estimating $g_*$ Approach 2: Class densities -Using **Bayes' theorem**, and recalling that $f_*(X) = E[Y \given X]$ +Consider 2 classes $\{0,1\}$: using **Bayes' theorem** (and being loose with notation), $$\begin{aligned} -f_*(X) & = E[Y \given X] = Pr(Y = 1 \given X) \\ -&= \frac{Pr(X\given Y=1) Pr(Y=1)}{Pr(X)}\\ -& =\frac{Pr(X\given Y = 1) Pr(Y = 1)}{\sum_{k \in \{0,1\}} Pr(X\given Y = k) Pr(Y = k)} \\ & = \frac{p_1(X) \pi}{ p_1(X)\pi + p_0(X)(1-\pi)}\end{aligned}$$ +\Pr(Y=1 \given X=x) &= \frac{\Pr(X=x\given Y=1) \Pr(Y=1)}{\Pr(X=x)}\\ +&=\frac{\Pr(X=x\given Y = 1) \Pr(Y = 1)}{\sum_{k \in \{0,1\}} \Pr(X=x\given Y = k) \Pr(Y = k)} \\ +&= \frac{p_1(x) \pi}{ p_1(x)\pi + p_0(x)(1-\pi)}\end{aligned}$$ -* We call $p_k(X)$ the [class (conditional) densities]{.secondary} +* We call $p_k(x)$ the [class (conditional) densities]{.secondary} * $\pi$ is the [marginal probability]{.secondary} $P(Y=1)$ -## Bayes' Classifier and class densities (2 classes) +* Similar formula for $\Pr(Y=0\given X=x) = p_0(x)(1-\pi)/(\dots)$ + +## Estimating $g_*$ Approach 2: Class densities + +Recall $g_*(x) = \argmax_k \Pr(Y=k|x)$; so we classify 1 if -The Bayes' Classifier (best classifier for 0-1 loss) can be rewritten +$$\frac{p_1(x) \pi}{ p_1(x)\pi + p_0(x)(1-\pi)} > \frac{p_0(x) (1-\pi)}{ p_1(x)\pi + p_0(x)(1-\pi)}$$ + +i.e., the [Bayes' Classifier]{.secondary} (best classifier for 0-1 loss) can be rewritten $$g_*(X) = \begin{cases} 1 & \textrm{ if } \frac{p_1(X)}{p_0(X)} > \frac{1-\pi}{\pi} \\ @@ -182,46 +211,35 @@ $$g_*(X) = \begin{cases} \end{cases}$$ -### Approach 2: estimate everything in the expression above. +### Estimate everything in the expression above. 
-* We need to estimate $p_1$, $p_2$, $\pi$, $1-\pi$ +* We need to estimate $p_0$, $p_1$, $\pi$, $1-\pi$ * Easily extended to more than two classes -## An alternative easy classifier - - -Zero-One loss was natural, but try something else +## Estimating $g_*$ Approach 3: Regression discretization -Let's try using [squared error loss]{.secondary} instead: -$\ell(y,\ f(x)) = (y - f(x))^2$ +0-1 loss natural, but discrete. Let's try using [squared error]{.secondary}: $\ell(y,\ f(x)) = (y - f(x))^2$ +**What will be the optimal classifier here?** (hint: think about regression) -Then, the Bayes' Classifier (the function that minimizes the Bayes Risk) is -$$g_*(x) = f_*(x) = E[ Y \given X = x] = Pr(Y = 1 \given X)$$ -(recall that $f_* \in [0,1]$ is _still_ the regression function) +. . . -In this case, our "class" will actually just be a probability. But this isn't a class, so it's a bit unsatisfying. +The "Bayes' Classifier" (sort of...minimizes risk) is just the regression function! +$$f_*(x) = \Pr(Y = 1 \given X=x) = E[ Y \given X = x] $$ -How do we get a class prediction? +In this case, $0\leq f_*(x)\leq 1$ not discrete... How do we get a class prediction? . . . -Discretize the probability: +**Discretize the output**: $$g(x) = \begin{cases}0 & f_*(x) < 1/2\\1 & \textrm{else}\end{cases}$$ -## Estimating $g_*$ - -### Approach 3: - -1. Estimate $f_*$ using any method we've learned so far. +1. Estimate $\hat f(x) = E[Y|X=x] = \Pr(Y=1|X=x)$ using any method we've learned so far. 2. Predict 0 if $\hat{f}(x)$ is less than 1/2, else predict 1. 
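A worked check, with made-up numbers, that the class-density rule and the 1/2 threshold agree in the two-class case (the values $\pi = 0.2$, $p_1(x) = 0.6$, $p_0(x) = 0.1$ are assumptions for illustration):

$$\begin{aligned}
\Pr(Y=1 \given X=x) &= \frac{p_1(x)\pi}{p_1(x)\pi + p_0(x)(1-\pi)} = \frac{(0.6)(0.2)}{(0.6)(0.2) + (0.1)(0.8)} = \frac{0.12}{0.20} = 0.6\\
\frac{p_1(x)}{p_0(x)} &= 6 > \frac{1-\pi}{\pi} = 4
\end{aligned}$$

Both routes give class 1. The posterior exceeds 1/2 exactly when $p_1(x)\pi > p_0(x)(1-\pi)$, i.e. when the density ratio exceeds $(1-\pi)/\pi$, so thresholding the regression function at 1/2 recovers the same decision as the density-ratio rule under 0-1 loss.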
- - - ## Claim: Classification is easier than regression @@ -279,38 +297,21 @@ ggplot(tib) + ## How to find a classifier -[Why did we go through that math?]{.hand} +**Why did we go through that math?** -Each of these approaches suggests a way to find a classifier +Each of these approaches has strengths/drawbacks: -* [Empirical risk minimization:]{.secondary} Choose a set -of classifiers $\mathcal{G}$ and find $g \in \mathcal{G}$ that minimizes -some estimate of $R_n(g)$ +* [Empirical risk minimization:]{.secondary} Minimize $R_n(g)$ in some family $\mathcal{G}$ -> (This can be quite challenging as, unlike in regression, the -training error is nonconvex) +> (This can be quite challenging as, unlike in regression, the training error is nonconvex) * [Density estimation:]{.secondary} Estimate $\pi$ and $p_k$ -* [Regression:]{.secondary} Find an -estimate $\hat{f}$ of $f^*$ and compare the predicted value to 1/2 - - - - -## - -Easiest classifier when $y\in \{0,\ 1\}$: - -(stupidest version of the third case...) - -```{r eval=FALSE} -ghat <- round(predict(lm(y ~ ., data = trainingdata))) -``` - -Think about why this may not be very good. (At least 2 reasons I can think of.) +> (We have to estimate class densities to classify. Too roundabout?) +* [Regression:]{.secondary} Find an estimate $\hat{f}\approx E[Y|X=x]$ and compare the predicted value to 1/2 -# Next time: +> (Unnatural, estimates whole regression function when we'll just discretize anyway) +# Next time... Estimating the densities