classification losses

UBC-STAT · Oct 16, 2023 · 1628837 · 1628837
1 parent b74d83d
commit 1628837
Show file tree

Hide file tree

Showing 6 changed files with 4,020 additions and 458 deletions.
diff --git a/_freeze/schedule/slides/00-classification-losses/execute-results/html.json b/_freeze/schedule/slides/00-classification-losses/execute-results/html.json
@@ -0,0 +1,20 @@
+{
+  "hash": "8b2edae6a14401844329d3d63dfd8c75",
+  "result": {
+    "markdown": "---\nlecture: \"00 Evaluating classifiers\"\nformat: revealjs\nmetadata-files: \n  - _metadata.yml\n---\n---\n---\n\n## {{< meta lecture >}} {.large background-image=\"gfx/smooths.svg\" background-opacity=\"0.3\"}\n\n[Stat 406]{.secondary}\n\n[{{< meta author >}}]{.secondary}\n\nLast modified -- 16 October 2023\n\n\n\n$$\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n$$\n\n\n\n\n\n## How do we measure accuracy?\n\n[So far]{.secondary} --- 0-1 loss. If correct class, lose 0 else lose 1.\n\n[Asymmetric classification loss]{.secondary} --- If correct class, lose 0 else lose something.\n\nFor example, consider facial recognition. Goal is \"person OK\", \"person has expired passport\", \"person is a known terrorist\"\n\n1. If classify OK, but was terrorist, lose 1,000,000\n1. If classify OK, but expired passport, lose 2\n1. If classify terrorist, but was OK, lose 100\n1. If classify terrorist, but was expired passport, lose 10\n1. etc.\n\n. . .\n\n\nResults in a 3x3 matrix of losses with 0 on the diagonal.\n\n\n::: {.cell layout-align=\"center\" R.options='{\"scipen\":8}'}\n::: {.cell-output .cell-output-stdout}\n```\n        [,1]  [,2] [,3]\n[1,]       0     2   30\n[2,]      10     0  100\n[3,] 1000000 50000    0\n```\n:::\n:::\n\n\n\n## Deviance loss\n\nSometimes we output [probabilities]{.secondary} as well as class labels.\n\nFor example, logistic regression returns the probability that an observation is in class 1. $P(Y_i = 1 \\given x_i) = 1 / (1 + \\exp\\{-x'_i \\hat\\beta\\})$\n\nLDA and QDA produce probabilities as well. So do Neural Networks (typically)\n\n(Trees \"don't\", neither does KNN, though you could fake it)\n\n. . .\n\n<hr>\n\n* Deviance loss for 2-class classification is $-2\\textrm{loglikelihood}(y, \\hat{p}) = -2 (y_i x'_i\\hat{\\beta} - \\log (1-\\hat{p}))$\n\n(Technically, it's the difference between this and the loss of the null model, but people play fast and loose)\n\n* Could also use cross entropy or Gini index.\n\n\n\n## Calibration\n\nSuppose we predict some probabilities for our data, how often do those events happen?\n\nIn principle, if we predict $\\hat{p}(x_i)=0.2$ for a bunch of events observations $i$, we'd like to see about 20% 1 and 80% 0. (In training set and test set)\n\nThe same goes for the other probabilities. If we say \"20% chance of rain\" it should rain 20% of such days.\n\n\nOf course, we didn't predict **exactly** $\\hat{p}(x_i)=0.2$ ever, so lets look at $[.15, .25]$.\n\n\n::: {.cell layout-align=\"center\" output-location='fragment'}\n\n```{.r .cell-code  code-line-numbers=\"1-6|7|8-9\"}\nn <- 250\ndat <- tibble(\n  x = seq(-5, 5, length.out = n), \n  p = 1 / (1 + exp(-x)),\n  y = rbinom(n, 1, p)\n)\nfit <- glm(y ~ x, family = binomial, data = dat)\ndat$phat <- predict(fit, type = \"response\") # predicted probabilities\ndat |> filter(phat > .15, phat < .25) |> summarize(target = .2, obs = mean(y))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 1 × 2\n  target   obs\n   <dbl> <dbl>\n1    0.2 0.222\n```\n:::\n:::\n\n\n\n## Calibration plot\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nbinary_calibration_plot <- function(y, phat, nbreaks = 10) {\n  dat <- tibble(y = y, phat = phat) |>\n    mutate(bins = cut_number(phat, n = nbreaks))\n  midpts <- quantile(dat$phat, seq(0, 1, length.out = nbreaks + 1), na.rm = TRUE)\n  midpts <- midpts[-length(midpts)] + diff(midpts) / 2\n  sum_dat <- dat |> \n    group_by(bins) |>\n    summarise(p = mean(y, na.rm = TRUE), \n              se = sqrt(p * (1 - p) / n())) |>\n    arrange(p)\n  sum_dat$x <- midpts\n  \n  ggplot(sum_dat, aes(x = x)) + \n    geom_errorbar(aes(ymin = pmax(p - 1.96*se, 0), ymax = pmin(p + 1.96*se, 1))) +\n    geom_point(aes(y = p), color = blue) + \n    geom_abline(slope = 1, intercept = 0, color = orange) +\n    ylab(\"observed frequency\") + xlab(\"average predicted probability\") +\n    coord_cartesian(xlim = c(0, 1), ylim = c(0, 1)) +\n    geom_rug(data = dat, aes(x = phat), sides = 'b')\n}\n```\n:::\n\n\n\n## Amazingly well-calibrated\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nbinary_calibration_plot(dat$y, dat$phat, 20L)\n```\n\n::: {.cell-output-display}\n![](00-classification-losses_files/figure-revealjs/unnamed-chunk-4-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n## Less well-calibrated\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](00-classification-losses_files/figure-revealjs/unnamed-chunk-5-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n\n## True positive, false negative, sensitivity, specificity\n\nTrue positive rate\n: \\# correct predict positive  / \\# actual positive (1 - FNR)\n\nFalse negative rate\n: \\# incorrect predict negative  / \\# actual positive (1 - TPR), Type II Error\n\nTrue negative rate\n: \\# correct predict negative  / \\# actual negative\n\nFalse positive rate\n: \\# incorrect predict positive  / \\# actual negative (1 - TNR), Type I Error\n\nSensitivity\n: TPR, 1 - Type II error\n\nSpecificity\n: TNR, 1 - Type I error\n\n\n\n## ROC and thresholds\n\nROC (Receiver Operating Characteristic) Curve\n: TPR (sensitivity) vs. FPR (1 - specificity)\n \nAUC (Area under the curve)\n: Integral of ROC. Closer to 1 is better.\n \nSo far, we've been thresholding at 0.5, though you shouldn't always do that. \n \nWith unbalanced data (say 10% 0 and 90% 1), if you care equally about predicting both classes, you might want to choose a different cutoff (like in LDA).\n \nTo make the [ROC]{.secondary} we look at our errors [as we vary the cutoff]{.secondary}\n \n\n## ROC curve\n\n\n\n::: {.cell layout-align=\"center\" output-location='column-fragment'}\n\n```{.r .cell-code}\nroc <- function(prediction, y) {\n  op <- order(prediction, decreasing = TRUE)\n  preds <- prediction[op]\n  y <- y[op]\n  noty <- 1 - y\n  if (any(duplicated(preds))) {\n    y <- rev(tapply(y, preds, sum))\n    noty <- rev(tapply(noty, preds, sum))\n  }\n  tibble(\n    FPR = cumsum(noty) / sum(noty), \n    TPR = cumsum(y) / sum(y)\n  )\n}\n\nggplot(roc(dat$phat, dat$y), aes(FPR, TPR)) +\n  geom_step(color = blue, size = 2) +\n  geom_abline(slope = 1, intercept = 0)\n```\n\n::: {.cell-output-display}\n![](00-classification-losses_files/figure-revealjs/unnamed-chunk-6-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n\n## Other stuff\n\n![](gfx/huge-roc.png)\n\n* Source: worth exploring [Wikipedia](https://en.wikipedia.org/wiki/Receiver_operating_characteristic)\n",
+    "supporting": [
+      "00-classification-losses_files"
+    ],
+    "filters": [
+      "rmarkdown/pagebreak.lua"
+    ],
+    "includes": {
+      "include-after-body": [
+        "\n<script>\n  // htmlwidgets need to know to resize themselves when slides are shown/hidden.\n  // Fire the \"slideenter\" event (handled by htmlwidgets.js) when the current\n  // slide changes (different for each slide format).\n  (function () {\n    // dispatch for htmlwidgets\n    function fireSlideEnter() {\n      const event = window.document.createEvent(\"Event\");\n      event.initEvent(\"slideenter\", true, true);\n      window.document.dispatchEvent(event);\n    }\n\n    function fireSlideChanged(previousSlide, currentSlide) {\n      fireSlideEnter();\n\n      // dispatch for shiny\n      if (window.jQuery) {\n        if (previousSlide) {\n          window.jQuery(previousSlide).trigger(\"hidden\");\n        }\n        if (currentSlide) {\n          window.jQuery(currentSlide).trigger(\"shown\");\n        }\n      }\n    }\n\n    // hookup for slidy\n    if (window.w3c_slidy) {\n      window.w3c_slidy.add_observer(function (slide_num) {\n        // slide_num starts at position 1\n        fireSlideChanged(null, w3c_slidy.slides[slide_num - 1]);\n      });\n    }\n\n  })();\n</script>\n\n"
+      ]
+    },
+    "engineDependencies": {},
+    "preserve": {},
+    "postProcess": true
+  }
+}