diff --git a/_freeze/schedule/slides/20-boosting/execute-results/html.json b/_freeze/schedule/slides/20-boosting/execute-results/html.json index a710d7f..56f21bf 100644 --- a/_freeze/schedule/slides/20-boosting/execute-results/html.json +++ b/_freeze/schedule/slides/20-boosting/execute-results/html.json @@ -1,7 +1,7 @@ { - "hash": "4e41ba0061438c4140349f5002e37fd6", + "hash": "ec86535b178e899a578c3a3c7779af0c", "result": { - "markdown": "---\nlecture: \"20 Boosting\"\nformat: \n revealjs:\n multiplex: true\nmetadata-files: \n - _metadata.yml\n---\n---\n---\n\n## {{< meta lecture >}} {.large background-image=\"gfx/smooths.svg\" background-opacity=\"0.3\"}\n\n[Stat 406]{.secondary}\n\n[{{< meta author >}}]{.secondary}\n\nLast modified -- 02 November 2023\n\n\n\n$$\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n\\newcommand{\\U}{\\mathbf{U}}\n\\newcommand{\\D}{\\mathbf{D}}\n\\newcommand{\\V}{\\mathbf{V}}\n$$\n\n\n\n\n\n## Last time\n\n\n\nWe learned about bagging, for averaging [low-bias]{.secondary} / [high-variance]{.tertiary} estimators.\n\nToday, we examine it's opposite: Boosting.\n\nBoosting also combines estimators, but it combines [high-bias]{.secondary} / [low-variance]{.tertiary} estimators.\n\nBoosting has a number of flavours. And if you Google descriptions, most are wrong.\n\nFor a deep (and accurate) treatment, see [ESL] Chapter 10\n\n\n. . 
.\n\nWe'll discuss 2 flavours: [AdaBoost]{.secondary} and [Gradient Boosting]{.secondary}\n\nNeither requires a tree, but that's the typical usage.\n\nBoosting needs a \"weak learner\", so small trees (stumps) are natural.\n\n\n\n## AdaBoost intuition (for classification)\n\nAt each iteration, we weight the [observations]{.secondary}.\n\nObservations that are currently misclassified, get [higher]{.tertiary} weights.\n\nSo on the next iteration, we'll try harder to correctly classify our mistakes.\n\nThe number of iterations must be chosen.\n\n\n\n## AdaBoost (Freund and Schapire, generic)\n\nLet $G(x, \\theta)$ be any weak learner \n\n⛭ imagine a tree with one split: then $\\theta=$ (feature, split point)\n\n\n\nAlgorithm (AdaBoost) 🛠️\n\n* Set observation weights $w_i=1/n$.\n* Until we quit ( $m\n mutate(mobile = as.factor(Mobility > .1)) |>\n select(-ID, -Name, -Mobility, -State) |>\n drop_na()\nn <- nrow(mob)\ntrainidx <- sample.int(n, floor(n * .75))\ntestidx <- setdiff(1:n, trainidx)\ntrain <- mob[trainidx, ]\ntest <- mob[testidx, ]\nrf <- randomForest(mobile ~ ., data = train)\nbag <- randomForest(mobile ~ ., data = train, mtry = ncol(mob) - 1)\npreds <- tibble(truth = test$mobile, rf = predict(rf, test), bag = predict(bag, test))\n```\n:::\n\n::: {.cell layout-align=\"center\" output-location='column-fragment'}\n\n```{.r .cell-code code-line-numbers=\"1-6|7-12|17|\"}\nlibrary(gbm)\ntrain_boost <- train |>\n mutate(mobile = as.integer(mobile) - 1)\n# needs {0, 1} responses\ntest_boost <- test |>\n mutate(mobile = as.integer(mobile) - 1)\nadab <- gbm(\n mobile ~ .,\n data = train_boost,\n n.trees = 500,\n distribution = \"adaboost\"\n)\npreds$adab <- as.numeric(\n predict(adab, test_boost) > 0\n)\npar(mar = c(5, 11, 0, 1))\ns <- summary(adab, las = 1)\n```\n\n::: {.cell-output-display}\n![](20-boosting_files/figure-revealjs/unnamed-chunk-2-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n## Forward stagewise additive modeling (FSAM, completely generic)\n\nAlgorithm 🛠️\n\n* Set initial predictor $f_0(x)=0$\n* Until we quit ( $m 0) != truth)), 2)\n )\n ) +\n annotate(\"text\",\n x = 4, y = -5, color = red,\n label = paste(\"adaboost error\\n\", round(with(boost_preds, mean((adaboost > 0) != truth)), 2))\n )\nboost_oob <- tibble(\n adaboost = adab$oobag.improve, gbm = grad_boost$oobag.improve,\n ntrees = 1:500\n)\ng2 <- boost_oob %>%\n pivot_longer(-ntrees, values_to = \"OOB_Error\") %>%\n ggplot(aes(x = ntrees, y = OOB_Error, color = name)) +\n geom_line() +\n scale_color_manual(values = c(orange, blue)) +\n theme(legend.title = element_blank())\nplot_grid(g1, g2, rel_widths = c(.4, .6))\n```\n\n::: {.cell-output-display}\n![](20-boosting_files/figure-revealjs/unnamed-chunk-3-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n\n## Major takeaways\n\n* Two flavours of Boosting \n 1. AdaBoost (the original) and \n 2. 
gradient boosting (easier and more computationally friendly)\n\n* The connection is \"Forward stagewise additive modelling\" (AdaBoost is a special case)\n\n* The connection reveals that AdaBoost \"isn't robust because it uses exponential loss\" (squared error is even worse)\n\n* Gradient boosting is a computationally easier version of FSAM\n\n* All use **weak learners** (compare to Bagging)\n\n* Think about the Bias-Variance implications\n\n* You can use these for regression or classification\n\n* You can do this with other weak learners besides trees.\n\n\n\n# Next time...\n\nNeural networks and deep learning, the beginning\n", + "markdown": "---\nlecture: \"20 Boosting\"\nformat: revealjs\nmetadata-files: \n - _metadata.yml\n---\n---\n---\n\n## {{< meta lecture >}} {.large background-image=\"gfx/smooths.svg\" background-opacity=\"0.3\"}\n\n[Stat 406]{.secondary}\n\n[{{< meta author >}}]{.secondary}\n\nLast modified -- 02 November 2023\n\n\n\n$$\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n\\newcommand{\\U}{\\mathbf{U}}\n\\newcommand{\\D}{\\mathbf{D}}\n\\newcommand{\\V}{\\mathbf{V}}\n$$\n\n\n\n\n\n## Last time\n\n\n\nWe learned about bagging, for averaging [low-bias]{.secondary} / [high-variance]{.tertiary} estimators.\n\nToday, we examine its opposite: Boosting.\n\nBoosting also combines estimators, but it combines [high-bias]{.secondary} / [low-variance]{.tertiary} estimators.\n\nBoosting has a number of flavours. And if you Google descriptions, most are wrong.\n\nFor a deep (and accurate) treatment, see [ESL] Chapter 10\n\n\n. . 
.\n\nWe'll discuss 2 flavours: [AdaBoost]{.secondary} and [Gradient Boosting]{.secondary}\n\nNeither requires a tree, but that's the typical usage.\n\nBoosting needs a \"weak learner\", so small trees (stumps) are natural.\n\n\n\n## AdaBoost intuition (for classification)\n\nAt each iteration, we weight the [observations]{.secondary}.\n\nObservations that are currently misclassified, get [higher]{.tertiary} weights.\n\nSo on the next iteration, we'll try harder to correctly classify our mistakes.\n\nThe number of iterations must be chosen.\n\n\n\n## AdaBoost (Freund and Schapire, generic)\n\nLet $G(x, \\theta)$ be any weak learner \n\n⛭ imagine a tree with one split: then $\\theta=$ (feature, split point)\n\n\n\nAlgorithm (AdaBoost) 🛠️\n\n* Set observation weights $w_i=1/n$.\n* Until we quit ( $m\n mutate(mobile = as.factor(Mobility > .1)) |>\n select(-ID, -Name, -Mobility, -State) |>\n drop_na()\nn <- nrow(mob)\ntrainidx <- sample.int(n, floor(n * .75))\ntestidx <- setdiff(1:n, trainidx)\ntrain <- mob[trainidx, ]\ntest <- mob[testidx, ]\nrf <- randomForest(mobile ~ ., data = train)\nbag <- randomForest(mobile ~ ., data = train, mtry = ncol(mob) - 1)\npreds <- tibble(truth = test$mobile, rf = predict(rf, test), bag = predict(bag, test))\n```\n:::\n\n::: {.cell layout-align=\"center\" output-location='column-fragment'}\n\n```{.r .cell-code code-line-numbers=\"1-6|7-12|17|\"}\nlibrary(gbm)\ntrain_boost <- train |>\n mutate(mobile = as.integer(mobile) - 1)\n# needs {0, 1} responses\ntest_boost <- test |>\n mutate(mobile = as.integer(mobile) - 1)\nadab <- gbm(\n mobile ~ .,\n data = train_boost,\n n.trees = 500,\n distribution = \"adaboost\"\n)\npreds$adab <- as.numeric(\n predict(adab, test_boost) > 0\n)\npar(mar = c(5, 11, 0, 1))\ns <- summary(adab, las = 1)\n```\n\n::: {.cell-output-display}\n![](20-boosting_files/figure-revealjs/unnamed-chunk-2-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n## Forward stagewise additive modeling (FSAM, completely generic)\n\nAlgorithm 🛠️\n\n* Set initial predictor $f_0(x)=0$\n* Until we quit ( $m 0) != truth)), 2)\n )\n ) +\n annotate(\"text\",\n x = 4, y = -5, color = red,\n label = paste(\"adaboost error\\n\", round(with(boost_preds, mean((adaboost > 0) != truth)), 2))\n )\nboost_oob <- tibble(\n adaboost = adab$oobag.improve, gbm = grad_boost$oobag.improve,\n ntrees = 1:500\n)\ng2 <- boost_oob %>%\n pivot_longer(-ntrees, values_to = \"OOB_Error\") %>%\n ggplot(aes(x = ntrees, y = OOB_Error, color = name)) +\n geom_line() +\n scale_color_manual(values = c(orange, blue)) +\n theme(legend.title = element_blank())\nplot_grid(g1, g2, rel_widths = c(.4, .6))\n```\n\n::: {.cell-output-display}\n![](20-boosting_files/figure-revealjs/unnamed-chunk-3-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n\n## Major takeaways\n\n* Two flavours of Boosting \n 1. AdaBoost (the original) and \n 2. gradient boosting (easier and more computationally friendly)\n\n* The connection is \"Forward stagewise additive modelling\" (AdaBoost is a special case)\n\n* The connection reveals that AdaBoost \"isn't robust because it uses exponential loss\" (squared error is even worse)\n\n* Gradient boosting is a computationally easier version of FSAM\n\n* All use **weak learners** (compare to Bagging)\n\n* Think about the Bias-Variance implications\n\n* You can use these for regression or classification\n\n* You can do this with other weak learners besides trees.\n\n\n\n# Next time...\n\nNeural networks and deep learning, the beginning\n", "supporting": [ "20-boosting_files" ],