Merge branch 'main' of https://github.com/UBC-STAT/stat-406
dajmcdon committed Sep 26, 2023
2 parents 7d26b30 + c0b9987 commit 9bcc6be
Showing 3 changed files with 47 additions and 356 deletions.
@@ -0,0 +1,20 @@
{
"hash": "ce81d559001112d6fd73786eb2f4d192",
"result": {
"markdown": "---\nlecture: \"00 CV for many models\"\nformat: revealjs\nmetadata-files: \n - _metadata.yml\n---\n---\n---\n\n## {{< meta lecture >}} {.large background-image=\"gfx/smooths.svg\" background-opacity=\"0.3\"}\n\n[Stat 406]{.secondary}\n\n[{{< meta author >}}]{.secondary}\n\nLast modified -- 19 September 2023\n\n\n\n$$\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n$$\n\n\n\n\n\n## Some data and 4 models\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndata(\"mobility\", package = \"Stat406\")\n```\n:::\n\n\n**Model 1:** Lasso on all predictors, use CV min\n\n**Model 2:** Ridge on all predictors, use CV min\n\n**Model 3:** OLS on all predictors (no tuning parameters)\n\n**Model 4:** (1) Lasso on all predictors, then (2) OLS on those chosen at CV min\n\n\n> How do I decide between these 4 models?\n\n\n## CV functions\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nkfold_cv <- function(data, estimator, predictor, error_fun, kfolds = 5) {\n fold_labels <- sample(rep(seq_len(kfolds), length.out = nrow(data)))\n errors <- double(kfolds)\n for (fold in seq_len(kfolds)) {\n test_rows <- fold_labels == fold\n train <- data[!test_rows, ]\n test <- data[test_rows, ]\n current_model <- estimator(train)\n test$.preds <- predictor(current_model, test)\n errors[fold] <- error_fun(test)\n }\n mean(errors)\n}\n\nloo_cv <- function(dat) {\n mdl <- lm(Mobility ~ ., data = dat)\n mean(abs(residuals(mdl)) / abs(1 - hatvalues(mdl))) # LOO MAE via the hat-value shortcut, no refitting\n}\n```\n:::\n\n\n\n## Experiment setup\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# prepare our data\n# mob has only continuous predictors; factor columns would make as.matrix() in xmat() return characters\nmob <- mobility[complete.cases(mobility), ] |> select(-ID, -State, -Name)\n# helper for the predictor matrix, so we don't repeat this operation everywhere\nxmat <- function(dat) dat |> select(!Mobility) |> as.matrix()\n\n# set up our model functions\nlibrary(glmnet)\nmod1 <- function(dat, ...) cv.glmnet(xmat(dat), dat$Mobility, type.measure = \"mae\", ...)\nmod2 <- function(dat, ...) cv.glmnet(xmat(dat), dat$Mobility, alpha = 0, type.measure = \"mae\", ...)\nmod3 <- function(dat, ...) glmnet(xmat(dat), dat$Mobility, lambda = 0, ...) # just does lm()\nmod4 <- function(dat, ...) cv.glmnet(xmat(dat), dat$Mobility, relax = TRUE, gamma = 1, type.measure = \"mae\", ...) # glmnet blends gamma * lasso + (1 - gamma) * unpenalized refit\n\n# this still \"works\" on mod3 (a plain glmnet fit) because it has only one lambda\npredictor <- function(mod, dat) drop(predict(mod, newx = xmat(dat), s = \"lambda.min\"))\n\n# use mean absolute error (an arbitrary choice)\nerror_fun <- function(testdata) mean(abs(testdata$Mobility - testdata$.preds))\n```\n:::\n\n\n\n## Run the experiment\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nall_model_funs <- lst(mod1, mod2, mod3, mod4)\nall_fits <- map(all_model_funs, .f = exec, dat = mob)\n\n# unfortunately, this draws different folds for each method, so we use 10 folds;\n# it would be better to use the _SAME_ folds for every method\nten_fold_cv <- map_dbl(all_model_funs, ~ kfold_cv(mob, .x, predictor, error_fun, 10))\n\nin_sample_cv <- c(\n mod1 = min(all_fits[[1]]$cvm),\n mod2 = min(all_fits[[2]]$cvm),\n mod3 = loo_cv(mob),\n mod4 = min(all_fits[[4]]$cvm)\n)\n\ntib <- bind_rows(in_sample_cv, ten_fold_cv)\ntib$method <- c(\"in_sample\", \"out_of_sample\")\ntib\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 2 × 5\n mod1 mod2 mod3 mod4 method \n <dbl> <dbl> <dbl> <dbl> <chr> \n1 0.0159 0.0161 0.0164 0.0156 in_sample \n2 0.0158 0.0161 0.0165 0.0161 out_of_sample\n```\n:::\n:::\n",
"supporting": [
"00-cv-for-many-models_files"
],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {
"include-after-body": [
"\n<script>\n // htmlwidgets need to know to resize themselves when slides are shown/hidden.\n // Fire the \"slideenter\" event (handled by htmlwidgets.js) when the current\n // slide changes (different for each slide format).\n (function () {\n // dispatch for htmlwidgets\n function fireSlideEnter() {\n const event = window.document.createEvent(\"Event\");\n event.initEvent(\"slideenter\", true, true);\n window.document.dispatchEvent(event);\n }\n\n function fireSlideChanged(previousSlide, currentSlide) {\n fireSlideEnter();\n\n // dispatch for shiny\n if (window.jQuery) {\n if (previousSlide) {\n window.jQuery(previousSlide).trigger(\"hidden\");\n }\n if (currentSlide) {\n window.jQuery(currentSlide).trigger(\"shown\");\n }\n }\n }\n\n // hookup for slidy\n if (window.w3c_slidy) {\n window.w3c_slidy.add_observer(function (slide_num) {\n // slide_num starts at position 1\n fireSlideChanged(null, w3c_slidy.slides[slide_num - 1]);\n });\n }\n\n })();\n</script>\n\n"
]
},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}
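The experiment comments note that it would be better to score every model on the _same_ folds. A minimal sketch of that idea, under stated assumptions: `kfold_cv_shared()` is a hypothetical helper (not part of the slides or of any package) that takes a named list of estimators sharing the interface of `kfold_cv()`, draws one set of fold labels, and reuses it for every model.

```r
# Hypothetical helper: K-fold CV that draws ONE set of fold labels and
# reuses it for every estimator, so all models are scored on identical splits.
kfold_cv_shared <- function(data, estimators, predictor, error_fun, kfolds = 5) {
  # fold labels are drawn once and shared across all estimators
  fold_labels <- sample(rep(seq_len(kfolds), length.out = nrow(data)))
  # rows = folds, columns = models
  errors <- matrix(NA_real_, nrow = kfolds, ncol = length(estimators),
                   dimnames = list(NULL, names(estimators)))
  for (fold in seq_len(kfolds)) {
    test_rows <- fold_labels == fold
    train <- data[!test_rows, , drop = FALSE]
    test <- data[test_rows, , drop = FALSE]
    for (m in seq_along(estimators)) {
      current_model <- estimators[[m]](train)
      # predict on the untouched test set; attach predictions to a copy so
      # no stale .preds column leaks into the next model's predictor call
      scored <- test
      scored$.preds <- predictor(current_model, test)
      errors[fold, m] <- error_fun(scored)
    }
  }
  colMeans(errors)  # one CV error estimate per model
}
```

With the objects defined in the slides, a call would look like `kfold_cv_shared(mob, all_model_funs, predictor, error_fun, kfolds = 10)`. Because every model's error is averaged over the same test sets, differences between the four estimates are no longer mixed up with differences between random splits. Predictions go on a copy of the test set because `xmat()` keeps every column except `Mobility`, so a leftover `.preds` column would otherwise be passed to the next model as an extra predictor.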
314 changes: 0 additions & 314 deletions schedule/slides/00-cv-for-many-models.html

This file was deleted.
