-
Notifications
You must be signed in to change notification settings - Fork 6
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'main' of https://github.com/UBC-STAT/stat-406
- Loading branch information
Showing
3 changed files
with
47 additions
and
356 deletions.
There are no files selected for viewing
20 changes: 20 additions & 0 deletions
20
_freeze/schedule/slides/00-cv-for-many-models/execute-results/html.json
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
{ | ||
"hash": "ce81d559001112d6fd73786eb2f4d192", | ||
"result": { | ||
"markdown": "---\nlecture: \"00 CV for many models\"\nformat: revealjs\nmetadata-files: \n - _metadata.yml\n---\n---\n---\n\n## {{< meta lecture >}} {.large background-image=\"gfx/smooths.svg\" background-opacity=\"0.3\"}\n\n[Stat 406]{.secondary}\n\n[{{< meta author >}}]{.secondary}\n\nLast modified -- 19 September 2023\n\n\n\n$$\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n$$\n\n\n\n\n\n## Some data and 4 models\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndata(\"mobility\", package = \"Stat406\")\n```\n:::\n\n\n**Model 1:** Lasso on all predictors, use CV min\n\n**Model 2:** Ridge on all predictors, use CV min\n\n**Model 3:** OLS on all predictors (no tuning parameters)\n\n**Model 4:** (1) Lasso on all predictors, then (2) OLS on those chosen at CV min\n\n\n> How do I decide between these 4 models?\n\n\n## CV functions\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nkfold_cv <- function(data, estimator, predictor, error_fun, kfolds = 5) {\n fold_labels <- sample(rep(seq_len(kfolds), length.out = nrow(data)))\n errors <- double(kfolds)\n for (fold in seq_len(kfolds)) {\n test_rows <- fold_labels == fold\n train <- data[!test_rows, ]\n test <- data[test_rows, ]\n current_model <- estimator(train)\n test$.preds <- predictor(current_model, test)\n errors[fold] <- error_fun(test)\n }\n mean(errors)\n}\n\nloo_cv <- function(dat) {\n mdl <- lm(Mobility ~ ., data = dat)\n mean( abs(residuals(mdl)) / abs(1 - hatvalues(mdl)) ) # MAE version\n}\n```\n:::\n\n\n\n## Experiment setup\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# prepare our data\n# note that mob has only continuous predictors, otherwise could be trouble\nmob <- mobility[complete.cases(mobility), ] |> select(-ID, -State, -Name)\n# avoid doing this same operation a bunch\nxmat <- function(dat) dat |> select(!Mobility) |> as.matrix()\n\n# set up our model functions\nlibrary(glmnet)\nmod1 <- function(dat, ...) cv.glmnet(xmat(dat), dat$Mobility, type.measure = \"mae\", ...)\nmod2 <- function(dat, ...) cv.glmnet(xmat(dat), dat$Mobility, alpha = 0, type.measure = \"mae\", ...)\nmod3 <- function(dat, ...) glmnet(xmat(dat), dat$Mobility, lambda = 0, ...) # just does lm()\nmod4 <- function(dat, ...) cv.glmnet(xmat(dat), dat$Mobility, relax = TRUE, gamma = 1, type.measure = \"mae\", ...)\n\n# this will still \"work\" on mod3, because there's only 1 s\npredictor <- function(mod, dat) drop(predict(mod, newx = xmat(dat), s = \"lambda.min\"))\n\n# chose mean absolute error just 'cause\nerror_fun <- function(testdata) mean(abs(testdata$Mobility - testdata$.preds))\n```\n:::\n\n\n\n## Run the experiment\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nall_model_funs <- lst(mod1, mod2, mod3, mod4)\nall_fits <- map(all_model_funs, .f = exec, dat = mob)\n\n# unfortunately, does different splits for each method, so we use 10, \n# it would be better to use the _SAME_ splits\nten_fold_cv <- map_dbl(all_model_funs, ~ kfold_cv(mob, .x, predictor, error_fun, 10)) \n\nin_sample_cv <- c(\n mod1 = min(all_fits[[1]]$cvm),\n mod2 = min(all_fits[[2]]$cvm),\n mod3 = loo_cv(mob),\n mod4 = min(all_fits[[4]]$cvm)\n)\n\ntib <- bind_rows(in_sample_cv, ten_fold_cv)\ntib$method = c(\"in_sample\", \"out_of_sample\")\ntib\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 2 × 5\n mod1 mod2 mod3 mod4 method \n <dbl> <dbl> <dbl> <dbl> <chr> \n1 0.0159 0.0161 0.0164 0.0156 in_sample \n2 0.0158 0.0161 0.0165 0.0161 out_of_sample\n```\n:::\n:::\n", | ||
"supporting": [ | ||
"00-cv-for-many-models_files" | ||
], | ||
"filters": [ | ||
"rmarkdown/pagebreak.lua" | ||
], | ||
"includes": { | ||
"include-after-body": [ | ||
"\n<script>\n // htmlwidgets need to know to resize themselves when slides are shown/hidden.\n // Fire the \"slideenter\" event (handled by htmlwidgets.js) when the current\n // slide changes (different for each slide format).\n (function () {\n // dispatch for htmlwidgets\n function fireSlideEnter() {\n const event = window.document.createEvent(\"Event\");\n event.initEvent(\"slideenter\", true, true);\n window.document.dispatchEvent(event);\n }\n\n function fireSlideChanged(previousSlide, currentSlide) {\n fireSlideEnter();\n\n // dispatch for shiny\n if (window.jQuery) {\n if (previousSlide) {\n window.jQuery(previousSlide).trigger(\"hidden\");\n }\n if (currentSlide) {\n window.jQuery(currentSlide).trigger(\"shown\");\n }\n }\n }\n\n // hookup for slidy\n if (window.w3c_slidy) {\n window.w3c_slidy.add_observer(function (slide_num) {\n // slide_num starts at position 1\n fireSlideChanged(null, w3c_slidy.slides[slide_num - 1]);\n });\n }\n\n })();\n</script>\n\n" | ||
] | ||
}, | ||
"engineDependencies": {}, | ||
"preserve": {}, | ||
"postProcess": true | ||
} | ||
} |
This file was deleted.
Oops, something went wrong.
Oops, something went wrong.