I'd like a variable included in the recipe step to be ignored when actually fitting a model (whether it is present in the data or not). In my understanding, that's one use of the update_role_requirements function, but it's not working as expected.
Reproducible example
library(tidymodels)
library(forcats)
set.seed(42)
# Create full dataset that does not yet have the stratifying variable
dat <- starwars %>%
drop_na(gender) %>%
mutate(human= if_else(species=="Human", "human", "non-human"),
across(c(where(is.character)), factor),
across(mass, ~ if_else(.x>500, NA_real_, .x))) %>%
select(name, gender, human, height, mass)
# Split into training and testing sets stratified on a new variable
train_test_split <- dat %>%
mutate(gender_by_human= paste0(gender, "|", human)) %>%
initial_split(prop=3/4, strata=gender_by_human)
# Create workflow
rec <- recipe(gender ~ ., data = training(train_test_split)) %>%
# Change role of stratifying variable (and ID) to "other"
update_role(c(name, gender_by_human), new_role="other") %>%
# Ignore the stratifying variable when baking:
update_role_requirements(role="other", bake=FALSE) %>%
step_impute_knn(all_predictors()) %>%
step_dummy(all_nominal_predictors())
spec<- logistic_reg() %>%
set_mode("classification")
wf<- workflow() %>%
add_recipe(rec) %>%
add_model(spec)
# Assess performance on test set
wf %>% last_fit(train_test_split) %>% collect_metrics()
#> # A tibble: 2 × 4
#>   .metric  .estimator .estimate .config
#>   <chr>    <chr>          <dbl> <chr>
#> 1 accuracy binary         0.818 Preprocessor1_Model1
#> 2 roc_auc  binary         0.865 Preprocessor1_Model1
# Attempt to fit model on the full dataset
wf %>% fit(data = dat)
#> Error in `check_training_set()`:
#> ! Not all variables in the recipe are present in the supplied training set: 'gender_by_human'.
#> Backtrace:
#>      ▆
#>   1. ├─wf %>% fit(data = dat)
#>   2. ├─generics::fit(., data = dat)
#>   3. └─workflows:::fit.workflow(., data = dat)
#>   4.   └─workflows::.fit_pre(workflow, data)
#>   5.     ├─generics::fit(action, workflow = workflow, data = data)
#>   6.     └─workflows:::fit.action_recipe(action, workflow = workflow, data = data)
#>   7.       ├─hardhat::mold(recipe, data, blueprint = blueprint)
#>   8.       └─hardhat:::mold.recipe(recipe, data, blueprint = blueprint)
#>   9.         ├─hardhat::run_mold(blueprint, data = data)
#>  10.         └─hardhat:::run_mold.default_recipe_blueprint(blueprint, data = data)
#>  11.           └─hardhat:::mold_recipe_default_process(...)
#>  12.             ├─recipes::prep(...)
#>  13.             └─recipes:::prep.recipe(...)
#>  14.               └─recipes:::check_training_set(training, x, fresh)
#>  15.                 └─rlang::abort(...)
sessionInfo()
#> R version 4.2.3 (2023-03-15 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.utf8
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base
#>
#> other attached packages:
#>  [1] forcats_0.5.2      yardstick_1.2.0    workflowsets_1.0.1 workflows_1.1.3
#>  [5] tune_1.1.2         tidyr_1.3.0        tibble_3.2.1       rsample_1.2.0
#>  [9] recipes_1.0.8      purrr_1.0.2        parsnip_1.1.1      modeldata_1.2.0
#> [13] infer_1.0.4        ggplot2_3.4.3      dplyr_1.1.2        dials_1.2.0
#> [17] scales_1.2.1       broom_1.0.5        tidymodels_1.1.1
#>
#> loaded via a namespace (and not attached):
#>  [1] foreach_1.5.2       splines_4.2.3     R.utils_2.12.2
#>  [4] prodlim_2019.11.13  GPfit_1.0-8       yaml_2.3.7
#>  [7] globals_0.16.2      ipred_0.9-13      pillar_1.9.0
#> [10] backports_1.4.1     lattice_0.20-45   glue_1.6.2
#> [13] digest_0.6.31       hardhat_1.3.0     colorspace_2.1-0
#> [16] htmltools_0.5.4     Matrix_1.5-3      R.oo_1.25.0
#> [19] timeDate_4022.108   pkgconfig_2.0.3   lhs_1.1.6
#> [22] DiceDesign_1.9      listenv_0.9.0     gower_1.0.1
#> [25] lava_1.7.1          timechange_0.2.0  styler_1.9.1
#> [28] generics_0.1.3      ellipsis_0.3.2    withr_2.5.0
#> [31] furrr_0.3.1         nnet_7.3-18       cli_3.6.1
#> [34] survival_3.5-0      magrittr_2.0.3    evaluate_0.20
#> [37] R.methodsS3_1.8.2   fs_1.6.1          future_1.30.0
#> [40] fansi_1.0.4         parallelly_1.34.0 R.cache_0.16.0
#> [43] MASS_7.3-58.2       class_7.3-21      tools_4.2.3
#> [46] lifecycle_1.0.3     munsell_0.5.0     reprex_2.0.2
#> [49] compiler_4.2.3      rlang_1.1.1       grid_4.2.3
#> [52] iterators_1.0.14    rstudioapi_0.15.0 rmarkdown_2.20
#> [55] gtable_0.3.3        codetools_0.2-19  R6_2.5.1
#> [58] lubridate_1.9.1     knitr_1.42        fastmap_1.1.0
#> [61] future.apply_1.10.0 utf8_1.2.3        parallel_4.2.3
#> [64] Rcpp_1.0.10         vctrs_0.6.3       rpart_4.1.19
#> [67] tidyselect_1.2.0    xfun_0.36
This is happening because update_role_requirements() only changes the role requirements at bake() time. When you call wf %>% fit(data = dat), the workflow first has to prep() the recipe, and prep() needs every variable the recipe was defined with. That includes gender_by_human, which isn't part of dat.
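One possible workaround, sketched under the assumption that it is acceptable to recreate the stratifying column on the full data before fitting: since prep() runs inside fit(), the data passed to fit() must contain gender_by_human, while predict() only bakes and should not need it given bake = FALSE. The helper add_strata() is hypothetical and not part of tidymodels; the rest mirrors the reprex.

```r
library(tidymodels)

set.seed(42)

# Same data preparation as in the reprex
dat <- starwars %>%
  drop_na(gender) %>%
  mutate(
    human = if_else(species == "Human", "human", "non-human"),
    across(where(is.character), factor),
    across(mass, ~ if_else(.x > 500, NA_real_, .x))
  ) %>%
  select(name, gender, human, height, mass)

# Hypothetical helper (not a tidymodels function): adds the stratifying column
add_strata <- function(data) {
  mutate(data, gender_by_human = paste0(gender, "|", human))
}

train_test_split <- dat %>%
  add_strata() %>%
  initial_split(prop = 3/4, strata = gender_by_human)

rec <- recipe(gender ~ ., data = training(train_test_split)) %>%
  update_role(c(name, gender_by_human), new_role = "other") %>%
  # Only relaxes the check at bake() time, not at prep() time
  update_role_requirements(role = "other", bake = FALSE) %>%
  step_impute_knn(all_predictors()) %>%
  step_dummy(all_nominal_predictors())

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(logistic_reg() %>% set_mode("classification"))

# fit() calls prep(), so the stratifying column has to be present here
final_fit <- wf %>% fit(data = add_strata(dat))

# predict() only bakes, so the column is left out of new_data
preds <- predict(final_fit, new_data = dat)
```

An alternative would be to drop gender_by_human from the data handed to recipe() in the first place, so the recipe never knows about it; the version above keeps the recipe from the reprex unchanged.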