Using exports from mlr3fselect during multisession execution fails due to non-loaded package #1183

Closed
skysyzygy opened this issue Sep 27, 2024 · 4 comments

Comments

@skysyzygy

mlr3fselect adds always_included to mlr_reflections$task_col_roles on package load.

This causes failures when training with future::plan("multisession"): the parallel workers don't seem to load mlr3fselect, so they reject always_included as an unknown entry in the task's col_roles.

Error in .__Task__col_roles(self = self, private = private, super = super,  : 
  Assertion on 'names of col_roles' failed: Names must be a permutation of set {'feature','target','name','order','stratum','group','weight'}, but has extra elements {'always_included'}.
This happened PipeOp char_to_fct's $train()

Here is a working (sequential) MWE:

library(mlr3verse)
future::plan("sequential")

task <- tsk("zoo")
learner <- po("select") %>>% ppl("robustify") %>>% lrn("classif.rpart")
resample(task,learner,rsmp("cv"))

And a failing (multisession) MWE:

library(mlr3verse)
future::plan("multisession")

task <- tsk("zoo")
learner <- po("select") %>>% ppl("robustify") %>>% lrn("classif.rpart")
resample(task,learner,rsmp("cv"))

Here are the package versions I'm using:

> mlr3verse::mlr3verse_info()
Key: <package>
             package version
              <char>  <char>
 1:            bbotk   1.0.1
 2:      mlr3cluster   0.1.9
 3:         mlr3data   0.7.0
 4:      mlr3filters   0.8.0
 5:      mlr3fselect   1.1.0
 6:    mlr3hyperband   0.6.0
 7:     mlr3learners   0.7.0
 8:          mlr3mbo   0.2.4
 9:         mlr3misc  0.15.1
10:    mlr3pipelines   0.7.0
11:       mlr3tuning   1.0.0
12: mlr3tuningspaces   0.5.1
13:          mlr3viz   0.9.0
14:          paradox   1.0.1
@be-marc
Member

be-marc commented Oct 17, 2024

Hey, sorry for the late reply. Our team was on vacation. Thanks for reporting this bug. A workaround should be to load only the required packages.

library(mlr3)
library(mlr3pipelines)
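
Applied to the failing MWE from the report, the suggested workaround might look like the sketch below. It assumes that every function in the pipeline (tsk, po, ppl, lrn, rsmp, resample, %>>%) comes from mlr3 or mlr3pipelines, so mlr3fselect is never loaded in the main session and the extra column role is never registered. Whether this avoids the error in every setup has not been verified here.

```r
# Attach only the packages the pipeline needs instead of all of mlr3verse,
# so that mlr3fselect is not loaded in the main session.
library(mlr3)
library(mlr3pipelines)
future::plan("multisession")

task <- tsk("zoo")
learner <- po("select") %>>% ppl("robustify") %>>% lrn("classif.rpart")
rr <- resample(task, learner, rsmp("cv"))

# Reset the plan so that later code runs sequentially again.
future::plan("sequential")
```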

@be-marc be-marc self-assigned this Oct 17, 2024
@skysyzygy
Author

Thanks for getting back! It's actually happening inside a package, i.e. without explicit imports, so I'm not sure how to implement this workaround.

From what I can gather, the issue is that po("select") causes an import from mlr3fselect, which has an .onLoad that modifies col_roles. For some reason this isn't happening in the future workers, though?

@be-marc
Member

be-marc commented Oct 18, 2024

Yes, that is related. When mlr3verse is loaded, mlr3fselect is loaded as well, and it registers a new col_role. However, nothing in your workflow uses mlr3fselect, so it is not loaded on the worker. The worker then throws an error because it does not know the new col_role that the task carries.
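
One way to observe this mismatch directly is to compare the column roles known to the main session with those known to a worker. The snippet below is a diagnostic sketch, not part of the fix. It assumes the roles for classification tasks are stored in mlr_reflections$task_col_roles$classif, and the variable names main_roles and worker are only illustrative. On the versions listed in the report, one would expect the worker to report that mlr3fselect is not loaded, but the result depends on the installed versions.

```r
library(mlr3verse)
future::plan("multisession", workers = 2)

# Column roles known to the main session (mlr3fselect is loaded via mlr3verse).
main_roles <- mlr3::mlr_reflections$task_col_roles$classif

# Ask a fresh worker which namespaces and column roles it knows about.
worker <- future::value(future::future({
  list(
    fselect_loaded = "mlr3fselect" %in% loadedNamespaces(),
    roles = mlr3::mlr_reflections$task_col_roles$classif
  )
}))

# Roles registered in the main session but unknown to the worker.
print(setdiff(main_roles, worker$roles))
print(worker$fselect_loaded)

future::plan("sequential")
```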

There is a fix now. You can test the new versions with pak::pak(c("mlr-org/mlr3", "mlr-org/mlr3fselect")).

@skysyzygy
Author

skysyzygy commented Oct 18, 2024 via email

@be-marc be-marc closed this as completed Oct 25, 2024