-
-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BREAKING CHANGE: loglik, AIC, BIC, feature-creep #1119
Conversation
* update docs for a fine granularity of weight columns * add deprecation warning and rename old weight role * implemented new weights in task * added weights everywhere, docs, tests * changed defaults * improve docs * rename role weights_train to weights_learner for consistency * fix printer test * improve docs * cleanup * better soft deprecate * fix typos * next try * hard deprecate weights of learner, all workarounds failed * adapt autotest * update news * cleanup * fix autotest * fix error message * change defaults * fix test * fix: properties * refactor: add use weight parameter in base learner * chore: deprecated * tests: col roles * fix: use weights * test: use weights * fix --------- Co-authored-by: be-marc <marcbecker@posteo.de>
Also needs to be removed from mlr3learners for log reg, multinom and linear regression. Or could just stay in the package with improved documentation. |
TBH, I'd really like to keep AIC/BIC in the packages. I show this during teaching together with a SFS and compare it to |
After a closer look, we could remove the properties and methods altogether but keep the measures for AIC and BIC. The user then has to ensure that |
The weights used during training by the Learner are renamed to `weights_learner`, the previous column role `weight` is dysfunctional. | ||
Additionally, it is now possible to disable the use of weights via the new hyperparameter `use_weights`. | ||
Note that this is a breaking change, but appears to be the less error-prone solution in the long run. | ||
* refactor: Deprecated `data_format` and `data_formats` for Learners, Tasks, and DataBackends. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just because I see it here, I don't think "Deprecation" is a refactor?
@@ -1,24 +1,29 @@ | |||
# mlr3 (development version) | |||
|
|||
* Deprecated `data_format` and `data_formats` for Learners, Tasks, and DataBackends. | |||
* refactor: It is now possible to use weights also during scoring predictions via measures and during resampling to sample observations with unequal probability. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a BREAKING_CHANGE and not a refactor I think
The weights used during training by the Learner are renamed to `weights_learner`, the previous column role `weight` is dysfunctional. | ||
Additionally, it is now possible to disable the use of weights via the new hyperparameter `use_weights`. | ||
Note that this is a breaking change, but appears to be the less error-prone solution in the long run. | ||
* refactor: Deprecated `data_format` and `data_formats` for Learners, Tasks, and DataBackends. | ||
* feat: The `partition()` function creates training, test and validation sets. | ||
* refactor: Optimize runtime of fixing factor levels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be "perf" and not "refactor".
This is my template:
# * feat A new feature - SemVar PATCH
# * fix A bug fix - SemVar MINOR
# * BREAKING CHANGE Breaking API change - SemVar MAJOR
# * docs Change to documentation only
# * style Change to style (whitespace, etc.)
# * refactor Change not related to a bug or feat
# * perf Change that affects performance
# * test Change that adds/modifies tests
# * build Change to build system
# * ci Change to CI pipeline/workflow
# * chore General tooling/config/min refactor
#' @section Weights: | ||
#' | ||
#' Many measures support observation weights, indicated by their property `"weights"`. | ||
#' The weights are stored in the [Task] where the column role `weights_measure` needs to be assigned to a single numeric column. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#' The weights are stored in the [Task] where the column role `weights_measure` needs to be assigned to a single numeric column. | |
#' The weights are stored in the [Task] where the column role `weights_measure` needs to be assigned to a single positive numeric column. |
#' | ||
#' `col_roles` is a named list whose elements are named by column role and each element is a `character()` vector of column names. | ||
#' To alter the roles, just modify the list, e.g. with \R's set functions ([intersect()], [setdiff()], [union()], \ldots). | ||
#' The method `$set_col_roles` provides a convenient alternative to assign columns to roles. | ||
#' | ||
#' The roles `weights_learner`, `weights_measure` and `weights_resampling` may only point to a single numeric column, but they can | ||
#' all point to the same column or different columns. Weights must be non-negative numerics with at least one weight being > 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weights_resampling usually require more than one weight being > 0
if (length(weight_cols) == 0L) { | ||
return(NULL) | ||
} | ||
data = self$backend$data(private$.row_roles$use, c(self$backend$primary_key, weight_cols)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might fail because columns can be returned in arbitrary order:
Line 61 in 57b6109
#' Rows are guaranteed to be returned in the same order as `rows`, columns may be returned in an arbitrary order. |
Instead you should use the $data()
method of the task:
Line 265 in 57b6109
#' Rows and columns are returned in the order specified via the arguments `rows` and `cols`. |
assert_has_backend(self) | ||
assert_ro_binding(rhs) | ||
weight_cols = private$.col_roles$weight | ||
weight_cols = private$.col_roles[["weights_measure"]] | ||
if (length(weight_cols) == 0L) { | ||
return(NULL) | ||
} | ||
data = self$backend$data(private$.row_roles$use, c(self$backend$primary_key, weight_cols)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
@@ -1189,12 +1250,23 @@ task_set_roles = function(li, elements, roles = NULL, add_to = NULL, remove_from | |||
} | |||
|
|||
task_check_col_roles = function(self, new_roles) { | |||
for (role in c("group", "weight", "name")) { | |||
if ("weight" %in% names(new_roles)) { | |||
stopf("Task role 'weight' is deprecated, use 'weights_learner' instead") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we throw an error here I don't think "deprecated" is the right word? It is already not supported anymore?
# enable weights | ||
if ("weights" %in% learner$properties) { | ||
# learner must have flag "use_weights" | ||
msg = checkmate::check_subset("use_weights", learner$param_set$ids(), empty.ok = FALSE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
msg = checkmate::check_subset("use_weights", learner$param_set$ids(), empty.ok = FALSE) | |
msg = checkmate::check_subset("use_weights", learner$param_set$ids()) |
One more comment about the |
Otoh the question is then whether we really need this because we can already set the |
@mllg. this is fine I HOPE, IF IF IF we not implement any further stuff for this. |
Moved to #1207 |
WARNING: BEFORE WE MERGE:
aic and bic should be removed from book, loglik too
check mlr3learners
mlr3tuning autotuner
add to NEWS