This repository has been archived by the owner on May 1, 2023. It is now read-only.

Commit

feat: make changing negative values to NaN optional (#468)
- [x] I have battle-tested on Overtaci (RMAPPS1279)
- [x] At least one of the commits is prefixed with either "fix:" or "feat:"

## Notes for reviewers
Changed one `psycopmlutils` import because the tests could not import the module otherwise.
MartinBernstorff authored Mar 31, 2023
2 parents (2798867 + c81d882), commit 5c941d3
Showing 2 changed files with 5 additions and 1 deletion.
`src/psycop_model_training/config_schemas/preprocessing.py`: 3 changes (3 additions, 0 deletions)
```diff
@@ -30,6 +30,9 @@ class PreSplitPreprocessingConfigSchema(BaseModel):
     convert_booleans_to_int: bool = False
     # Whether to convert columns containing booleans to int
 
+    negative_values_to_nan: bool = True
+    # Whether to change negative values to NaN. Defaults to True since Chi2 cannot handle negative values. Can only be set to False if Chi2 is not used for feature selection.
+
     drop_datetime_predictor_columns: bool = False
     # Whether to drop datetime columns prefixed with data.pred_prefix.
     # Typically, we don't want to use these as features, since they are unlikely to generalise into the future.
```
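The comment on `negative_values_to_nan` implies a constraint: keeping negative values (setting the flag to `False`) is only safe when Chi2 is not used for feature selection. The commit does not enforce this. The sketch below shows one way a caller could check it up front; `PreSplitConfig`, `check_chi2_compatible` and the `"chi2"` method name are hypothetical, and a stdlib `dataclass` stands in for the pydantic `BaseModel` used in the repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PreSplitConfig:
    """Hypothetical stand-in for PreSplitPreprocessingConfigSchema (really a pydantic BaseModel)."""

    convert_booleans_to_int: bool = False
    negative_values_to_nan: bool = True  # default mirrors the field added in this commit
    drop_datetime_predictor_columns: bool = False


def check_chi2_compatible(cfg: PreSplitConfig, feature_selection_method: Optional[str]) -> None:
    """Raise if negative values would be kept while Chi2 feature selection is enabled.

    The "chi2" method name is an assumption for illustration only.
    """
    if feature_selection_method == "chi2" and not cfg.negative_values_to_nan:
        raise ValueError(
            "negative_values_to_nan=False is incompatible with chi2 feature selection, "
            "which requires non-negative inputs."
        )
```

With the default config, Chi2 is allowed; turning the conversion off is accepted only when no Chi2 selection is configured.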
```diff
@@ -65,7 +65,8 @@ def clean(self, dataset: pd.DataFrame) -> pd.DataFrame:
         # In the future, we want to:
         # 1a. See if there's a way of using feature selection that permits negative values, or
         # 1b. Always use z-score normalisation?
-        dataset = self._negative_values_to_nan(dataset=dataset)
+        if self.pre_split_cfg.negative_values_to_nan:
+            dataset = self._negative_values_to_nan(dataset=dataset)
         dataset = self.convert_timestamp_dtype_and_nat(dataset=dataset)
 
         return dataset
```
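The diff does not show the body of `_negative_values_to_nan`. A minimal pandas sketch of the gated step, assuming the helper replaces negative values in numeric columns with NaN and leaves other columns alone; `PreSplitConfig`, `negative_values_to_nan` and `clean` here are hypothetical stand-ins, not the repository's code.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class PreSplitConfig:
    # Hypothetical stand-in for the relevant field of PreSplitPreprocessingConfigSchema.
    negative_values_to_nan: bool = True


def negative_values_to_nan(dataset: pd.DataFrame) -> pd.DataFrame:
    """Replace negative values in numeric columns with NaN (assumed behaviour)."""
    dataset = dataset.copy()  # avoid mutating the caller's frame
    numeric_cols = dataset.select_dtypes(include="number").columns
    # DataFrame.mask replaces entries where the condition is True with NaN by default.
    dataset[numeric_cols] = dataset[numeric_cols].mask(dataset[numeric_cols] < 0)
    return dataset


def clean(dataset: pd.DataFrame, cfg: PreSplitConfig) -> pd.DataFrame:
    # Mirrors the gating added in this commit: only convert when the flag is set.
    if cfg.negative_values_to_nan:
        dataset = negative_values_to_nan(dataset)
    return dataset
```

For example, with `df = pd.DataFrame({"pred_x": [1.0, -2.0, 3.0], "id": ["a", "b", "c"]})`, `clean(df, PreSplitConfig())` turns the `-2.0` into NaN, while `clean(df, PreSplitConfig(negative_values_to_nan=False))` returns the values unchanged, which is only appropriate when Chi2 feature selection is off.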
