
Validation data generation, removal of loss function overrides #234

Open · wants to merge 10 commits into base: master
Conversation

@kaueltzen (Contributor) commented on Oct 28, 2024

Resolves #228.

Overview

  • Replaces `train_test_split` with explicit generation of validation data that is reproducible, shuffled, and stratified when classification tasks are present (see the sketch below)
  • Removes the override of the loss function for classification tasks in `MODNetModel.fit`
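
A minimal sketch of the approach (illustrative name and signature, not the PR's exact code), assuming scikit-learn's `train_test_split` underneath:

```python
from sklearn.model_selection import train_test_split

def make_val_split(y, val_fraction=0.1, classification=False, random_state=42):
    """Return (train_idx, val_idx) for targets y of shape (n_samples, n_targets).

    The split is shuffled and reproducible; when classification targets are
    present, it is stratified on the first target column.
    """
    stratify = y[:, 0] if classification else None
    return train_test_split(
        range(len(y)),
        test_size=val_fraction,
        shuffle=True,
        stratify=stratify,
        random_state=random_state,
    )
```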

@kaueltzen kaueltzen changed the title [WIP] Validation data generation, removal of loss function overrides Validation data generation, removal of loss function overrides Nov 19, 2024
@kaueltzen kaueltzen marked this pull request as ready for review November 19, 2024 13:52
@kaueltzen (Contributor, Author) commented:
Hi @ppdebreuck I have a question regarding this comment:
#228 (comment)

Do you think it would make sense to enable passing a loss function for classification tasks in `MODNetModel.evaluate`?
Or would you prefer keeping -ROC-AUC (e.g., because it is easier to interpret than cross-entropy losses)?

@ppdebreuck (Owner) left a comment:


Thanks @kaueltzen! Looks good; there just might be something off in the stratified split, please let me know if you agree.

```diff
@@ -1534,3 +1549,46 @@ def validate_model(

 def map_validate_model(kwargs):
     return validate_model(**kwargs)


+def generate_shuffled_and_stratified_val_split(
```
@ppdebreuck (Owner) commented:


I prefer to have this function under `modnet.utils` and import it as `from modnet.utils import generate_shuffled_and_stratified_val_split`. This will avoid having it in `__all__`.

```python
if isinstance(y[0][0], list) or isinstance(y[0][0], np.ndarray):
    ycv = np.argmax(y[0], axis=1)
else:
    ycv = y[0]
```
@ppdebreuck (Owner) commented:


This is most likely wrong. It's picking the first row, but we need the first column: `ycv = y[:, 0]`
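
A toy illustration of the distinction (not code from the PR):

```python
import numpy as np

# y in the (n_samples, n_targets) convention: rows are samples, columns are targets.
y = np.array([[0, 1.2],
              [1, 3.4],
              [0, 5.6]])

print(y[0])     # first row: all targets of one sample  -> [0.  1.2]
print(y[:, 0])  # first column: first target of every sample -> [0. 1. 0.]
```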

@kaueltzen (Contributor, Author) replied:

Hi @ppdebreuck, thanks for pointing that out!
It was written for `MODNetModel.fit()` to handle a list of props from `targets_groups`, but outside of `vanilla.py` / `generate_shuffled_and_stratified_val_data` the targets are indeed not handled correctly; I will change it.

```python
    else:
        ycv = y[0]
    return train_test_split(
        range(len(y[0])),
```
@ppdebreuck (Owner) commented:

This can simply become `range(len(ycv))`.

@ppdebreuck (Owner) commented:

Edit: let's do `range(len(y))`, so it's compatible with my last comment.

"""
if classification:
if isinstance(y[0][0], list) or isinstance(y[0][0], np.ndarray):
ycv = np.argmax(y[0], axis=1)
@ppdebreuck (Owner) commented:

Stratifying a multilabel case is a bit tricky, and you probably don't need it? So we can skip it: `ycv = None`
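
For context, a toy illustration (not code from the PR) of why stratification is skipped for multilabel targets:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy one-hot / multilabel targets, shape (n_samples, n_classes).
y = np.array([[1, 0], [0, 1], [1, 0], [0, 1], [1, 1]])

# Stratifying on full label rows would treat every distinct combination as its
# own class, which easily leaves a "class" with a single member ([1, 1] above),
# so stratification is simply skipped:
ycv = None
train_idx, val_idx = train_test_split(
    range(len(y)), test_size=0.2, shuffle=True, stratify=ycv, random_state=42
)
```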

```python
    )
    return (
        x[train_idx],
        [t[train_idx] for t in y],
```
@ppdebreuck (Owner) commented:

This doesn't seem correct? `y[train_idx]` should work, right? As written, it would pick the wrong columns.

@kaueltzen (Contributor, Author) replied:

The `y` of `MODNetModel.fit()` and of `generate_shuffled_and_stratified_val_data` has its first two dimensions as (n_target_groups, n_samples), while the `y` that `generate_shuffled_and_stratified_val_split` receives elsewhere (`data.df_targets.values`) has its first two dimensions as (n_samples, n_targets). A toy contrast of the two conventions is sketched below.
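
```python
import numpy as np

train_idx = np.array([0, 2])

# Convention in MODNetModel.fit() / generate_shuffled_and_stratified_val_data:
# a list of per-target-group arrays, i.e. first dimensions (n_target_groups, n_samples).
y_grouped = [np.array([0, 1, 0, 1]), np.array([1.2, 3.4, 5.6, 7.8])]
y_grouped_train = [t[train_idx] for t in y_grouped]  # index samples within each group

# Convention elsewhere (data.df_targets.values): shape (n_samples, n_targets),
# so plain row indexing selects the samples directly.
y_table = np.array([[0, 1.2], [1, 3.4], [0, 5.6], [1, 7.8]])
y_table_train = y_table[train_idx]
```

(Illustrative data only; the arrays are not from the PR.)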

@ppdebreuck (Owner) commented:
> Hi @ppdebreuck I have a question regarding this comment: #228 (comment)
>
> Do you think it would make sense to enable passing a loss function for classification tasks in `MODNetModel.evaluate`? Or would you prefer keeping -ROC-AUC (e.g., because it is easier to interpret than cross-entropy losses)?

No problem with me: you can keep ROC-AUC as the default metric to preserve the current behavior, while adding the flexibility to change it :) (I would put it in a separate PR.)
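
A minimal sketch of that flexibility (hypothetical function name and signature, not code from this PR):

```python
from sklearn.metrics import log_loss, roc_auc_score

def evaluate_classification(y_true, y_prob, loss_fn=None):
    """Default to -ROC-AUC (current behavior); allow opting into another loss."""
    if loss_fn is None:
        return -roc_auc_score(y_true, y_prob)  # negated so that lower is better
    return loss_fn(y_true, y_prob)

# evaluate_classification(y_true, y_prob)                     # default: -ROC-AUC
# evaluate_classification(y_true, y_prob, loss_fn=log_loss)   # cross entropy instead
```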

kaueltzen and others added 2 commits December 18, 2024 17:34
…rected axes of y in generate_shuffled_and_stratified_val_split and generate_shuffled_and_stratified_val_data
@ppdebreuck (Owner) left a comment:


Thanks for the modifications @kaueltzen, I think this can be merged?

Successfully merging this pull request may close issue #228: Non-stratified splitting and overwriting of loss function in classification tasks.