Adding seed for reproducibility and sampling methods #344
Open: rwilfong wants to merge 67 commits into ATOMScience-org:1.7.0 from rwilfong:1.7.0
Commits
e4d8871 sampling and seed (rwilfong)
22b0318 now it runs (stewarthe6)
30ea360 kfold changes (stewarthe6)
dc1f7c4 seed test (rwilfong)
7b13967 ruff linter suggestions (rwilfong)
6fb1c62 updated kfoldregression (rwilfong)
480e5f1 Merge remote-tracking branch 'upstream/1.7.0' into 1.7.0 (stewarthe6)
fc24463 added imblearn to pip requirements (stewarthe6)
561c3bb unpin imblearn (stewarthe6)
49dc67b Clean up unused random_state or seed parameters or assignments. (stewarthe6)
b41b7d5 fixed merging error (stewarthe6)
b65ba09 Fixed find and replace bug (stewarthe6)
84babd2 make_dc_model does not need random_state or seed arguments (stewarthe6)
ecf23bd fhnew changes (rwilfong)
a821f6a Changed constructor of ProductionSplitter to call Splitting's init fu… (stewarthe6)
319b2f0 resolving errors (rwilfong)
31f3d5f removed heads (rwilfong)
d074f65 removed unused library (rwilfong)
b0ecc05 Merge remote-tracking branch 'upstream/1.7.0' into 1.7.0 (stewarthe6)
2992bdf Added more models for seeding test. (stewarthe6)
ccebaed Fixed seed for GCNModel. Should pass regularly now. (stewarthe6)
dcc4809 Set seed to guarantee resuts in class_config_delaney_fit_nn_ecfp.json (stewarthe6)
922bf0c Moved 'test' from suffix to prefix (stewarthe6)
82838d1 Renamed these test files to start with test_ so they're caught by the… (stewarthe6)
4e471cb Changed MultitaskScaffoldSplit and GeneticAlgorithm to use a Generate… (stewarthe6)
baa5478 Added test for MTSS seed and fixed a few cases were the wrong random … (stewarthe6)
4eb4ee4 renamed this file to match wahts in test_seed_splitting.py (stewarthe6)
4588a9d renamed this to match the test (stewarthe6)
ff58d02 Removed try except blocks in test code. We need to see these errors (stewarthe6)
0028ed7 Added seed to this test so that it passes more consistently (stewarthe6)
0c83b6b combined_training_data now accounts for synthetic datasets (stewarthe6)
ada3ea8 accept changes (rwilfong)
4dd5d99 integrate changes (rwilfong)
0a616b2 set uncertainty false for classification test since it is unsupported… (stewarthe6)
16c2a4a update branchMerge branch '1.7.0' of https://github.com/rwilfong/AMPL… (rwilfong)
c3b1922 updated tests (rwilfong)
f2a30a9 resolve errors (rwilfong)
410f03d Added seed to test_balancing_transformer for more consistent outputs (stewarthe6)
f247893 added a test to make sure that multitask problems don't work with SMOTE (stewarthe6)
2e03fef Used parameter to determine if SMOTE or undersampling is being used (stewarthe6)
b48ed02 Added a seed to this test for more consistent results (stewarthe6)
567264a Changed balancing transformer to just check to see if the weights cha… (stewarthe6)
627cc20 Set the seed to make sure the number of positive and negative compoun… (stewarthe6)
8decc0e Removed unnecessary loop and printed out results from the perf_data test (stewarthe6)
317cc29 accumulate_preds ignores the id parameter for SimpleRegressionPerfDat… (stewarthe6)
5055889 the positive and negative counts are inconsistent, instead just check… (stewarthe6)
6d0abbd Merge branch 'ATOMScience-org:1.7.0' into 1.7.0 (stewarthe6)
16d50f8 Undo transformations before calculating mean and std of predictions (stewarthe6)
3e58819 Merge branch '1.7.0' of github.com:rwilfong/AMPL into 1.7.0 (stewarthe6)
0280941 Removed pdb imports (stewarthe6)
a4c2b83 Updated help for 'seed' input (stewarthe6)
8e29047 Removed commented out seed (stewarthe6)
268ba05 model_retrian has an option to either keep or discard the saved seed.… (stewarthe6)
17ba026 Pass on keep_seed argument (stewarthe6)
b2a0c5a Looping through all folds is redundant (stewarthe6)
60ed670 Added option to keep the same random seed when retraining a model. De… (stewarthe6)
c5e634f Move common functions to integrative_utilities (stewarthe6)
36c38ec Move common functions to integrative_utilities (stewarthe6)
d11ee2c deleted unused imports (stewarthe6)
a089d1f moved params to json files (stewarthe6)
271c502 Prevent divide by zero case if the model never learns (stewarthe6)
48635cb Moved pandas import over to integrative_utilities (stewarthe6)
0c67471 Added a seed here for reproducability (stewarthe6)
524d804 Testing SMOTE and balancing transformer (stewarthe6)
a057ba4 global seed warning (rwilfong)
58e101d Merge branch '1.7.0' of https://github.com/rwilfong/AMPL into 1.7.0 (rwilfong)
10c4ba7 global seed warning (rwilfong)
For undersampling, this appears to assume that K-fold undersampling samples the entire non-test dataset. What if that isn't the case? Is this assumption enforced elsewhere?

I don't think a compound can ever be dropped from the dataset entirely because of undersampling. Undersampling is only applied to the training set of each fold, and since every compound has a 'turn' in the validation set, each compound must appear at least once.
This isn't tested, but do we need to test it anywhere? I think it's OK if a compound is dropped from training entirely, since that's what happens when using undersampling without k-fold validation.
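
To make the argument above concrete, here is a minimal sketch using only the Python standard library. It is not AMPL's splitter code and does not use imblearn; the function names (`kfold_indices`, `undersample`, `kfold_with_undersampling`) and the toy label vector are hypothetical, chosen only for illustration. It assumes a binary task and simple random undersampling of the majority class, applied to the training portion of each fold only.

```python
import random


def kfold_indices(n_samples, n_folds, seed):
    """Shuffle indices with a seeded RNG and deal them into n_folds validation folds."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def undersample(train_idx, labels, rng):
    """Randomly drop samples from the larger class until both classes are the same size."""
    pos = [i for i in train_idx if labels[i] == 1]
    neg = [i for i in train_idx if labels[i] == 0]
    n_keep = min(len(pos), len(neg))
    return sorted(rng.sample(pos, n_keep) + rng.sample(neg, n_keep))


def kfold_with_undersampling(labels, n_folds=5, seed=0):
    """Return (train, valid) index lists per fold; only the training part is undersampled."""
    folds = kfold_indices(len(labels), n_folds, seed)
    rng = random.Random(seed + 1)  # separate seeded stream for the undersampling step
    splits = []
    for k, valid_idx in enumerate(folds):
        # Training indices are every fold except fold k (hypothetical layout).
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((undersample(train_idx, labels, rng), sorted(valid_idx)))
    return splits


# Toy imbalanced dataset: 10 positives, 90 negatives (made-up data).
labels = [1] * 10 + [0] * 90
splits = kfold_with_undersampling(labels, n_folds=5, seed=42)

# Every sample lands in exactly one validation fold, whatever undersampling drops.
seen_in_valid = sorted(i for _, valid in splits for i in valid)
# Samples that survive undersampling in at least one fold's training set.
seen_in_train = {i for train, _ in splits for i in train}
never_trained_on = set(range(len(labels))) - seen_in_train
```

In this sketch, `never_trained_on` may be non-empty: a sample can be dropped from every training subset, yet it still appears in exactly one validation fold, which is the situation described in the reply. Calling `kfold_with_undersampling` again with the same `seed` reproduces the same splits.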