To keep up the very good pace we have, I've identified some key issues. Resolving them should speed up development and research, and make code, communication, and results clearer for everybody. Please add to the list if we need other key objectives or ways to handle them!
Issue 1 -- Lack of consistent nomenclature
The names used for the MRI datasets, harmonisation algorithms, and output names differ between notebooks and are inconsistent with the nomenclature we decided upon and described in the papers below.
Prone to errors in:
Processing data
Analysing data
Analysing results
Creating figures, etc.
Hampers communication, resulting in errors and lost time
Increases difficulty in achieving code modularity
Solution
Harmonise nomenclature across datasets. For example, in branch 2final_results, notebook ml_experiment3b-neurocomb.ipynb, the datasets used are StrokeMRI, TOP, SABRE, Insight46 (with EDIS and HELIUS to be added later). However, in the cells they are referred to as mri instead of StrokeMRI, and TOPMRI (the combined StrokeMRI and TOP datasets) instead of TOPStrokeMRI. The information text at the top of the notebook is also inconsistent.
Datasets should be named as follows:
StrokeMRI
TOP
HELIUS
EDIS
SABRE
Insight46
TOPStrokeMRI (combined StrokeMRI and TOP)
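One way to enforce these names in code is a single lookup shared by all notebooks. The sketch below is an assumption, not existing CVASL code: `CANONICAL_DATASETS`, `ALIASES`, and `canonical_name` are hypothetical names, and the alias table only covers the two legacy labels (mri and TOPMRI) mentioned above.

```python
# Hypothetical helper: not part of the existing CVASL code base.
CANONICAL_DATASETS = (
    "StrokeMRI", "TOP", "HELIUS", "EDIS", "SABRE", "Insight46", "TOPStrokeMRI",
)

# Legacy labels seen in the notebooks, mapped to the agreed names (assumed mapping).
ALIASES = {"mri": "StrokeMRI", "TOPMRI": "TOPStrokeMRI"}


def canonical_name(label: str) -> str:
    """Return the agreed dataset name for a label, or raise if it is unknown."""
    name = ALIASES.get(label, label)
    if name not in CANONICAL_DATASETS:
        raise ValueError(f"Unknown dataset name: {label!r}")
    return name
```

Raising on unknown labels, rather than passing them through, means a misspelt dataset name fails at the first cell that uses it instead of ending up in an output file name.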
Harmonise nomenclature across algorithms. For example, in branch 2final_results there are several notebooks (3b and 3e) with neuroharm in their name, and notebook 3e has 3 different algorithms in the title.
Algorithms should be named as follows:
Restructure output (folders) using the nomenclature. Currently we save the output files in the same folder as the notebooks. I propose we create one folder per harmonisation algorithm, holding the harmonisation output (which also serves as input for the Cerebrovascular Brain-age predictions) and the prediction results. The current output is written in, for example, the final cells per dataset of notebook ml_experiment3b-neurocomb.ipynb in branch 2final_results. This would mean a folder titled, for example, neuroCombat (or NonHarmonised) containing the following files:
Input for harmonisation (either preprocessed or not)
StrokeMRI.csv
TOP.csv
etc.
Output of Harmonisation (following DatasetHarmonised_DatasetsToWhichItHasBeenHarmonised_HarmonisationAlgorithmAbbreviation.csv)
SABRE_5way_OPNC.csv (for all datasets, using StrokeMRI and TOP merged as TOPStrokeMRI)
Insight46_SABRE_NC.csv
Output of Cerebrovascular Brain-age estimations
Predictions in testing datasets (following y_test_dataset_PredictionAlgorithm_HarmonisationAlgorithm.csv)
y_test_SABRE_ExtraTrees_NC.csv
y_test_Insight46_LinearRegression_AC.csv
Predictions in validation datasets (following y_validate_dataset_PredictionAlgorithm_HarmonisationAlgorithm.csv)
y_validate_TOPStrokeMRI_ExtraTrees_NC.csv
y_validate_Insight46_LinearRegression_AC.csv
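The file-name patterns above could be generated by small helpers, so that no notebook builds a name by hand. This is a minimal sketch; `harmonised_filename` and `prediction_filename` are hypothetical names, and the abbreviations (NC, OPNC, AC) are simply taken from the examples above.

```python
def harmonised_filename(dataset: str, harmonised_to: str, algorithm: str) -> str:
    """Build DatasetHarmonised_DatasetsToWhichItHasBeenHarmonised_Algorithm.csv."""
    return f"{dataset}_{harmonised_to}_{algorithm}.csv"


def prediction_filename(split: str, dataset: str, predictor: str, harmoniser: str) -> str:
    """Build y_<split>_<dataset>_<PredictionAlgorithm>_<HarmonisationAlgorithm>.csv."""
    if split not in ("test", "validate"):
        raise ValueError(f"split must be 'test' or 'validate', got {split!r}")
    return f"y_{split}_{dataset}_{predictor}_{harmoniser}.csv"
```

For example, `harmonised_filename("Insight46", "SABRE", "NC")` gives `Insight46_SABRE_NC.csv`, and `prediction_filename("test", "SABRE", "ExtraTrees", "NC")` gives `y_test_SABRE_ExtraTrees_NC.csv`.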
Issue 2 -- Code modularity/Amount of notebooks
The different notebooks with their specific analyses are a good start for writing the harmonisation paper, and as a basis for creating a unified script that handles data selection, harmonisation, Cerebrovascular Brain-age estimations, and visualising results. However, at the moment there are many copy-pastes of notebooks with very small changes between them, for different use cases, rather than iterating over a single notebook with a single different input argument. Next steps are to increase modularity of the code as quickly as possible to help with issue 1 and to avoid the following problems:
Prone to errors in:
Writing the code
Checking for bugs etc.
Processing data
Difficult to synchronise and keep everything updated
Difficult to track changes
Increases effort/difficulties for future projects
Adding other algorithms if necessary
GUI
Solution
For starters, writing a global function for performing harmonisation with a selected algorithm, and Cerebrovascular Brain-age estimations with a selected algorithm, would reduce the problems listed above.
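Such a global function could be a thin dispatcher over a registry of algorithms, so that a notebook only changes one argument. The sketch below is illustrative and makes several assumptions: all names (`HARMONISERS`, `register`, `harmonise`) are hypothetical, rows are plain dictionaries with assumed `dataset` and `value` keys, and the `MeanCentre` entry is a toy stand-in that subtracts each dataset's mean. It is not neuroCombat or any other real harmonisation algorithm.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable

Rows = list[dict]

# Registry mapping an algorithm name to the function that implements it.
HARMONISERS: dict[str, Callable[[Rows], Rows]] = {}


def register(name: str):
    """Decorator that adds a harmonisation function to the registry."""
    def decorator(func: Callable[[Rows], Rows]):
        HARMONISERS[name] = func
        return func
    return decorator


@register("NonHarmonised")
def no_harmonisation(rows: Rows) -> Rows:
    """Return a copy of the rows unchanged."""
    return [dict(row) for row in rows]


@register("MeanCentre")  # toy placeholder, NOT a real harmonisation method
def mean_centre_per_dataset(rows: Rows) -> Rows:
    """Subtract each dataset's mean from its values."""
    values_by_dataset = defaultdict(list)
    for row in rows:
        values_by_dataset[row["dataset"]].append(row["value"])
    offsets = {name: mean(values) for name, values in values_by_dataset.items()}
    return [{**row, "value": row["value"] - offsets[row["dataset"]]} for row in rows]


def harmonise(rows: Rows, algorithm: str) -> Rows:
    """Run the selected harmonisation algorithm on the rows."""
    if algorithm not in HARMONISERS:
        raise ValueError(f"Unknown algorithm {algorithm!r}; choose from {sorted(HARMONISERS)}")
    return HARMONISERS[algorithm](rows)
```

The same registry pattern would apply to the Cerebrovascular Brain-age estimators; adding an algorithm then means registering one function rather than copying a notebook.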
Issue 3 -- Code & Data organisation
Currently, the many notebooks, the input for processing, the harmonisation output, and the Cerebrovascular Brain-age estimations are spread over several locations. Reorganising them would streamline future processing, improve modularity, increase user-friendliness and transparency, and avoid problems like:
Accidental errors in:
Processing data
Analysing data
Analysing results
Creating figures, etc.
Difficulties in communication
Increases difficulty in code modularity
Solution
Preferably, all processing starts from ASL-BIDS folders. However, to ease code transitioning, it might be easier to have a structure within the CVASL repo that contains the following folders and output:
CVASL
    Environments
    Functions
    Notebooks
        Harmonisation
        Cerebrovascular Brain-age estimations
    Data
        Original datasets
        Harmonisation
            Per harmonisation algorithm
        Processing
            Merged training dataset
            Merged validation dataset
            Merged testing dataset
        Cerebrovascular Brain-age (containing all output of estimations)
            Per algorithm
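The proposed layout could be created by a short script, so every clone of the repo starts from the same folders. This is a sketch under assumptions: the nesting is my reading of the list above (Notebooks and Data each holding the sub-folders listed after them), `LAYOUT` and `create_layout` are hypothetical names, and the per-algorithm sub-folders are left out because the algorithm names are not fixed yet.

```python
from pathlib import Path

# Assumed nesting of the proposed folders; an empty dict means "no sub-folders yet".
LAYOUT = {
    "Environments": {},
    "Functions": {},
    "Notebooks": {
        "Harmonisation": {},
        "Cerebrovascular Brain-age estimations": {},
    },
    "Data": {
        "Original datasets": {},
        "Harmonisation": {},
        "Processing": {},
        "Cerebrovascular Brain-age": {},
    },
}


def create_layout(root: Path, layout: dict = LAYOUT) -> None:
    """Recursively create the folders described by layout under root."""
    for name, children in layout.items():
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
        create_layout(folder, children)
```

Calling `create_layout(Path("CVASL"))` creates the folders relative to the current working directory; existing folders are left untouched.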
Other issues
Ultimately it would be nice to have the following steps separately in notebooks:
Harmonise datasets and create harmonised output
Define training/validation/testing datasets and merge datasets accordingly, so that there is one dataset each for training, validation, and testing, while adding a dataset identifier to the dataframes.
Perform training/validation/testing
Produce evaluation metrics and result figures
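The merge step with a dataset identifier can be sketched as follows. This is an illustration only: `merge_with_identifier` is a hypothetical name, rows are plain dictionaries rather than dataframes, and the identifier column is assumed to be called `dataset`.

```python
def merge_with_identifier(datasets: dict[str, list[dict]]) -> list[dict]:
    """Merge several datasets into one, tagging every row with its dataset name."""
    merged = []
    for name, rows in datasets.items():
        for row in rows:
            # Copy the row so the input datasets are not modified.
            merged.append({**row, "dataset": name})
    return merged
```

Keeping the identifier on every row means the merged training, validation, and testing sets can still be split or plotted per dataset later.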