Master issue: Important next steps #225

Open
MDijsselhof opened this issue May 1, 2024 · 0 comments
To make sure we keep up the very good pace we have, I've identified some key issues. Resolving them should expedite development and research, and make things clearer for everybody in terms of code, communication, and results. Please add to the list if there are other key objectives or ways to handle them!

Issue 1 -- Lack of consistent nomenclature
The names used for the MRI datasets, harmonisation algorithms, and output names differ between notebooks and are inconsistent with the nomenclature we decided upon and described in the papers below.

  1. Prone to errors in:
    • Processing data
    • Analysing data
    • Analysing results
    • Creating figures, etc.
  2. Hampers communication, resulting in errors and lost time
  3. Increases difficulty in achieving code modularity

Solution

  • Harmonise nomenclature across datasets. For example, in branch 2final_results, notebook ml_experiment3b-neurocomb.ipynb names the used datasets StrokeMRI, TOP, SABRE, Insight46 (with EDIS and HELIUS to be added later), but in the cells they are referred to as mri instead of StrokeMRI, and as TOPMRI (the combined StrokeMRI and TOP datasets) instead of StrokeMRI. The information text at the top of the notebook is also inconsistent.
    Datasets should be named as follows:
    • StrokeMRI
    • TOP
    • HELIUS
    • EDIS
    • SABRE
    • Insight46
    • TOPStrokeMRI (combined StrokeMRI and TOP)
  • Harmonise nomenclature across algorithms. For example, in branch 2final_results there are several notebooks (3b and 3e) with neuroharm in their name, and notebook 3e has 3 different algorithms in its title.
    Algorithms should be named as follows:
  • Restructure output (folders) using this nomenclature. Currently we save the output files in the same folder as the notebooks. I propose we create folders structured around the output of the harmonisation algorithms, which also serves as input for the Cerebrovascular Brain-age predictions and their results. The current output locations are hard-coded, for example in branch 2final_results, notebook ml_experiment3b-neurocomb.ipynb, in the final cells per dataset. This would mean a folder titled, for example, neuroCombat (or NonHarmonised) containing the following files:
    • Input for harmonisation (either preprocessed or not)
      • StrokeMRI.csv
      • TOP.csv
      • etc.
    • Output of Harmonisation (following DatasetHarmonised_DatasetsToWhichItHasBeenHarmonised_HarmonisationAlgorithmAbbreviation.csv)
      • SABRE_5way_OPNC.csv (for all datasets, using StrokeMRI and TOP merged as TOPStrokeMRI)
      • Insight46_SABRE_NC.csv
    • Output of Cerebrovascular Brain-age estimations
      • Predictions in testing datasets (following y_test_dataset_PredictionAlgorithm_HarmonisationAlgorithm.csv)
        • y_test_SABRE_ExtraTrees_NC.csv
        • y_test_Insight46_LinearRegression_AC.csv
      • Predictions in validation datasets (following y_validate_dataset_PredictionAlgorithm_HarmonisationAlgorithm.csv)
        • y_validate_TOPStrokeMRI_ExtraTrees_NC.csv
        • y_validate_Insight46_LinearRegression_AC.csv
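The naming conventions above can be encoded once and reused everywhere. A minimal sketch follows; the `DATASETS` constant matches the list above, but the helper names (`harmonised_filename`, `prediction_filename`) are my own suggestions, not existing cvasl code.

```python
# Canonical dataset names, per the nomenclature decided above.
DATASETS = ["StrokeMRI", "TOP", "HELIUS", "EDIS", "SABRE",
            "Insight46", "TOPStrokeMRI"]


def harmonised_filename(dataset, harmonised_to, algorithm_abbrev):
    """DatasetHarmonised_DatasetsToWhichItHasBeenHarmonised_Abbrev.csv"""
    return f"{dataset}_{harmonised_to}_{algorithm_abbrev}.csv"


def prediction_filename(split, dataset, prediction_algorithm,
                        harmonisation_abbrev):
    """y_<split>_dataset_PredictionAlgorithm_HarmonisationAlgorithm.csv"""
    return f"y_{split}_{dataset}_{prediction_algorithm}_{harmonisation_abbrev}.csv"


print(harmonised_filename("SABRE", "5way", "OPNC"))
# SABRE_5way_OPNC.csv
print(prediction_filename("test", "SABRE", "ExtraTrees", "NC"))
# y_test_SABRE_ExtraTrees_NC.csv
```

With helpers like these, a rename of a dataset or algorithm abbreviation only has to happen in one place.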

Issue 2 -- Code modularity/Amount of notebooks
The different notebooks with their specific analyses are a good start for writing the harmonisation paper, and a basis for a unified script that handles data selection, harmonisation, Cerebrovascular Brain-age estimations, and visualisation of results. However, at the moment there are many near-identical copies of notebooks, with only small changes between them for different use cases, rather than a single notebook that iterates over different input arguments. The next step is to increase the modularity of the code as quickly as possible, which also helps with issue 1 and avoids the following problems:

  1. Prone to errors in:
    • Writing the code
    • Checking for bugs etc.
    • Processing data
  2. Difficult to synchronise and keep everything updated
  3. Difficult to track changes
  4. Increases effort/difficulties for future projects
    • Adding other algorithms if necessary
    • GUI

Solution
For starters, writing a global function for performing harmonisation with a selected algorithm, and another for Cerebrovascular Brain-age estimations with a selected algorithm, would reduce the problems described above.
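The global-function idea could look something like the sketch below: one entry point per stage, with the algorithm passed as an argument instead of being baked into a copied notebook. The registry contents and the `run_brainage` name are illustrative assumptions, not existing cvasl code.

```python
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import LinearRegression

# Hypothetical registry of prediction algorithms; extend with one line
# per new estimator instead of copying a notebook.
ESTIMATORS = {
    "ExtraTrees": ExtraTreesRegressor,
    "LinearRegression": LinearRegression,
}


def run_brainage(train_X, train_y, test_X, algorithm="ExtraTrees", **kwargs):
    """Fit the selected estimator and return predictions on test_X.

    Every notebook would call this one function, so the training cell
    exists in exactly one place.
    """
    model = ESTIMATORS[algorithm](**kwargs)
    model.fit(train_X, train_y)
    return model.predict(test_X)
```

A matching `run_harmonisation(data, algorithm=...)` wrapper around the harmonisation algorithms would follow the same pattern, so adding an algorithm later (or driving a GUI) only means registering it once.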

Issue 3 -- Code & Data organisation
Currently, the many different notebooks are spread over several locations, as are the input for processing, the harmonisation output, and the Cerebrovascular Brain-age estimations. Reorganising them would streamline future processing, improve modularity, increase user-friendliness and transparency, and avoid problems like:

  1. Accidental processing issues in:
    • Processing data
    • Analysing data
    • Analysing results
    • Creating figures, etc.
  2. Difficulties in communication
  3. Increased difficulty in achieving code modularity

Solution
Preferably, all processing starts from ASL-BIDS folders. However, to ease code transitioning, it might be easier to have a structure within the CVASL repo that contains the following folders and output:

  • CVASL
    • Environments
    • Functions
    • Notebooks
      • Harmonisation
      • Cerebrovascular Brain-age estimations
    • Data
      • Original datasets
      • Harmonisation
        • Per harmonisation algorithm
      • Processing
        • Merged training dataset
        • Merged validation dataset
        • Merged testing dataset
      • Cerebrovascular Brain-age (containing all output of estimations)
        • Per algorithm
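The tree above can be materialised with a few lines of Python. A minimal sketch, assuming the proposed structure lives inside the CVASL repo; the per-algorithm subfolders would be created later as algorithms are added.

```python
from pathlib import Path

# Folder tree proposed above; "Per harmonisation algorithm" and
# "Per algorithm" subfolders get created on demand per algorithm.
FOLDERS = [
    "Environments",
    "Functions",
    "Notebooks/Harmonisation",
    "Notebooks/Cerebrovascular Brain-age estimations",
    "Data/Original datasets",
    "Data/Harmonisation",
    "Data/Processing/Merged training dataset",
    "Data/Processing/Merged validation dataset",
    "Data/Processing/Merged testing dataset",
    "Data/Cerebrovascular Brain-age",
]


def create_tree(root="CVASL"):
    """Create the proposed folder structure (idempotent)."""
    for folder in FOLDERS:
        Path(root, folder).mkdir(parents=True, exist_ok=True)
```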

Other issues
Ultimately it would be nice to have the following steps in separate notebooks:

  • Harmonise datasets and create harmonised output
  • Define training/validation/testing datasets and merge datasets accordingly, to have one dataset each for training/validation/testing, while adding an identifier of the source dataset to the dataframes.
  • Perform training/validation/testing
  • Produce evaluation metrics and result figures
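The merge-with-identifier step above can be sketched in a few lines of pandas; the `merge_with_identifier` name and the feature columns are placeholders, not existing cvasl code.

```python
import pandas as pd


def merge_with_identifier(frames):
    """Concatenate per-dataset DataFrames into one frame.

    frames maps dataset name (e.g. "TOP") -> DataFrame; each row is
    tagged with its source dataset in a 'dataset' column, so the merged
    training/validation/testing set stays traceable.
    """
    tagged = [df.assign(dataset=name) for name, df in frames.items()]
    return pd.concat(tagged, ignore_index=True)
```

The same function would serve the training, validation, and testing merges, with only the input dictionary changing.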