Master issue: Important next steps #225

Open
MDijsselhof opened this issue May 1, 2024 · 0 comments
To make sure we keep up the very good pace we have, I've identified some key issues. Resolving them should expedite development and research, and make things clearer for everybody in terms of code, communication, and results. Please add to the list if there are other key objectives or ways to handle them!

Issue 1 -- Lack of consistent nomenclature
The names used for the MRI datasets, harmonisation algorithms, and output names differ between notebooks and are inconsistent with the nomenclature we decided upon and described in the papers below.

  1. Prone to errors in:
    • Processing data
    • Analysing data
    • Analysing results
    • Creating figures, etc.
  2. Hampers communication, resulting in errors and lost time
  3. Increases difficulty in achieving code modularity

Solution

  • Harmonise nomenclature across datasets. For example, in branch 2final_results, notebook ml_experiment3b-neurocomb.ipynb names the used datasets StrokeMRI, TOP, SABRE, Insight46 (with EDIS and HELIUS to be added later), but in the cells they are referred to as mri instead of StrokeMRI, and as TOPMRI (the combined StrokeMRI and TOP datasets) instead of StrokeMRI. The information text at the top of the notebook is also inconsistent.
    Datasets should be named as follows:
    • StrokeMRI
    • TOP
    • HELIUS
    • EDIS
    • SABRE
    • Insight46
    • TOPStrokeMRI (combined StrokeMRI and TOP)
  • Harmonise nomenclature across algorithms. For example, in branch 2final_results there are several notebooks (3b and 3e) with neuroharm in their name, and notebook 3e has 3 different algorithms in its title.
    Algorithms should be named as follows:
  • Restructure output (folders) using this nomenclature. Currently we save the output files in the same folder as the notebooks. I propose we create folders structured around the output of the harmonisation algorithms, which also serves as input for the Cerebrovascular Brain-age predictions and their results. The current output locations are hard-coded, for example in branch 2final_results, notebook ml_experiment3b-neurocomb.ipynb, in the final cells per dataset. This would mean a folder titled, for example, neuroCombat (or NonHarmonised) containing the following files:
    • Input for harmonisation (either preprocessed or not)
      • StrokeMRI.csv
      • TOP.csv
      • etc.
    • Output of Harmonisation (following DatasetHarmonised_DatasetsToWhichItHasBeenHarmonised_HarmonisationAlgorithmAbbreviation.csv)
      • SABRE_5way_OPNC.csv (for all datasets, using StrokeMRI and TOP merged as TOPStrokeMRI)
      • Insight46_SABRE_NC.csv
    • Output of Cerebrovascular Brain-age estimations
      • Predictions in testing datasets (following y_test_dataset_PredictionAlgorithm_HarmonisationAlgorithm.csv)
        • y_test_SABRE_ExtraTrees_NC.csv
        • y_test_Insight46_LinearRegression_AC.csv
      • Predictions in validation datasets (following y_validate_dataset_PredictionAlgorithm_HarmonisationAlgorithm.csv)
        • y_validate_TOPStrokeMRI_ExtraTrees_NC.csv
        • y_validate_Insight46_LinearRegression_AC.csv
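The naming conventions above can be encoded once and reused everywhere. A minimal sketch follows; the `DATASETS` constant matches the list above, but the helper names (`harmonised_filename`, `prediction_filename`) are my own suggestions, not existing cvasl code.

```python
# Canonical dataset names, per the nomenclature decided above.
DATASETS = ["StrokeMRI", "TOP", "HELIUS", "EDIS", "SABRE",
            "Insight46", "TOPStrokeMRI"]


def harmonised_filename(dataset, harmonised_to, algorithm_abbrev):
    """DatasetHarmonised_DatasetsToWhichItHasBeenHarmonised_Abbrev.csv"""
    return f"{dataset}_{harmonised_to}_{algorithm_abbrev}.csv"


def prediction_filename(split, dataset, prediction_algorithm,
                        harmonisation_abbrev):
    """y_<split>_dataset_PredictionAlgorithm_HarmonisationAlgorithm.csv"""
    return f"y_{split}_{dataset}_{prediction_algorithm}_{harmonisation_abbrev}.csv"


print(harmonised_filename("SABRE", "5way", "OPNC"))
# SABRE_5way_OPNC.csv
print(prediction_filename("test", "SABRE", "ExtraTrees", "NC"))
# y_test_SABRE_ExtraTrees_NC.csv
```

With helpers like these, a rename of a dataset or algorithm abbreviation only has to happen in one place.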

Issue 2 -- Code modularity/Amount of notebooks
The different notebooks with their specific analyses are a good start for writing the harmonisation paper, and a basis for a unified script that handles data selection, harmonisation, Cerebrovascular Brain-age estimations, and visualisation of results. However, at the moment there are many near-identical copies of notebooks, with only small changes between them for different use cases, rather than a single notebook that iterates over different input arguments. The next step is to increase the modularity of the code as quickly as possible, which also helps with issue 1 and avoids the following problems:

  1. Prone to errors in:
    • Writing the code
    • Checking for bugs etc.
    • Processing data
  2. Difficult to synchronise and keep everything updated
  3. Difficult to track changes
  4. Increases effort/difficulties for future projects
    • Adding other algorithms if necessary
    • GUI

Solution
For starters, writing a global function for performing harmonisation with a selected algorithm, and another for Cerebrovascular Brain-age estimations with a selected algorithm, would reduce the problems described above.
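The global-function idea could look something like the sketch below: one entry point per stage, with the algorithm passed as an argument instead of being baked into a copied notebook. The registry contents and the `run_brainage` name are illustrative assumptions, not existing cvasl code.

```python
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import LinearRegression

# Hypothetical registry of prediction algorithms; extend with one line
# per new estimator instead of copying a notebook.
ESTIMATORS = {
    "ExtraTrees": ExtraTreesRegressor,
    "LinearRegression": LinearRegression,
}


def run_brainage(train_X, train_y, test_X, algorithm="ExtraTrees", **kwargs):
    """Fit the selected estimator and return predictions on test_X.

    Every notebook would call this one function, so the training cell
    exists in exactly one place.
    """
    model = ESTIMATORS[algorithm](**kwargs)
    model.fit(train_X, train_y)
    return model.predict(test_X)
```

A matching `run_harmonisation(data, algorithm=...)` wrapper around the harmonisation algorithms would follow the same pattern, so adding an algorithm later (or driving a GUI) only means registering it once.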

Issue 3 -- Code & Data organisation
Currently, the many different notebooks are spread over several locations, as are the input for processing, the harmonisation output, and the Cerebrovascular Brain-age estimations. Reorganising them would streamline future processing, improve modularity, increase user-friendliness and transparency, and avoid problems like:

  1. Accidental processing issues in:
    • Processing data
    • Analysing data
    • Analysing results
    • Creating figures, etc.
  2. Difficulties in communication
  3. Increased difficulty in achieving code modularity

Solution
Preferably, all processing starts from ASL-BIDS folders. However, to ease code transitioning, it might be easier to have a structure within the CVASL repo that contains the following folders and output:

  • CVASL
    • Environments
    • Functions
    • Notebooks
      • Harmonisation
      • Cerebrovascular Brain-age estimations
    • Data
      • Original datasets
      • Harmonisation
        • Per harmonisation algorithm
      • Processing
        • Merged training dataset
        • Merged validation dataset
        • Merged testing dataset
      • Cerebrovascular Brain-age (containing all output of estimations)
        • Per algorithm
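The tree above can be materialised with a few lines of Python. A minimal sketch, assuming the proposed structure lives inside the CVASL repo; the per-algorithm subfolders would be created later as algorithms are added.

```python
from pathlib import Path

# Folder tree proposed above; "Per harmonisation algorithm" and
# "Per algorithm" subfolders get created on demand per algorithm.
FOLDERS = [
    "Environments",
    "Functions",
    "Notebooks/Harmonisation",
    "Notebooks/Cerebrovascular Brain-age estimations",
    "Data/Original datasets",
    "Data/Harmonisation",
    "Data/Processing/Merged training dataset",
    "Data/Processing/Merged validation dataset",
    "Data/Processing/Merged testing dataset",
    "Data/Cerebrovascular Brain-age",
]


def create_tree(root="CVASL"):
    """Create the proposed folder structure (idempotent)."""
    for folder in FOLDERS:
        Path(root, folder).mkdir(parents=True, exist_ok=True)
```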

Other issues
Ultimately it would be nice to have the following steps in separate notebooks:

  • Harmonise datasets and create harmonised output
  • Define training/validation/testing datasets and merge datasets accordingly, to have one dataset each for training/validation/testing, while adding an identifier of the source dataset to the dataframes.
  • Perform training/validation/testing
  • Produce evaluation metrics and result figures
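The merge-with-identifier step above can be sketched in a few lines of pandas; the `merge_with_identifier` name and the feature columns are placeholders, not existing cvasl code.

```python
import pandas as pd


def merge_with_identifier(frames):
    """Concatenate per-dataset DataFrames into one frame.

    frames maps dataset name (e.g. "TOP") -> DataFrame; each row is
    tagged with its source dataset in a 'dataset' column, so the merged
    training/validation/testing set stays traceable.
    """
    tagged = [df.assign(dataset=name) for name, df in frames.items()]
    return pd.concat(tagged, ignore_index=True)
```

The same function would serve the training, validation, and testing merges, with only the input dictionary changing.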