UK - Biobank New BIDS dataset #29

mpompolas · 2021-01-22T17:46:46Z

ULTIMATE GOAL - Create a new REPO of UK-BioBank

For the purpose of this new BIDS dataset, we want to keep the final preprocessed files, and the derivatives that correspond to them (a gradient-corrected scan has a different segmentation than the original).

The new BIDS folder should be an identical copy of UK-Biobank (same number of files AND same LABELS) but under a different folder name, e.g. UK_BioBank_processed, and it should also contain the derivatives that were manually checked.

BEFORE MANUAL CHECK

Sandrine's pipeline seems ready to go.
At this stage, I suggest we keep all the intermediate files so that potential problems are easy to identify. If space becomes an issue on Joplin, we can re-evaluate, e.g. by processing in batches.

AFTER MANUAL CHECK

We should have files within the /UK_BioBank_processed/derivatives folder. Labels should not contain the RPI, gradcorr, etc. suffixes, so in your code, when you add the suffix _manual, make sure you strip those off.
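A minimal sketch of the naming rule, assuming the preprocessing suffixes are exactly `_RPI`, `_r` and `_gradcorr` (taken from the example filenames in this issue) and that the manual label suffix is `seg-manual`. The function names are hypothetical and are not part of manual_correction.py:

```python
import os

# Assumed preprocessing suffixes, based on the filenames in this issue:
# _RPI = reorientation, _r = resampling, _gradcorr = gradient correction.
PREPROC_SUFFIXES = ("_RPI", "_r", "_gradcorr")


def strip_preproc_suffixes(stem: str) -> str:
    """Repeatedly remove trailing preprocessing suffixes from a filename stem."""
    changed = True
    while changed:
        changed = False
        for suffix in PREPROC_SUFFIXES:
            if stem.endswith(suffix):
                stem = stem[: -len(suffix)]
                changed = True
    return stem


def manual_label_name(preprocessed_file: str, label: str = "seg-manual") -> str:
    """Build the derivative filename from a preprocessed image, without preprocessing suffixes."""
    name = os.path.basename(preprocessed_file)
    # Handle the double extension .nii.gz explicitly; splitext would only strip .gz.
    if name.endswith(".nii.gz"):
        stem, ext = name[: -len(".nii.gz")], ".nii.gz"
    else:
        stem, ext = os.path.splitext(name)
    return f"{strip_preproc_suffixes(stem)}_{label}{ext}"


print(manual_label_name("sub-1000032_T1w_RPI_r_gradcorr.nii.gz"))
# -> sub-1000032_T1w_seg-manual.nii.gz
```

Stripping suffixes from the end in a loop (rather than with a single fixed string) keeps the rule working for files that stopped at an intermediate step, e.g. `_RPI_r`.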

Regarding the anatomy files (not the derivatives), we want to keep only the last file of the preprocessing, with the same name as the original:
e.g. instead of sub-1000252_T2w_RPI_r_gradcorr.nii.gz, it should be sub-1000252_T2w.nii.gz.

This will make later processing through the ivadomed pipeline very easy.
So to sum it up:

Rename the final preprocessed (reoriented/resampled/gradient-corrected) file to the original file name.
Delete the remaining intermediate processing files (*RPI, *RPI_r, etc.).
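The two steps above could be sketched as follows, assuming each anat folder of the processed copy contains the original image plus intermediates named `<stem>_RPI`, `<stem>_RPI_r` and `<stem>_RPI_r_gradcorr`. The function name and default suffix are hypothetical:

```python
import glob
import os


def finalize_anat_file(anat_dir: str, original_stem: str,
                       final_suffix: str = "_RPI_r_gradcorr",
                       ext: str = ".nii.gz") -> str:
    """Rename the last preprocessing output to the original BIDS name, then delete intermediates."""
    final_path = os.path.join(anat_dir, original_stem + final_suffix + ext)
    target_path = os.path.join(anat_dir, original_stem + ext)
    # os.replace overwrites the copied original image with the final preprocessed one.
    os.replace(final_path, target_path)
    # Any remaining "<stem>_*.nii.gz" file is assumed to be an intermediate (e.g. _RPI, _RPI_r).
    for path in glob.glob(os.path.join(anat_dir, original_stem + "_*" + ext)):
        os.remove(path)
    return target_path
```

Renaming before deleting means the final image is never removed by the cleanup glob, since the target name has no trailing `_` suffix.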

NOTES

A few more files are needed for a complete BIDS folder: dataset_description.json and participants.json (you only have participants.tsv), and maybe a README.TXT as well (?). Just copy these from the original UK-BioBank dataset.
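A minimal sketch of the copy step, assuming both datasets sit on the same file system. The file list is an assumption: README is copied only if it exists, and participants.tsv is left out because the processed dataset already has it:

```python
import os
import shutil

# Assumed top-level files to copy from the original dataset; missing ones are skipped.
METADATA_FILES = ("dataset_description.json", "participants.json", "README")


def copy_bids_metadata(src_root: str, dst_root: str, names=METADATA_FILES) -> list:
    """Copy top-level BIDS metadata files from the original dataset to the processed one."""
    copied = []
    for name in names:
        src = os.path.join(src_root, name)
        if os.path.isfile(src):
            # copy2 also preserves timestamps and permissions.
            shutil.copy2(src, os.path.join(dst_root, name))
            copied.append(name)
    return copied
```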

The preprocessing steps should be documented somewhere. The easiest place would be dataset_description.json:
Document the git version of the Spinal Cord Toolbox and the function calls that were used, with their parameters.
Another place could be the .json sidecar associated with each .nii.gz, but that is a bit more work.
The gradcorr file also needs to be documented somehow. I don't have any input on that; as a start, maybe document which facility it came from (?)
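One possible way to record this, assuming the BIDS `GeneratedBy` field (available since BIDS 1.4.0) is used in dataset_description.json. The commit hash, step descriptions and facility below are placeholders to be filled with the actual values, and the function name is hypothetical:

```python
import json


def add_preprocessing_provenance(description_path: str, sct_commit: str,
                                 steps: list, gradcorr_source: str) -> dict:
    """Append an SCT entry to the GeneratedBy list of dataset_description.json."""
    with open(description_path) as f:
        desc = json.load(f)
    # GeneratedBy is a list of objects; Name is required, the other keys are optional.
    desc.setdefault("GeneratedBy", []).append({
        "Name": "spinalcordtoolbox",
        "Version": sct_commit,
        "CodeURL": "https://github.com/spinalcordtoolbox/spinalcordtoolbox",
        "Description": "; ".join(steps)
        + f". Gradient correction file provided by: {gradcorr_source}",
    })
    with open(description_path, "w") as f:
        json.dump(desc, f, indent=4)
    return desc
```

Keeping the exact command lines (with parameters) in `steps` is one way to satisfy the request above without editing every sidecar .json.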


jcohenadad · 2021-01-22T18:51:29Z

Thank you for initiating this, @mpompolas. A few clarifications:

The repos should be under git-annex (not on duke).
The repos' names should be the same as the original (unprocessed) repos, with the added suffix: -processed

sandrinebedard · 2021-01-22T19:43:59Z

We should have files within the /UK_BioBank_processed/derivatives folder. Labels should not contain the RPI, gradcorr, etc. suffixes, so in your code, when you add the suffix _manual, make sure you strip those off.

@mpompolas So I can add to this branch a modified version of my manual correction script, manual_correction.py, so that the output name of a manual correction would directly be, for example, sub-1000032_T1w_seg-manual.nii.gz instead of sub-1000032_T1w_RPI_r_gradcorr_seg-manual.nii.gz. Is that right?

mpompolas · 2021-01-23T00:07:25Z

The repos should be under git-annex (not on duke).
The repos' names should be the same as the original (unprocessed) repos, with the added suffix: -processed

Thanks @jcohenadad, I just edited my instructions.

So I can add to this branch a modified version of my manual correction script, manual_correction.py, so that the output name of a manual correction would directly be, for example, sub-1000032_T1w_seg-manual.nii.gz instead of sub-1000032_T1w_RPI_r_gradcorr_seg-manual.nii.gz. Is that right?

@sandrinebedard exactly. For creating this dataset, we will solely use code from this branch.

sandrinebedard · 2021-01-25T19:57:02Z

I had some thoughts about the datasets we want to create. We talked about the fact that the derivatives folder would only be in the UK_BioBank_processed dataset. However, my pipeline for cord CSA takes as input the raw images, as well as the manual segmentations and disc labels from the derivatives. So there will be a problem if the derivatives are only in UK_BioBank_processed.

Ideas:

I could modify my process_data.sh to take the new dataset as input, thus removing the resampling, reorientation and gradcorr steps, but we would have to create the dataset before I can run my pipeline.
Would it be possible to have the same derivatives folder associated with both datasets, or something like that?

@jcohenadad do you have some thoughts on this?

jcohenadad · 2021-01-25T20:25:06Z

@sandrinebedard good point.

I could modify my process_data.sh to take the new dataset as input, thus removing the resampling, reorientation and gradcorr steps, but we would have to create the dataset before I can run my pipeline.

I would lean towards this approach. You could, e.g., break down your shell script and create a preprocess_data.sh that deals with gradcorr and resampling. That script could also deal with renaming (i.e., remove the suffix "_gradcorr_r", as we discussed), so that the output data is free of suffixes and can be used as a "native" BIDS dataset for other projects (e.g., model training).

Would it be possible to have the same derivatives folder associated with both datasets, or something like that?

I would advise against it. I'm afraid we would end up with out-of-sync derivatives (e.g., a segmentation manually corrected in dataset1 that we forgot to update in dataset2).

mpompolas · 2021-01-25T23:43:16Z

You could, e.g., break down your shell script and create a preprocess_data.sh that deals with gradcorr and resampling. That script could also deal with renaming (i.e., remove the suffix "_gradcorr_r", as we discussed), so that the output data is free of suffixes and can be used as a "native" BIDS dataset for other projects (e.g., model training).

I agree with @jcohenadad on splitting the script into two parts.

Would it be possible to have the same derivatives folder associated with both datasets, or something like that?

The idea is to completely separate the original from the preprocessed dataset. If we put segmentations from multiple datasets in the same folder (I assume you would differentiate them with a suffix), it will later become complicated to tell which ones to use for training, since we tend to use the same standardized suffixes in all datasets, e.g. "_seg-manual" or "_labels-disk-manual".
This standardization makes it very easy to select multiple datasets/BIDS folders as inputs for training.
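A minimal sketch of why a standardized suffix helps: image/label pairs can be collected from several BIDS folders with one rule. The layout `derivatives/labels/sub-*/anat/` is an assumption (it is the convention used in spine-generic datasets), and the function name is hypothetical:

```python
import glob
import os


def find_manual_labels(dataset_roots, suffix="_seg-manual"):
    """Pair each raw image with its manual label across several BIDS datasets."""
    pairs = []
    for root in dataset_roots:
        labels_dir = os.path.join(root, "derivatives", "labels")
        pattern = os.path.join(labels_dir, "sub-*", "anat", f"*{suffix}.nii.gz")
        for label in sorted(glob.glob(pattern)):
            # Mirror the derivative path back into the raw dataset and drop the suffix.
            rel = os.path.relpath(label, labels_dir)
            image = os.path.join(root, rel.replace(suffix, ""))
            if os.path.isfile(image):
                pairs.append((image, label))
    return pairs
```

If each dataset used its own suffix (e.g. to tell original and processed segmentations apart), this single rule would no longer be enough.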

sandrinebedard · 2021-01-26T01:56:57Z

I would lean towards this approach. You could e.g. break down your shell script and create a preprocess_data.sh, which deals with gradcorr, resampling. That script could also deal with renaming (ie remove the suffix "_gradcorr_r" as we discussed, so that the output data is "clean" of suffix and can be used as a "native" BIDS dataset for other projects (eg model training).

@jcohenadad @mpompolas I agree, splitting the script seems like the best idea, I will get into it!

This was referenced May 19, 2021

Training on spinegeneric dataset gives size mismatches between image and ground truth ivadomed/ivadomed#797 (open)

Create a processed version of the dataset spine-generic/data-multi-subject#90 (closed)

sandrinebedard mentioned this issue Jun 23, 2022

Move derivatives from spine-generic-processed back to data-multi-subject spine-generic/data-multi-subject#121 (closed)
