Domain-guided Self-supervision of EEG Data Improves Downstream Classification Performance and Generalizability
Authors: Neeraj Wagh, Jionghao Wei, Samarth Rawal, Brent Berry, Leland Barnard, Benjamin Brinkmann, Gregory Worrell, David Jones, Yogatheesan Varatharajah
Affiliation: University of Illinois at Urbana-Champaign, Mayo Clinic
- Paper: https://proceedings.mlr.press/v158/wagh21a.html
- ML4H Poster: https://drive.google.com/file/d/1-B7joEquzr4kqsfSGDqksUARWsxlXOyA/view?usp=sharing
- ML4H 5-minute Video: https://recorder-v3.slideslive.com/?share=56322&s=8950fb88-3f3f-4490-92ff-760b73e5a4b5
- ML4H Slides: https://drive.google.com/file/d/1uUma2e4GKIn8Zo7dEQRl_WkyVcBLzpaR/view?usp=sharing
- Final Models, Pre-computed Features, Training Metadata: Box
- Raw Data: MPI LEMON (no registration needed), TUH EEG Abnormal Corpus (needs registration)
- A Dockerfile is provided with all Python dependencies.
- Models:
- FULLSSL in code refers to "HS+BSE+AC" in the paper.
- DBRTriplet refers to "BSE+AC".
- BTTriplet refers to "HS+AC".
- BTDBR refers to "HS+BSE".
- Triplet refers to "AC only".
- DBR refers to "BSE only".
- BT refers to "HS only".
- SOTA refers to "ShallowNet".
- Tasks:
- "EEG Grade" is the "condition" task on the TUH dataset.
- "Eye State" is the "condition" task on the LEMON dataset.
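When collecting results for comparison with the paper, it can help to keep the code-to-paper naming above in one place. The following is a minimal sketch of such a lookup. The names `MODEL_NAMES`, `TASK_NAMES`, `paper_name`, and `paper_task` are illustrative and are not part of the repository.

```python
# Code identifier -> name used in the paper (copied from the list above).
MODEL_NAMES = {
    "FULLSSL": "HS+BSE+AC",
    "DBRTriplet": "BSE+AC",
    "BTTriplet": "HS+AC",
    "BTDBR": "HS+BSE",
    "Triplet": "AC only",
    "DBR": "BSE only",
    "BT": "HS only",
    "SOTA": "ShallowNet",
}

# (dataset, task) -> task name used in the paper.
# Only the "condition" task has a dataset-specific name.
TASK_NAMES = {
    ("tuh", "condition"): "EEG Grade",
    ("lemon", "condition"): "Eye State",
}


def paper_name(model: str) -> str:
    """Return the paper's name for a model identifier used in the code."""
    return MODEL_NAMES[model]


def paper_task(dataset: str, task: str) -> str:
    """Return the paper's task name; other tasks keep their code name."""
    return TASK_NAMES.get((dataset, task), task)
```

For example, `paper_name("BTDBR")` gives `"HS+BSE"`, and `paper_task("lemon", "condition")` gives `"Eye State"`.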
- Input choices for tasks: "condition", "gender", "age".
- Input choices for dataset: "lemon", "tuh".
- Input choices for ablation models: "FULLSSL", "DBRTriplet", "BTTriplet", "BTDBR", "Triplet", "DBR", "BT".
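The input choices above can be validated before launching a long run. The sketch below is a hypothetical wrapper-side parser built with `argparse`; it is not the argument handling used inside the repository's scripts, and the defaults for `--mode` and `--gpu_idx` are assumptions made for illustration.

```python
import argparse

# Allowed values, copied from the input choices listed above.
DATASETS = ["lemon", "tuh"]
TASKS = ["condition", "gender", "age"]
MODES = ["FULLSSL", "DBRTriplet", "BTTriplet", "BTDBR", "Triplet", "DBR", "BT"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that rejects values outside the documented choices."""
    parser = argparse.ArgumentParser(description="Hypothetical option check")
    parser.add_argument("--dataset", choices=DATASETS, required=True)
    parser.add_argument("--task", choices=TASKS, required=True)
    # Assumed defaults; the actual scripts may require these explicitly.
    parser.add_argument("--mode", choices=MODES, default="FULLSSL")
    parser.add_argument("--gpu_idx", type=int, default=0)
    return parser


# Example: parse an explicit argument list instead of sys.argv.
example = build_parser().parse_args(["--dataset=tuh", "--task=condition"])
```

An unsupported value such as `--dataset=foo` makes `argparse` print an error and exit, which catches typos before any data is loaded.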
- Download the resources folder from Box. It contains pre-computed feature arrays (power spectral density, topographical map data, and preprocessed time series) and metadata. Place it in the root directory of the project.
- Enter the evaluation folder.
- To evaluate the linear baseline, run the following command:
python linear_baseline_eval.py --dataset={dataset choice} --task={task choice}
- To evaluate the proposed ablation models, run the following command:
python ablation_models_eval.py --gpu_idx={gpu index} --dataset={dataset choice} --task={task choice} --mode={ablation model choice}
- To evaluate the SOTA model, run the following command:
python SOTA_eval.py --gpu_idx={gpu index} --dataset={dataset choice} --task={task choice}
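To evaluate many configurations in one go, the commands above can be generated programmatically. The following is a minimal sketch that builds one argument list per combination for `ablation_models_eval.py`. The helper names are hypothetical, and it assumes that every dataset/task/mode combination is supported by the script and that the evaluation folder is named `evaluation`.

```python
import itertools
import subprocess
import sys
from typing import List

DATASETS = ["lemon", "tuh"]
TASKS = ["condition", "gender", "age"]
MODES = ["FULLSSL", "DBRTriplet", "BTTriplet", "BTDBR", "Triplet", "DBR", "BT"]


def ablation_eval_commands(gpu_idx: int = 0) -> List[List[str]]:
    """Build one argv list per (dataset, task, mode) combination."""
    return [
        [
            sys.executable,
            "ablation_models_eval.py",
            f"--gpu_idx={gpu_idx}",
            f"--dataset={dataset}",
            f"--task={task}",
            f"--mode={mode}",
        ]
        for dataset, task, mode in itertools.product(DATASETS, TASKS, MODES)
    ]


def run_all(commands: List[List[str]], folder: str = "evaluation") -> None:
    """Run each command inside the given folder, stopping on the first failure."""
    for command in commands:
        # check=True raises CalledProcessError if a run exits with an error.
        subprocess.run(command, cwd=folder, check=True)
```

Calling `run_all(ablation_eval_commands(gpu_idx=0))` from the project root would run the evaluations sequentially on one GPU; nothing is executed until `run_all` is called.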
- Download the resources folder from Box. Place it in the root directory of the project.
- Enter the Fine-tune folder.
- To fine-tune the ablation models, run the following command:
python ablation_pipeline.py --gpu_idx={gpu index} --dataset={dataset choice} --task={task choice} --mode={ablation model choice}
- To fine-tune the SOTA model, run the following command:
python SOTA_pipeline.py --gpu_idx={gpu index} --dataset={dataset choice} --task={task choice}
- Download the resources folder from Box. Place it in the root directory of the project.
- Enter the supervised_learning folder.
- To train the supervised learning models, run the following command:
python linear_baseline_train.py --dataset={dataset choice} --task={task choice}
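Each workflow above expects the resources folder to sit in the project root. A small pre-flight check along the following lines can fail early with a clear message instead of partway through a run. This is a sketch only: the function name is hypothetical, and it assumes the downloaded folder is literally named `resources`.

```python
from pathlib import Path


def find_resources(project_root: str = ".") -> Path:
    """Return the resources folder under project_root, or raise if missing."""
    # Assumption: the folder downloaded from Box is named "resources".
    resources = Path(project_root) / "resources"
    if not resources.is_dir():
        raise FileNotFoundError(
            f"Expected a 'resources' folder at {resources.resolve()}; "
            "download it from Box and place it in the project root."
        )
    return resources
```

Because the scripts are run from inside a sub-folder, the check would be called with the parent directory, for example `find_resources("..")`.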
- Please email Neeraj and John about problems reproducing the results or for support with the codebase.
- Neeraj: nwagh2@illinois.edu / Website / Twitter / Google Scholar
- John: wei33@illinois.edu
- Yoga: varatha2@illinois.edu / Website / Google Scholar
Wagh, N., Wei, J., Rawal, S., Berry, B., Barnard, L., Brinkmann, B., Worrell, G., Jones, D. & Varatharajah, Y. (2021). Domain-guided Self-supervision of EEG Data Improves Downstream Classification Performance and Generalizability. Proceedings of Machine Learning for Health, in Proceedings of Machine Learning Research 158:130-142. Available from https://proceedings.mlr.press/v158/wagh21a.html.