This repository contains part of the resources for the 4th place solution (bilzard's part) in the competition HMS - Harmful Brain Activity Classification.
Details of this solution are described here.
Apache 2.0
- Graphics card: NVIDIA RTX 4090
- CUDA: 12.1
pip install -r requirements.txt
The command below installs this repository into your local Python environment along with its dependent packages.
pip install --editable .
Edit conf/env/local.yaml to match the configuration of your local environment.
name: local
num_workers: 24
infer_batch_size: 32
grad_checkpointing: false
data_dir: (path to competition data)
working_dir: (path to your working directory)
output_dir: ${env.working_dir}/${job_name}/${phase}
checkpoint_dir: ${env.working_dir}/train
submission_dir: .
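The `${...}` references in this file look like OmegaConf/Hydra-style interpolation, although that is an assumption rather than something stated here. As a rough sketch of how output_dir and checkpoint_dir would expand under that assumption, the stdlib-only snippet below substitutes values from a nested dictionary. The `resolve` helper and the `/path/to/work` placeholder are hypothetical and are not part of this repository.

```python
import re


def resolve(template, values):
    """Expand ${a.b} references by walking nested dicts (hypothetical helper)."""

    def lookup(match):
        node = values
        for key in match.group(1).split("."):
            node = node[key]
        return str(node)

    return re.sub(r"\$\{([^}]+)\}", lookup, template)


# Illustrative values only; the working directory is a placeholder.
values = {
    "env": {"working_dir": "/path/to/work"},
    "job_name": "preprocess",
    "phase": "train",
}

output_dir = resolve("${env.working_dir}/${job_name}/${phase}", values)
checkpoint_dir = resolve("${env.working_dir}/train", values)
print(output_dir)
print(checkpoint_dir)
```

With these placeholder values, each job writes to its own sub-directory of the working directory, keyed by job_name and phase, while checkpoints share a single train directory.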
To reproduce the final submission, execute the following commands:
python -m run.preprocess job_name=preprocess phase=train
python -m run.fold_split job_name=fold_split phase=train
python schedule.py train --config_names=v5_eeg_24ep_cutmix --folds=0,1,2,3,4 --seeds=0,1,2
These commands train the models included in the ensemble of the final submission. Trained model checkpoints are saved in ./data/train.
Training runs once per fold and random seed, so each config_name produces 15 model checkpoints (5 folds × 3 seeds).
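To make the checkpoint count concrete, the fold and seed lists passed to schedule.py can be expanded as a Cartesian product. This is only a sketch of the bookkeeping, not the scheduler's actual code; the `enumerate_runs` helper is hypothetical.

```python
from itertools import product


def enumerate_runs(config_names, folds, seeds):
    """Return one (config_name, fold, seed) tuple per training run."""
    return list(product(config_names, folds, seeds))


runs = enumerate_runs(["v5_eeg_24ep_cutmix"], [0, 1, 2, 3, 4], [0, 1, 2])
for config_name, fold, seed in runs:
    print(config_name, fold, seed)
print(len(runs), "runs in total")
```

Adding a second entry to the config name list would double the number of runs, so training time grows linearly with each of the three lists.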
These are the seed and fold ensemble results for each experiment.
exp_name | CV(n_votes>8.4) | Private LB | Public LB |
---|---|---|---|
v5_eeg_24ep_cutmix | 0.2477 | 0.327657 | 0.256772 |
This section explains how to use the major entry points in this repository. You do not need to understand every detail to reproduce the final submission, but it will help you train and evaluate new models using the resources in this repository.
This command produces EEGs, Channel Quality Masks (CQM), and Kaggle spectrograms in NumPy ndarray format. By default, EEGs are sub-sampled to 40 Hz.
python -m run.preprocess job_name=preprocess phase=train
This command generates 5-fold train/validation splits. The folds are generated with GroupKFold, grouped by patient_id.
python -m run.fold_split job_name=fold_split phase=train
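The point of grouping by patient_id is that all recordings from one patient fall into the same fold, so no patient appears in both the training and validation sets. The GroupKFold used here is presumably scikit-learn's `sklearn.model_selection.GroupKFold`, which is an assumption. The stdlib-only function below is a hypothetical stand-in that illustrates the idea with a greedy size-balancing rule; it is not the repository's implementation, and the patient IDs are made up.

```python
from collections import Counter


def group_k_fold(groups, n_splits=5):
    """Assign each sample a fold index so that no group spans two folds.

    Greedy balancing: take the largest groups first and place each one
    into the fold that currently holds the fewest samples.
    """
    counts = Counter(groups)
    fold_sizes = [0] * n_splits
    group_to_fold = {}
    for group, size in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        fold = fold_sizes.index(min(fold_sizes))
        group_to_fold[group] = fold
        fold_sizes[fold] += size
    return [group_to_fold[g] for g in groups]


# Made-up patient IDs, one entry per sample.
patient_ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p6", "p6"]
folds = group_k_fold(patient_ids, n_splits=5)

for k in range(5):
    train = {p for p, f in zip(patient_ids, folds) if f != k}
    valid = {p for p, f in zip(patient_ids, folds) if f == k}
    print(k, sorted(valid), "overlap:", sorted(train & valid))
```

Every fold prints an empty overlap, which is the property that keeps the validation score from being inflated by patients the model has already seen.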
This command is useful for training multiple single models at the same time.
python schedule.py train --config_names=v5_eeg_24ep_cutmix --folds=0,1,2,3,4 --seeds=0,1,2
This command is useful for running inference and mean ensembling in a single step.
It runs inference with every model listed in the specified ensemble_entity (see run/conf/ensemble_entity), averages their predictions into a mean ensemble, and writes submission.csv to the working directory.
python -m run.batch_infer job_name=ensemble ensemble_entity=f01234_s012 ensemble_entity.name=v5_eeg_24ep_cutmix
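A mean ensemble averages the predicted class probabilities of all models element by element. The snippet below is a minimal stdlib-only sketch of that operation, using two classes and made-up numbers for brevity. The `mean_ensemble` helper is hypothetical and does not reflect how run.batch_infer is implemented.

```python
def mean_ensemble(predictions):
    """Average per-class probabilities element-wise across models.

    predictions: one entry per model; each entry is a list of rows,
    and each row is a list of class probabilities.
    """
    n_models = len(predictions)
    return [
        [sum(values) / n_models for values in zip(*rows)]
        for rows in zip(*predictions)
    ]


# Made-up predictions from two models for two samples.
model_a = [[0.7, 0.3], [0.2, 0.8]]
model_b = [[0.5, 0.5], [0.4, 0.6]]

ensemble = mean_ensemble([model_a, model_b])
for row in ensemble:
    print([round(p, 3) for p in row])
```

Because every input row sums to one, the averaged rows do as well, so the result needs no renormalization.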
The architecture of the resources in this repository was inspired by the following repositories. Thanks to the authors.