This repo benchmarks ML survival models (available via mlr3proba) on the TCGA PAAD (pancreatic adenocarcinoma) dataset from the PanCancer Atlas project.
- TCGA data download and filter: tcga_paad.R
- Data preprocessing: preprocessing.R
- The scripts directory contains several benchmarks, together with some stored output results and the most important plots they produced.
The most important scripts/investigations are the following:
- Benchmark of CoxNet, Survival Trees and Survival Forests using nested cross-validation (nested CV) - script
- Tuning strategy investigation (random search vs. Bayesian optimization) using CoxNet and Survival Forests - script
- XGBoost survival learner performance on mRNA dataset - script
- CoxPH baseline performance using clinical features and several resampling strategies - script
- CoxBoost (mRNA only and mRNA + clinical) vs CoxPH (clinical) - script
- Glmboost survival learner performance on mRNA dataset - script
- Wrapper-based Ensemble Feature Selection (eFS) per data modality - see script for mRNA data
- Task powerset benchmark after eFS is applied (using either a simple CoxPH model or multiple learners)