bblodfon / paad-survival-bench Public


Benchmark survival ML models against a multimodal TCGA dataset


Repository contents: data, img, results, scripts, .gitignore, LICENSE, README.md, paad-survival-bench.Rproj


paad-survival-bench

This repo benchmarks ML survival models (available via mlr3proba) on the TCGA PAAD (pancreatic adenocarcinoma) dataset from the PanCancer Atlas project.
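To give a feel for what such a benchmark looks like in mlr3proba, here is a minimal sketch. It is not taken from the repo's scripts: it uses the small built-in `rats` task as a stand-in for the PAAD data, and the choice of learners, the 5-fold CV and the seed are assumptions made purely for illustration.

```r
library(mlr3)
library(mlr3proba)

set.seed(42)  # arbitrary seed, only so the CV splits are reproducible

# Stand-in task shipped with mlr3proba (NOT the TCGA PAAD data)
task = tsk("rats")

# Two learners bundled with mlr3proba: a Kaplan-Meier baseline and CoxPH
learners = lrns(c("surv.kaplan", "surv.coxph"))

# Assumed resampling scheme for illustration: 5-fold cross-validation
resampling = rsmp("cv", folds = 5)

# Build the (task x learner x resampling) design and run the benchmark
design = benchmark_grid(task, learners, resampling)
bm = benchmark(design)

# Aggregate performance per learner using Harrell's C-index
scores = bm$aggregate(msr("surv.cindex"))
print(scores[, c("learner_id", "surv.cindex")])
```

The same pattern would extend to a custom survival task built from the preprocessed data, with more learners added to the `learners` list.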

TCGA data download and filtering: tcga_paad.R
Data preprocessing: preprocessing.R
The scripts directory contains several benchmarks, along with some stored output results and the most important plots they produce. The main scripts/investigations are the following:
- Benchmark CoxNet, Survival Trees and Survival Forests using nested-CV - script
- Tuning strategy investigation (Random search vs Bayesian Optimization) using CoxNet and Survival Forests - script
- XGBoost survival learner performance on mRNA dataset - script
- CoxPH baseline performance using clinical features and several resampling strategies - script
- CoxBoost (mRNA only and mRNA + clinical) vs CoxPH (clinical) - script
- Glmboost survival learner performance on mRNA dataset - script
- Wrapper-based Ensemble Feature Selection (eFS) per data modality - see script for mRNA data
- Task powerset benchmark after eFS is applied (using simple CoxPH or multiple learners)
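Several of the investigations above rely on nested cross-validation, where hyperparameters are tuned in an inner resampling loop and performance is estimated in an outer one. The following is a minimal sketch of that idea with mlr3tuning, not the repo's actual setup: the `rats` task, the survival tree learner, the tuning range for `cp`, the fold counts and the evaluation budget are all assumptions chosen to keep the example small.

```r
library(mlr3)
library(mlr3proba)
library(mlr3tuning)

set.seed(42)  # arbitrary seed for reproducible splits

# Stand-in task shipped with mlr3proba (NOT the TCGA PAAD data)
task = tsk("rats")

# Survival tree with an assumed tuning range for the complexity parameter
learner = lrn("surv.rpart", cp = to_tune(1e-4, 0.1, logscale = TRUE))

# Inner loop: random search, 3-fold CV, 10 evaluations (illustrative budget)
at = auto_tuner(
  tuner = tnr("random_search"),
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measure = msr("surv.cindex"),
  term_evals = 10
)

# Outer loop: 3-fold CV around the self-tuning learner
rr = resample(task, at, rsmp("cv", folds = 3))

# Performance estimate from the outer test folds only
cindex = rr$aggregate(msr("surv.cindex"))
print(cindex)
```

Swapping `tnr("random_search")` for a Bayesian optimization tuner would follow the same structure, since only the inner tuning strategy changes while the outer resampling stays fixed.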

Topics: benchmark, tcga, survival-prediction, mlr3, curatedtcgadata, mlr3proba

Languages

R 100.0%