Benchmark survival ML models against a multimodal TCGA dataset

bblodfon/paad-survival-bench

paad-survival-bench

This repo benchmarks ML survival models (available via mlr3proba) on the TCGA PAAD (pancreatic adenocarcinoma) dataset from the PanCancer Atlas project.

  • TCGA data download and filtering: tcga_paad.R
  • Data preprocessing: preprocessing.R
  • The scripts directory contains several benchmarks, together with some stored results and the most important plots they produced. The main scripts/investigations are:
    • Benchmark CoxNet, Survival Trees and Survival Forests using nested-CV - script
    • Tuning strategy investigation (Random search vs Bayesian Optimization) using CoxNet and Survival Forests - script
    • XGBoost survival learner performance on mRNA dataset - script
    • CoxPH baseline performance using clinical features and several resampling strategies - script
    • CoxBoost (mRNA only and mRNA + clinical) vs CoxPH (clinical) - script
    • Glmboost survival learner performance on mRNA dataset - script
    • Wrapper-based Ensemble Feature Selection (eFS) per data modality - see script for mRNA data
    • Task powerset benchmark after eFS is applied (using simple CoxPH or multiple learners)
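
For readers unfamiliar with the nested-CV setup mentioned above, the following is a minimal sketch of how such a benchmark might look with mlr3proba. It is not taken from the repository's scripts. The toy task (`rats`), the choice of learners, the tuned parameter range, the fold counts and the tuning budget are all illustrative assumptions, and the `auto_tuner()` call assumes a recent mlr3tuning version.

```r
# Sketch only: toy task and settings, not the repository's actual setup.
library(mlr3)
library(mlr3proba)   # survival tasks, learners and measures
library(mlr3tuning)  # auto_tuner(), tnr(), to_tune()

set.seed(42)

# Built-in toy survival task, used here as a stand-in for the TCGA PAAD data
task = tsk("rats")

# Inner loop: tune the complexity parameter of a survival tree
# with random search (range and budget are arbitrary choices)
tree = lrn("surv.rpart", cp = to_tune(1e-3, 0.1, logscale = TRUE))
tuned_tree = auto_tuner(
  tuner = tnr("random_search"),
  learner = tree,
  resampling = rsmp("cv", folds = 3),
  measure = msr("surv.cindex"),
  term_evals = 10
)

# Outer loop: 5-fold CV comparing an untuned CoxPH baseline
# against the auto-tuned survival tree
design = benchmark_grid(
  tasks = task,
  learners = list(lrn("surv.coxph"), tuned_tree),
  resamplings = rsmp("cv", folds = 5)
)
bmr = benchmark(design)

# Mean concordance index per learner over the outer folds
bmr$aggregate(msr("surv.cindex"))
```

Because the tree is tuned inside each outer training fold, the outer-fold scores are not biased by the hyperparameter search. Swapping `tnr("random_search")` for a Bayesian optimization tuner (for example, one provided by the mlr3mbo package) would be one way to compare tuning strategies.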
