Processing & Filtering

Much of this project was run in the cloud through Terra.bio (Firecloud v2) workflows or Google VMs. If you need access to the Terra workspaces, please email twood@broadinstitute.org.

User Warning: Many of the bash scripts used to train and evaluate the models launch their jobs with nohup. If your machine cannot handle that many simultaneous jobs, consider modifying your local copy to run the models serially.

Terra Workflow Links

Main pipeline (Mutect, Mutsig2CV, Strelka, & others)

CGA HG38A WES Pipeline, split into pair sets:

NCI_A: https://app.terra.bio/#workspaces/shipp-dfci/hg38_tumorOnly_test_workspace/job_history/b0649a72-3c73-412f-b895-3d7779d43b88

NCI_B: https://app.terra.bio/#workspaces/shipp-dfci/hg38_tumorOnly_test_workspace/job_history/35b939df-447e-469f-8c62-e6829b366ab4

NCI_C: https://app.terra.bio/#workspaces/shipp-dfci/hg38_tumorOnly_test_workspace/job_history/a9d0751e-4be1-4d42-9b32-ada9637874a4

NCI_D: https://app.terra.bio/#workspaces/shipp-dfci/hg38_tumorOnly_test_workspace/job_history/8e504071-f828-4360-be1f-2aed5d3fca6d

NCI_E: https://app.terra.bio/#workspaces/shipp-dfci/hg38_tumorOnly_test_workspace/job_history/ddf231d9-5c95-48a6-afcc-673ad5b48649

Standalone BLAT filter and beyond (call-cached jobs, in order)

https://app.terra.bio/#workspaces/shipp-dfci/DLBCL_Staudt_TumorOnly_2021_v2/job_history

Classifier Preprocessing Reproducibility Steps (optional, files already included)

Remap the labels from the consensus NMF job above: src_python/remap_labels.py
Compute q-values per gene: src_python/fisher_5x2_parallel.py
Generate baseline probabilities: src_python/generate_baseline_probabilities.py
Create gene footprint table: src_python/calculate_driver_footprint.py
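For readers unfamiliar with q-values, the sketch below shows one common way to turn per-gene p-values into q-values, the Benjamini-Hochberg procedure. This is an assumption for illustration only: the correction actually used in src_python/fisher_5x2_parallel.py is not described here and may differ, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input.

    Illustrative only: this is not taken from fisher_5x2_parallel.py.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values never
    # decrease as the p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


if __name__ == "__main__":
    # Hypothetical per-gene p-values, one per gene.
    example_p_values = [0.01, 0.04, 0.03, 0.5]
    print(benjamini_hochberg(example_p_values))
```

Each q-value estimates the false discovery rate incurred if that gene and every gene with a smaller p-value were called significant.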

Before training Classifier:

Create an environment

conda create --name Classifier

conda activate Classifier

Install Pre-requisites

conda install pytorch torchvision -c pytorch

conda install pandas

conda install matplotlib

conda install scikit-learn
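After installing, it can help to confirm that the packages are visible from the active environment before launching any training. The following is a minimal sketch, not part of the repository; the helper name find_missing is hypothetical. Note that the conda package pytorch is imported as torch, and scikit-learn as sklearn.

```python
import importlib.util

# Import names for the packages installed above.
# conda "pytorch" -> import "torch"; conda "scikit-learn" -> import "sklearn".
REQUIRED_MODULES = ["torch", "torchvision", "pandas", "matplotlib", "sklearn"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it with the Classifier environment activated; any name it reports as missing needs to be installed before training.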

Model Reproducibility steps

Run the model training bash scripts (warning: these launch many jobs; do not launch them all at once)
- run_all_experiments_step1.sh
- run_all_experiments_step2A.sh
- run_all_experiments_step2B.sh
- run_all_experiments_step2C.sh
- run_all_experiments_step2T.sh
- run_sens_spec_experiments.sh
Evaluate all trained models: src_python/evaluate_validation_ensembles.py
Combine training history: src_python/combine_model_training_history.py
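If your machine cannot run every job at once, one option is to cap how many run at the same time. The sketch below is a hypothetical helper, not part of the repository: it runs a list of commands with at most max_parallel active at once, and max_parallel=1 runs them strictly one after another. The commands shown are placeholders, since the individual training commands inside the bash scripts are not listed here; you would copy them from your local versions of the scripts (without the nohup ... & wrapping).

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_with_limit(commands, max_parallel=1):
    """Run each command, keeping at most max_parallel running at once.

    Returns the exit codes in the same order as the commands.
    """
    def run_one(command):
        # subprocess.run blocks until the command exits, so each worker
        # thread holds exactly one running job.
        return subprocess.run(command).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Placeholder commands (assumption): replace with the training
    # commands taken from your local copy of the bash scripts.
    commands = [
        [sys.executable, "-c", "print('job 1')"],
        [sys.executable, "-c", "print('job 2')"],
    ]
    exit_codes = run_with_limit(commands, max_parallel=1)
    print("Exit codes:", exit_codes)
```

A nonzero exit code marks a failed job, so the returned list can be checked before moving on to evaluation.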

Plotting Results

Most plots are generated in R:

Step 1: src_R/plot_step1.R
Step 2A: src_R/plot_step2A.R
Step 2B: src_R/plot_step2B.R
Step 2C: src_R/plot_step2C.R
Step 2T: src_R/plot_step2T.R
Sens/Spec experiments: src_R/plot_sensitivity_specificity_experiment.R
Model training history: src_R/plot_training_history.R

