cancer-drug-synergy-prediction

GitHub repository for the cancer drug synergy prediction work by Alexandra M. Wong and Lorin Crawford.

Introduction

Drug resistance, often caused by intratumor heterogeneity, poses a significant challenge to cancer treatment. Combination therapies have been shown to be an effective strategy for preventing resistant cancer cells from escaping single-drug treatments. However, discovering new drug combinations through traditional molecular assays can be costly and time-consuming. In silico approaches offer the opportunity to overcome this limitation by enabling the exploration of many candidate combinations at scale. This study systematically evaluates the effectiveness of various machine learning algorithms and drug synergy prediction tasks. Our findings challenge the assumption that multi-modal data and complex model architectures automatically yield the best predictive performance.

Installing Requirements

Make sure you have a compatible Python version; we recommend Python 3.11. Then install the packages listed in requirements.txt by running: sh create_venv.sh

You may also need to pin numpy to a version below 2: pip install "numpy<2"
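
To confirm that the active environment matches the recommendations above, a quick check along these lines may help. This is a minimal sketch and not part of the repository; the helper names `python_is_recommended` and `numpy_is_below_2` are hypothetical.

```python
import sys
from importlib import metadata


def python_is_recommended(version_info=sys.version_info) -> bool:
    """Return True when the interpreter is Python 3.11, the recommended version."""
    return tuple(version_info[:2]) == (3, 11)


def numpy_is_below_2(version: str) -> bool:
    """Return True when a numpy version string such as '1.26.4' has a major version below 2."""
    major = int(version.split(".")[0])
    return major < 2


if __name__ == "__main__":
    print("Python 3.11:", python_is_recommended())
    try:
        print("numpy < 2:", numpy_is_below_2(metadata.version("numpy")))
    except metadata.PackageNotFoundError:
        print("numpy is not installed in this environment")
```

Run it with the virtual environment activated so that it inspects the interpreter and packages the notebooks will actually use.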

Data Downloads

Download the NCI-ALMANAC dataset with the drug combination data

Download the CellMiner data

Although it is not required, you can also restrict the genes in the dataset to those present in the STRING protein-protein interaction network for additional biological analyses. To do so, download the STRING database and save the following files:

- Save the 9606.protein.links.detailed.v11.5.txt file
- Save the 9606.protein.info.v11.5.txt file
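
To illustrate what checking genes against STRING can look like, here is a minimal sketch that uses only the Python standard library. It is not the repository's implementation (that lives in preprocessing_files/). It assumes the protein info file is tab-separated with a `preferred_name` column holding gene symbols, so verify this against your download; the function names are hypothetical.

```python
import csv
from typing import Iterable, List, Set, TextIO


def load_string_gene_names(info_file: TextIO) -> Set[str]:
    """Collect gene symbols from a STRING protein info file.

    Assumption: tab-separated text with a header row that includes a
    'preferred_name' column. Check this against the file you downloaded.
    """
    reader = csv.DictReader(info_file, delimiter="\t")
    return {row["preferred_name"] for row in reader if row.get("preferred_name")}


def keep_string_genes(genes: Iterable[str], string_genes: Set[str]) -> List[str]:
    """Return the genes that appear in STRING, preserving input order."""
    return [gene for gene in genes if gene in string_genes]


# Example usage (path assumed; adjust to where you saved the file):
# with open("9606.protein.info.v11.5.txt", newline="") as handle:
#     string_genes = load_string_gene_names(handle)
# filtered = keep_string_genes(["TP53", "EGFR"], string_genes)
```

Keeping the parsing separate from the filtering makes it easy to swap in a different identifier column if your dataset uses STRING protein IDs rather than gene symbols.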

Pre-processing the data - `preprocessing_files/`

1. If filtering by STRING:
   - Run the main function in string_preprocessing.py. The first time you run it, use the --from_original flag
   - Run preprocessing.ipynb
2. Run filter_data.ipynb
   - If you are not using STRING to filter, modify the notebook to exclude the STRING filtration
3. Run nci_almanac_therapy_classification.ipynb
   - Make sure you have the manually retrieved mapping of drugs to therapy classes
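
The order of the steps above can be sketched as a list of commands. This is an illustration under several assumptions and not the authors' workflow: it assumes the script and notebooks sit directly in preprocessing_files/, that string_preprocessing.py can be invoked as a script, and that you want to execute notebooks headlessly with `jupyter nbconvert` rather than opening them interactively. The function name `preprocessing_commands` is hypothetical.

```python
import sys
from typing import List

# Assumed location, taken from the section heading.
PREPROCESSING_DIR = "preprocessing_files"


def preprocessing_commands(filter_by_string: bool, first_run: bool = True) -> List[List[str]]:
    """Return the preprocessing steps as an ordered list of command lines."""
    commands: List[List[str]] = []
    notebooks: List[str] = []
    if filter_by_string:
        script = [sys.executable, f"{PREPROCESSING_DIR}/string_preprocessing.py"]
        if first_run:
            # The README asks for this flag on the first run only.
            script.append("--from_original")
        commands.append(script)
        notebooks.append("preprocessing.ipynb")
    notebooks += ["filter_data.ipynb", "nci_almanac_therapy_classification.ipynb"]
    for name in notebooks:
        # Headless execution via nbconvert is an assumption of this sketch.
        commands.append(
            ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
             f"{PREPROCESSING_DIR}/{name}"]
        )
    return commands


if __name__ == "__main__":
    for command in preprocessing_commands(filter_by_string=True):
        # Pass each list to subprocess.run(command, check=True) to execute it.
        print(" ".join(command))
```

Headless execution does not replace the manual steps: filter_data.ipynb still has to be edited by hand when STRING filtering is skipped, and the drug-to-therapy-class mapping must already be in place.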

Generating the dataset CSV files - `dataset_creation/`

1. Create the Morgan fingerprint (MFP)-only CSV files by running create_mfp_csv.ipynb
2. Create the -omics identifiers and PCNNGL mask CSV files by running create_omics_csv_identifiers_masks.ipynb
3. Create the tissue type and drug class type index files and the MFP+Omics CSV dataset files by running create_omics_csv.ipynb

Run the Models - `models/`

Model parameter implementations are in the models/src directory
Training and evaluation code is in the models/run directory

Example / Tutorial

An example dataset is stored in the example_data directory, along with an example mask file for the PCNNGL model. First, unzip the compressed example_data/all_cancer_256_mfp_bc0_comboscore.zip file. Then run the example_models_run.ipynb Jupyter notebook, which creates output files in the example_output directory. The notebook trains each model used in the study under several example parameter settings and reports test performance for the best of them. Note that the example parameters were chosen so that the code runs locally on a standard personal computer to demonstrate functionality. The full dataset and training analyses require GPU-enabled computing clusters with more memory.
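
If you prefer to unpack the example archive from Python rather than with a desktop tool, a minimal sketch using the standard library `zipfile` module follows. The helper name `extract_example_data` is hypothetical, and extracting into example_data is an assumption about where the notebook expects the file.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_example_data(zip_path: str, dest_dir: str) -> List[str]:
    """Extract every member of the archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


if __name__ == "__main__":
    archive_path = Path("example_data/all_cancer_256_mfp_bc0_comboscore.zip")
    if archive_path.exists():
        # Destination directory is an assumption; adjust if the notebook expects another path.
        print(extract_example_data(str(archive_path), "example_data"))
    else:
        print(f"{archive_path} not found; run this from the repository root")
```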

Relevant Citations

A.M. Wong and L. Crawford. Rethinking cancer drug synergy prediction: a call for standardization in machine learning applications. bioRxiv. https://doi.org/10.1101/2024.12.24.630216

Questions and Feedback

For questions or concerns about this work, please contact Alexandra M. Wong or Lorin Crawford. Feedback and questions on the software, paper, and tutorial are appreciated!
