3/1/2021
Code used for the analysis of the paper:
Network Medicine Framework for Identifying Drug Repurposing Opportunities for COVID-19.
Deisy Morselli Gysi, Ítalo Do Valle, Marinka Zitnik, Asher Ameli, Xiao Gan, Onur Varol, Susan Dina Ghiassian, JJ Patten, Robert Davey, Joseph Loscalzo, and Albert-László Barabási.
- Code:
00_Intro.R
- Description: Identify the Largest Connected Component (LCC, Disease Module), calculates its significance, plot the network and the distribution of the random (Figure 2).
- Protein-Protein Interaction (PPI):
DatasetS2.csv
- SARS-CoV-2 Targets:
DatasetS1.csv
- Code:
01_Disease_Separation.ipynb
- Description: Calculation of the network separation
$S_{AB}$ between diseases and the COVID-19 module (data presented in Figures S1 and S2).
- Protein-Protein Interaction (PPI):
DatasetS2.csv
- SARS-CoV-2 Targets:
DatasetS1.csv
- Disease-Associated Genes:
Guney2016_GenesDisease.tsv
- Code:
02_DR_AI.py
- Description: Calculates the Drug Repurpusing Candidates for pipelines A1-A4.
- Codes:
03_1_DR_Diff_matrix_generator.py
and03_2_DR_Diff_run_pipelines.py
- Description: Calculates the Drug Repurpusing Candidates for pipelines D1-D5.
- Protein-protein interaction network: The graph needs to be a single connected component. Nodes are entrez IDs and in the format of an integer. File:
DatasetS2.csv
- Seeds: Disease-associated genes in the format of gene symbols. Each gene must be in one row similar to the sample file located at
DatasetS1.csv
- drug2targets: A dictionary of drugs (keys) and the targets (values). The targets are shown with Entrez IDs and are in the format of an integer. A sample file is located at
DatasetS3.csv
- app_drugs are similar to above. The app_drugs are the known approved drugs for the disease of study. They will be used as true positives to calculate the performance of the models. If there are no app_drugs, an empty dictionary {} should be fed into the code.
- Before running the code, the user must generate the pre-calculated matrices, DSD, JS, and KL. To do so,
03_1_DR_Diff_matrix_generator.py
must be run.
03_1_DR_Diff_matrix_generator.py
takes as inputs the network and will generate the similarity matrices and the corresponding node to index conversion dictionary and dumps the results underoutput/diffusion
. They will be used later during the ranking process.
- Once the pre-calculated matrices are generated,
03_2_DR_Diff_run_pipelines.py
should be run. Results will be printed underoutput/diffusion/out
.
It is a .csv
file with the ranking values of each model and the drug score. There is also a ranking value (integer) that corresponds to each ranking score. The file contains the following columns.
- drug ID: the keys of the drug2targets dictionary
- APP-Drugs: a binary value indicating whether or not the drug is approved. If app_drugs given by the user is empty, all binary values will be marked 'no'.
- Degree: the number of targets
- DSD-min-Rank: an integer showing drug rank by DSD-min; the lower, the better
- KL-med-Rank: an integer showing drug rank by KL-med; the lower, the better\
- KL-min-Rank: an integer showing drug rank by KL-min; the lower, the better\
- JS-med-Rank: an integer showing drug rank by JS-med; the lower, the better\
- JS-min-Rank: an integer showing drug rank by JS-min; the lower, the better\
- DSD-min: the score given by DSD-min to each drug; the lower, the better\
- KL-med: the score given by KL-med to each drug; the lower, the better\
- KL-min: the score given by KL-min to each drug; the lower, the better\
- JS-med: the score given by JS-med to each drug; the lower, the better\
- JS-min: the score given by JS-min to each drug; the lower, the better
-
Code:
04_DR_Proximity.py
-
Description: Calculates the Drug Repurpusing Candidates for pipelines P1-P3.
- disease gene list: a list of disease genes in entrez ID
- drug target file: a tab-delimited file with first entry being the drug, and the following entries being entrez IDs of targets
- interactome file: tab-delimited node pairs in their entrez ID. The 3rd column in the sample interactome is not being used
- -nsims specifies the number of randomizations in the proximity algorithm
> python 04_DR_Proximity.py -i_disease data/DatasetS1.csv -i_drug data/drug_targets_test.txt -i_interactome data/DatasetS2.csv -nsims 1000
- Sample Output display: (Note the beginning (first four lines) of this output are imported from previous code and are not relevant, please pay no attention to them).
>>>
GenRev not found, steiner wont work
Import error: Negex. Using keyword matching instead
Import error: Funcassociate. Make sure that funcassociate is in toolbox!
DIAMOnD not found and thus will not be available!
Done parsing 332 disease genes
> Done parsing drug targets: read 1 total drugs
> done loading network:
> network contains 18505 nodes and 327924 links
number of samples: 1
drug=DBxxxxx, disease=DatasetS1.csv
Finished in 0.019384 hours
Current time:
18:51:56
The output of the code are .txt files in the output/proximity
folder. If the `output/proximity
folder is not present, it will automatically be created. Each .txt
file is named with drug-disease input pair, e.g. <Drug_ID>_<Disease_genes_input_filename>.txt
in the toy example. The output file consists of six tab-delimited entries. The entries are (respectively):
(1) drug, (2) disease file, (3) proximity distance, d, (4) proximity Z-score, z, (5) mean of proximity distance d in random simulation, (6) standard deviation of proximity distance d in random simulation
-
In the example
drug_targets_test.txt
file, we provide a made-up toy example drug for testing and demonstrating the code. The drug target data for each pipeline can be found indrug_targets_P1.txt
drug_targets_P2.txt
, andDataset_S4.txt
. -
computations with many drug-disease pairs are heavy. Parallel computing is recommended. This code has implementation of parallel computing, but it is not used in the demo above. Please check out the source code in
04_DR_Proximity.py
. -
The randomization of proximity algorithm is, by its mathematical nature, stochastic, i.e. re-running it does NOT generate exactly the same result for its Z-Score. Details can be found in ref 1. Guney(2016) paper.
The code here uses the code published in:
-
Guney, E., Menche, J., Vidal, M. et al. Network-based in silico drug efficacy screening. Nat Commun 7, 10331 (2016). https://doi.org/10.1038/ncomms10331
-
Menche J, Sharma A, Kitsak M, et al. Disease networks. Uncovering disease-disease relationships through the incomplete interactome. Science. 2015;347(6224):1257601. doi:10.1126/science.1257601
After the proximity code is run, you have to combine all txt
files, they can be easily combined by cat *.txt >> output.txt
.