Somewhat120/LLM_DDA

LLM-DDA

Code for "Empowering graph neural network-based computational drug repositioning with large language model-inferred knowledge representation"

Workflow

Datasets

Our study covers four drug-disease association benchmark datasets: B-dataset, C-dataset, F-dataset, and R-dataset. The datasets are summarized as follows:

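| Dataset   | Drugs | Diseases | Drug-disease Associations | Pos-Neg Ratio |
|-----------|-------|----------|---------------------------|---------------|
| B-dataset | 269   | 598      | 18,416                    | 11.45%        |
| C-dataset | 663   | 409      | 2,532                     | 1.57%         |
| F-dataset | 593   | 313      | 1,933                     | 1.05%         |
| R-dataset | 894   | 454      | 2,704                     | 0.67%         |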

LLM templates

The zero-shot template designed for GPT-4 to generate drug and disease knowledge descriptions is in desc_generate.py. The generated drug descriptions (drug_desc.csv) and disease descriptions (disease_desc.csv) for B-dataset, C-dataset, F-dataset, and R-dataset are in the corresponding "feat/DATASET" folder.
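The description files can be inspected with a few lines of Python. The following is a minimal sketch and not part of the repository: the helper name `load_descriptions` is hypothetical, the folder name under "feat" is assumed to match the dataset name, and UTF-8 encoding is assumed. No column names are assumed, so the script only reports whatever header the CSV contains.

```python
import csv
from pathlib import Path


def load_descriptions(csv_path):
    """Read a description CSV and return (header, rows) without assuming column names."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)  # each row is a dict keyed by the header fields
        header = reader.fieldnames or []
    return header, rows


if __name__ == "__main__":
    # Assumption: the dataset folder is named exactly like the dataset.
    path = Path("feat") / "B-dataset" / "drug_desc.csv"
    if path.exists():
        header, rows = load_descriptions(path)
        print(f"{path}: {len(rows)} rows, columns = {header}")
    else:
        print(f"{path} not found; adjust the path to your checkout")
```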

LLM-inferred knowledge representations

To generate LLM-inferred knowledge representations, refer to emb_generate.py. The generated embedding files for B-dataset, C-dataset, F-dataset, and R-dataset are stored in the "feat" folder. Specifically, LLM_drug_emb.pkl and LLM_disease_emb.pkl are the embeddings generated from GPT-4, and BERT_drug_emb.pkl and BERT_disease_emb.pkl are the embeddings generated from BioBERT.
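To check what a stored embedding file contains before training, something like the sketch below may help. It is an illustration under assumptions: the helper names `load_embeddings` and `summarize` are hypothetical, the internal layout of the .pkl files is not documented here (so the summary is structure-agnostic), and the example path assumes the dataset folder is named after the dataset. Unpickling may also require the libraries that created the objects to be installed, and pickle files should only be loaded from trusted sources.

```python
import pickle
from pathlib import Path


def load_embeddings(pkl_path):
    """Unpickle an embedding file and return whatever object it stores."""
    with open(pkl_path, "rb") as handle:
        return pickle.load(handle)


def summarize(obj):
    """Return a short, structure-agnostic description of a loaded object."""
    shape = getattr(obj, "shape", None)  # arrays and tensors expose .shape
    if shape is not None:
        return f"{type(obj).__name__} with shape {tuple(shape)}"
    if hasattr(obj, "__len__"):
        return f"{type(obj).__name__} with {len(obj)} entries"
    return type(obj).__name__


if __name__ == "__main__":
    # Assumption: embeddings live under feat/<dataset name>/.
    path = Path("feat") / "B-dataset" / "LLM_drug_emb.pkl"
    if path.exists():
        print(path, "->", summarize(load_embeddings(path)))
    else:
        print(f"{path} not found; adjust the path to your checkout")
```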

To reproduce our method

Environment Requirement

  • torch: 1.13.0+cu117
  • scikit-learn: 1.2.2
  • rdkit: 2023.3.3
  • dgl: 1.1.2+cu117
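A quick way to compare an existing environment against the versions listed above is sketched below. It uses only the standard library. The distribution names passed to `importlib.metadata` are assumptions (for example, a conda-installed rdkit may register its metadata differently), and the helper name `find_mismatches` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pins copied from the list above; distribution names are assumptions.
EXPECTED = {
    "torch": "1.13.0+cu117",
    "scikit-learn": "1.2.2",
    "rdkit": "2023.3.3",
    "dgl": "1.1.2+cu117",
}


def find_mismatches(expected):
    """Return {package: (expected, installed)} for packages that differ or are missing."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package metadata not found in this environment
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(EXPECTED)
    for name, (wanted, installed) in problems.items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
    if not problems:
        print("All listed packages match the expected versions.")
```

A mismatch does not necessarily mean the code will fail; it only signals a difference from the listed environment.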

Cross-Validation

python main.py -sp {SAVE_PATH} -da {DATASET} -fo {NUM_FOLD} -se {SEED} -ft {LLM_EMB} -ct {MODEL_TYPE} -id {DEVICE} -ep {EPOCH} -dp {DROPOUT} -hf {HIDDEN_FEAT}

Suggested setting:

python main.py -sp {SAVE_PATH} -da {DATASET} -fo 5 -se 0 -ft LLM -ct graph_ae -id 0 -ep 5000 -dp 0.4 -hf 128
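To apply the suggested setting to all four datasets, the command can be assembled programmatically. The sketch below only prints the commands (a dry run). It assumes that the `-da` option accepts the dataset names as written in this README and that `python` on the PATH is the intended interpreter; the helper name `build_command` and the output directory `cv_runs` are hypothetical.

```python
import shlex

# Assumption: these strings are valid values for the -da option.
DATASETS = ["B-dataset", "C-dataset", "F-dataset", "R-dataset"]


def build_command(dataset, save_path, fold=5, seed=0, feat="LLM",
                  model="graph_ae", device=0, epochs=5000,
                  dropout=0.4, hidden=128):
    """Build the main.py argument list; defaults mirror the suggested setting."""
    return [
        "python", "main.py",
        "-sp", str(save_path),
        "-da", dataset,
        "-fo", str(fold),
        "-se", str(seed),
        "-ft", feat,
        "-ct", model,
        "-id", str(device),
        "-ep", str(epochs),
        "-dp", str(dropout),
        "-hf", str(hidden),
    ]


if __name__ == "__main__":
    for name in DATASETS:
        # Hypothetical output location, one sub-folder per dataset.
        cmd = build_command(name, f"cv_runs/{name}")
        # Dry run: to execute, pass cmd to subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems if a save path contains spaces.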

We have also stored the standard prediction results of LLM-DDA_GNN-AE and the DirectPred baseline in the "result" folder.

Citation

@article{RN93,
   author = {Gu, Yaowen and Xu, Zidu and Yang, Carl},
   title = {Empowering Graph Neural Network-Based Computational Drug Repositioning with Large Language Model-Inferred Knowledge Representation},
   journal = {Interdisciplinary Sciences: Computational Life Sciences},
   ISSN = {1867-1462},
   DOI = {10.1007/s12539-024-00654-7},
   url = {https://doi.org/10.1007/s12539-024-00654-7},
   year = {2024},
   type = {Journal Article}
}
