From Proteins to Ligands: Decoding Deep Learning Methods for Binding Affinity Prediction

This repository contains all the code, data, instructions, and model weights needed to run or retrain the models.

Project Abstract

Accurate in silico predictions of protein-ligand binding affinity can significantly accelerate the early stages of drug discovery. Deep learning-based methods have recently shown promise, but their robustness for the virtual screening of large compound libraries on various targets needs improvement. Understanding what these models learn from input protein and ligand data is essential to addressing this problem. We systematically investigated a sequence-based deep learning framework to assess the impact of protein and ligand encodings on commonly used kinase datasets. The role of proteins is studied using convolutional neural network-based encodings obtained from sequences and graph neural network-based encodings enriched with structural information from contact maps. By introducing perturbations to the ligand graph representation of the SMILES string, we assess the role played by ligand encodings given by the graph neural network. Our investigations show that protein encodings with structural information do not significantly impact the binding predictions, and the deep learning model relies heavily on ligand encodings for accurately predicting the binding affinity. Furthermore, we explored various methods of combining protein and ligand encodings, none of which significantly changed performance.
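To illustrate what "structural information from contact maps" can mean in practice, here is a minimal sketch that turns residue coordinates into a binary contact map and then into the 2 × E edge list that PyTorch Geometric expects for `edge_index`. The use of C-alpha coordinates, the 8 Å cutoff, and the helper names `contact_map` and `to_edge_index` are illustrative assumptions, not taken from this repository, whose contact maps may be computed or predicted differently.

```python
import numpy as np


def contact_map(ca_coords: np.ndarray, threshold: float = 8.0) -> np.ndarray:
    """Binary residue contact map (assumption: C-alpha coordinates, 8 angstrom cutoff).

    Entry (i, j) is 1 when residues i and j are within `threshold` of each other.
    """
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]  # pairwise difference vectors
    dist = np.linalg.norm(diff, axis=-1)                  # N x N distance matrix
    cmap = (dist <= threshold).astype(np.int8)
    np.fill_diagonal(cmap, 0)                             # drop self-contacts
    return cmap


def to_edge_index(cmap: np.ndarray) -> np.ndarray:
    """Convert a contact map into a 2 x E array of (source, target) residue indices.

    This is the layout PyTorch Geometric uses for `edge_index`; each contact
    appears in both directions because the contact map is symmetric.
    """
    src, dst = np.nonzero(cmap)
    return np.stack([src, dst])


if __name__ == "__main__":
    # Toy "protein": three residues spaced 3.8 angstrom apart and one far away.
    coords = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [20.0, 0.0, 0.0]])
    print(to_edge_index(contact_map(coords)))
```

The isolated fourth residue gets no edges, which is how a distance-based contact graph differs from the purely sequential view used by the CNN encodings.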

Setup environment

We set up the environment using Anaconda. First, clone the repository:

git clone https://github.com/meyresearch/Protein_ContactMaps_DL_BindingAffinity.git

The following is an example of how to set up a working conda environment to run the code. Make sure to install the pytorch, pytorch-geometric, and CUDA versions that match your system (or the CPU-only versions):

conda create --name mldd --file mldd.txt
conda activate mldd
pip install tensorflow-gpu
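After creating the environment, a quick way to confirm that the key packages resolved correctly is the sketch below. The module names (`torch`, `torch_geometric`, `tensorflow`) are assumed from the libraries mentioned above, and the helper names are hypothetical; adjust the list to match mldd.txt.

```python
import importlib.util

# Assumed module names for the libraries mentioned in this README;
# edit this list to match what mldd.txt actually installs.
REQUIRED = ["torch", "torch_geometric", "tensorflow"]


def check_modules(names):
    """Return {module_name: bool} telling whether each module can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def cuda_status():
    """Return whether PyTorch sees a CUDA device, or None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the check still runs when torch is missing

    return torch.cuda.is_available()


if __name__ == "__main__":
    for name, ok in check_modules(REQUIRED).items():
        print(f"{name:16s} {'found' if ok else 'MISSING'}")
    print("CUDA available:", cuda_status())
```

Run it inside the activated `mldd` environment; a `False` for CUDA on a GPU machine usually means a CPU-only PyTorch build was installed.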

Dataset

The data folder contains the datasets used in the study and the trained models.

If you want to train or test the models with the data used in the study:

1. Download it from here.
2. Unzip the archive and place its contents in data, so that the path data/ exists in the main (repository root) folder.
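If you prefer to script the unzip step, the sketch below uses Python's standard `zipfile` module. The archive file name is not given in this README, so `extract_data` (a hypothetical helper) takes the path as an argument; it also assumes the archive either already contains a top-level data/ folder or holds the data files directly.

```python
import zipfile
from pathlib import Path


def extract_data(archive: str, repo_root: str = ".") -> Path:
    """Unzip `archive` so its contents end up under <repo_root>/data and return that path.

    Hypothetical helper: if every entry in the archive already starts with "data/",
    it is extracted at the repository root (avoiding data/data/); otherwise the
    entries are extracted into data/.
    """
    root = Path(repo_root)
    data_dir = root / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        has_prefix = bool(names) and all(n.startswith("data/") for n in names)
        zf.extractall(root if has_prefix else data_dir)
    return data_dir
```

For example, from the repository root: `extract_data("path/to/downloaded.zip")`, where the path is a placeholder for wherever you saved the download.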

Steps for training

1. Unzip the data folder as described under Dataset.
2. Run the training notebooks (Training_notebook_1D.ipynb and Training_notebook_2D.ipynb) to train the DL method with the various protein and ligand encodings.
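To run the training notebooks non-interactively (for example on a cluster node), one option, assuming Jupyter's nbconvert is installed in the `mldd` environment, is to execute them with `jupyter nbconvert --execute`. The sketch below only builds and runs that command; the notebook names come from the repository file listing, and whether they must be run in a particular order or after Data_processing.ipynb is not stated in this README. The helper names are hypothetical.

```python
import subprocess

# Notebook names taken from the repository listing.
NOTEBOOKS = ["Training_notebook_1D.ipynb", "Training_notebook_2D.ipynb"]


def nbconvert_command(notebook, timeout=-1):
    """Build a `jupyter nbconvert` call that executes `notebook` and writes outputs back in place.

    A timeout of -1 disables nbconvert's per-cell timeout, which long training cells may need.
    """
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        f"--ExecutePreprocessor.timeout={timeout}",
        notebook,
    ]


def run_notebooks(notebooks=NOTEBOOKS):
    """Execute each notebook in turn, stopping at the first failure."""
    for nb in notebooks:
        print("Executing", nb)
        subprocess.run(nbconvert_command(nb), check=True)
```

Calling `run_notebooks()` from the repository root executes both notebooks; `--inplace` overwrites the saved outputs, so drop it and add `--output` if you want to keep the original files untouched.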

Citation

If you use this repository or the models in your work, please cite the following paper:

@article{gorantla2023proteins,
  title={From proteins to ligands: decoding deep learning methods for binding affinity prediction},
  author={Gorantla, Rohan and Kubincova, Alzbeta and Wei{\ss}e, Andrea Y and Mey, Antonia SJS},
  journal={Journal of Chemical Information and Modeling},
  volume={64},
  number={7},
  pages={2496--2507},
  year={2023},
  publisher={ACS Publications}
}

Repository contents

Figures
misc
models_sample
.gitignore
Data_processing.ipynb
LICENSE
README.md
Testing_notebook_1D.ipynb
Testing_notebook_2D.ipynb
Training_notebook_1D.ipynb
Training_notebook_2D.ipynb
data.py
data_1d.py
dnn.py
emetrics.py
mldd.txt
scripts.py
utils.py
