This repository contains all the code, data, instructions, and model weights needed to run or retrain the models.
Accurate in silico predictions of protein-ligand binding affinity can significantly accelerate the early stages of drug discovery. Deep learning-based methods have shown promise recently, but their robustness for the virtual screening of large compound libraries on various targets needs improvement. Understanding what these models learn from input protein and ligand data is essential to addressing this problem. We systematically investigated a sequence-based deep learning framework to assess the impact of protein and ligand encodings on commonly used kinase datasets. The role of proteins is studied using convolutional neural network-based encodings obtained from sequences and graph neural network-based encodings enriched with structural information from contact maps. By introducing perturbations to the ligand graph representation of the SMILES string, we assess the role played by ligand encodings given by the graph neural network. Our investigations show that protein encodings with structural information do not significantly impact the binding predictions, and the deep learning model relies heavily on ligand encodings for accurately predicting the binding affinity. Furthermore, we explored various methods of combining protein and ligand encodings, none of which significantly changed performance.
We set up the environment using Anaconda. First, clone this repository:
git clone https://github.com/meyresearch/Protein_ContactMaps_DL_BindingAffinity.git
The following is an example of how to set up a working conda environment to run the code. Make sure to use the correct PyTorch, PyTorch Geometric, and CUDA versions, or the CPU-only versions:
conda create --name mldd --file mldd.txt
conda activate mldd
pip install tensorflow-gpu
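After creating the environment, it can help to confirm that the main dependencies are importable before opening the notebook. The following is a minimal sketch, not part of the repository: the helper name check_packages is hypothetical, and the import names torch, torch_geometric, and tensorflow are assumed to correspond to the PyTorch, PyTorch Geometric, and tensorflow-gpu packages mentioned above.

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each top-level package name to True if it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed import names for the packages mentioned in the setup instructions.
    status = check_packages(["torch", "torch_geometric", "tensorflow"])
    for name, found in status.items():
        print(f"{name}: {'found' if found else 'MISSING'}")

    # Only query CUDA if PyTorch is actually installed.
    if status["torch"]:
        import torch

        print("torch version:", torch.__version__)
        print("CUDA available:", torch.cuda.is_available())
```

If CUDA is reported as unavailable on a machine with a GPU, the installed PyTorch build probably does not match the local CUDA version.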
The files in data contain the datasets used in the study and the trained models.
If you want to train or test the models with the data used in the study:
- Download it from here.
- Unzip the directory and place it into data, such that you have the path data/ in the main folder.

To train the models:
- Unzip the data folder.
- Run the Training_notebook.ipynb file to train the DL method using various protein and ligand encodings.
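Before running the notebook, a quick check that the data/ directory sits in the main folder can save a failed run. This is a small sketch under stated assumptions: the helper list_data_dir is hypothetical, and it only lists whatever is present in data/ without assuming any particular file names.

```python
from pathlib import Path


def list_data_dir(repo_root="."):
    """Return the sorted entry names in <repo_root>/data, or None if data/ is missing."""
    data_dir = Path(repo_root) / "data"
    if not data_dir.is_dir():
        return None
    return sorted(entry.name for entry in data_dir.iterdir())


if __name__ == "__main__":
    # Run from the main folder of the cloned repository.
    entries = list_data_dir(".")
    if entries is None:
        print("No data/ directory found; download and unzip the data first.")
    else:
        print(f"data/ contains {len(entries)} entries:", entries)
```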
If you use this repository or the models in your work, please cite the following paper:
@article{gorantla2023proteins,
title={From proteins to ligands: decoding deep learning methods for binding affinity prediction},
author={Gorantla, Rohan and Kubincova, Alzbeta and Wei{\ss}e, Andrea Y and Mey, Antonia SJS},
journal={Journal of Chemical Information and Modeling},
volume={64},
number={7},
pages={2496--2507},
year={2023},
publisher={ACS Publications}
}