This GitHub repository contains code for a computational drug discovery project that uses neural networks. The project explores bioactivity data for a target protein related to Herpes virus and uses it to classify compounds as active, inactive, or intermediate based on their bioactivity values. Molecular descriptors are also calculated to support the analysis. The workflow consists of the following steps:
- Data Collection
- Handling Missing Data
- Data Preprocessing
- Calculate Lipinski's Descriptors
- Convert IC50 to pIC50
- Exploratory Data Analysis
- Statistical Analysis
- Descriptor Calculation and Dataset Preparation
The data is collected from the ChEMBL database using the ChEMBL web service package. The target protein for Herpes virus is searched, and the bioactivity data is retrieved.
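A minimal sketch of this retrieval step is shown below. It assumes the `chembl_webresource_client` package is installed and that network access is available. The function name `fetch_ic50_records`, the search term, the choice of the first search hit, and the restriction to IC50 records are illustrative assumptions, not necessarily what the repository does.

```python
def fetch_ic50_records(query="herpes", target_index=0):
    """Search ChEMBL for a target and return (target_id, list of IC50 activity records).

    Hypothetical helper: the query string and target_index are assumptions.
    """
    # Imported lazily so the module loads even where the client is not installed.
    from chembl_webresource_client.new_client import new_client

    targets = list(new_client.target.search(query))
    if not targets:
        raise LookupError(f"No ChEMBL target found for query {query!r}")

    # Assumption: the target of interest is picked by position in the search results.
    target_id = targets[target_index]["target_chembl_id"]

    # Keep only activities measured as IC50 for the selected target.
    activities = new_client.activity.filter(target_chembl_id=target_id).filter(
        standard_type="IC50"
    )
    return target_id, list(activities)
```

In practice the search results would be inspected by hand first, since a keyword search can return several organisms and proteins.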
Any compounds with missing values for the standard_value column are dropped from the dataset.
The bioactivity data is preprocessed, and compounds are labeled as active, inactive, or intermediate based on their IC50 values.
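The missing-value filter and the labeling step can be sketched in plain Python as follows. The cutoffs (1,000 nM and 10,000 nM) and the helper names are assumptions chosen for illustration; this README does not state the thresholds actually used, and the same logic applies equally to a pandas DataFrame.

```python
# Hypothetical cutoffs in nM; the thresholds used in the repository are not stated here.
ACTIVE_MAX_NM = 1_000.0
INACTIVE_MIN_NM = 10_000.0


def drop_missing(records):
    """Keep only records that have a usable standard_value."""
    return [r for r in records if r.get("standard_value") not in (None, "")]


def bioactivity_class(ic50_nm):
    """Map an IC50 value (assumed to be in nM) to a class label."""
    if ic50_nm <= ACTIVE_MAX_NM:
        return "active"
    if ic50_nm >= INACTIVE_MIN_NM:
        return "inactive"
    return "intermediate"


def label_records(records):
    """Drop records without a standard_value, then attach a bioactivity_class."""
    labeled = []
    for r in drop_missing(records):
        value = float(r["standard_value"])  # values may arrive as strings
        labeled.append(
            {**r, "standard_value": value, "bioactivity_class": bioactivity_class(value)}
        )
    return labeled


# Made-up example records, not real ChEMBL data.
example = [
    {"compound": "CPD-1", "standard_value": "500"},
    {"compound": "CPD-2", "standard_value": None},
    {"compound": "CPD-3", "standard_value": "20000"},
    {"compound": "CPD-4", "standard_value": "5000"},
]
labeled_example = label_records(example)
```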
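Lipinski's descriptors (molecular weight, LogP, and the numbers of hydrogen bond donors and acceptors) are calculated for the compounds. These molecular properties are widely used in drug discovery and medicinal chemistry to assess drug-likeness.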
IC50 values are converted to pIC50, the negative base-10 logarithm of the IC50 expressed in molar units, so that the values are more evenly distributed and easier to analyze.
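The conversion can be sketched as below, assuming the IC50 values are reported in nanomolar (nM). The unit is an assumption, since it is not stated here, and the function name `pic50_from_nm` is hypothetical.

```python
import math


def pic50_from_nm(ic50_nm):
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 M, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm)
    return 9.0 - math.log10(ic50_nm)
```

Under this convention an IC50 of 1000 nM (1 µM) corresponds to a pIC50 of 6, and more potent compounds have higher pIC50 values.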
Frequency plots, scatter plots, and box plots are used to explore the distribution of bioactivity classes and molecular properties.
The Mann-Whitney U test is performed to assess whether there is a significant difference between the distributions of active and inactive compounds for various molecular properties.
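To illustrate what the test computes, here is a hand-rolled sketch of the U statistic with a two-sided p-value from the normal approximation. It is a simplification for illustration only: it applies no tie or continuity correction, and the function name is hypothetical. A library routine such as `scipy.stats.mannwhitneyu` is the usual choice in practice.

```python
import math


def mann_whitney_u(x, y):
    """Return (U, p) for a two-sided Mann-Whitney U test.

    Uses the normal approximation without tie or continuity correction,
    so the p-value is only a rough guide for small or heavily tied samples.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("Both samples must be non-empty")

    # U1 counts the pairs (a, b) with a > b; ties contribute one half.
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u2 = n1 * n2 - u1
    u = min(u1, u2)

    # Mean and standard deviation of U under the null hypothesis.
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)

    z = (u - mean_u) / sd_u
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p
```

For example, `mann_whitney_u(active_values, inactive_values)` could be called once per descriptor, where the two lists (hypothetical names) hold that descriptor's values for the active and inactive compounds.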
PaDEL-Descriptor software is used to calculate molecular descriptors and prepare the dataset for further analysis.
Feel free to explore the code and datasets in this repository to understand the drug discovery process for the target protein related to Herpes virus. If you have any questions or suggestions, please open an issue or contribute to the project. Happy drug discovery!