project solubility-predictor (in progress)

Predicting Aqueous Solubility of Organic Molecules Using a Molecular Descriptor Model

ML model: Random Forest Regressor

ML features: physico-chemical descriptors obtained with PaDEL software

According to the article "Predicting Aqueous Solubility of Organic Molecules Using Deep Learning Models with Varied Molecular Representations" by Panapitiya et al., 2021 (https://arxiv.org/pdf/2105.12638v1.pdf), the Molecular Descriptor Model outperforms fully connected neural networks (FCNNs), recurrent neural networks (RNNs), graph neural networks (GNNs), and SchNet.
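As a rough illustration of the modelling step, the sketch below fits a scikit-learn RandomForestRegressor to a numeric feature matrix. It is not the project's script: the feature matrix is random synthetic data standing in for PaDEL descriptors, and the hyperparameters (`n_estimators=100`, an 80/20 split) are assumptions chosen only for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a descriptor matrix (200 molecules x 5 descriptors).
# In the real project these columns would be PaDEL descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Synthetic target playing the role of logS; not real solubility data.
y = -1.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Hold out 20% of the rows to estimate the error on unseen molecules.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

pred = model.predict(X_test)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
print(f"held-out RMSE: {rmse:.3f}")
```

With real data, `X` would be replaced by the descriptor table and `y` by the logS column of the training dataset.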

Training dataset: ESOL (Delaney), water solubility data (logS, the base-10 logarithm of solubility in mol/L) for common small organic molecules.
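For readers unfamiliar with the logS scale, the following minimal sketch converts between a solubility in mol/L and logS. The function names are hypothetical and are not part of the project's script.

```python
import math


def solubility_to_log_s(mol_per_litre: float) -> float:
    """Return logS, the base-10 logarithm of a solubility given in mol/L."""
    if mol_per_litre <= 0:
        raise ValueError("solubility must be positive")
    return math.log10(mol_per_litre)


def log_s_to_solubility(log_s: float) -> float:
    """Return the solubility in mol/L that corresponds to a logS value."""
    return 10.0 ** log_s


# A solubility of 0.01 mol/L corresponds to logS = -2.
example_log_s = solubility_to_log_s(0.01)
# A logS of -3 corresponds to 0.001 mol/L.
example_solubility = log_s_to_solubility(-3.0)
```

More negative logS values therefore mean lower solubility: each unit is a factor of ten in mol/L.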

"Input/solubility_delaney_processed.csv": training dataset, obtained from https://moleculenet.org/datasets-1

"Input/solubility_delaney.csv": original data from Delaney, "ESOL: Estimating Aqueous Solubility Directly from Molecular Structure", J. Chem. Inf. Comput. Sci. 2004, 44, 3, 1000–1005

"Input/unknown.smi": UNKNOWN dataset of 105 organic molecules from ChEMBL (SMILES and ChEMBL names) with unknown water solubility; their solubility is predicted from structure

"Output/*.*": files generated by the script

"Output/unknown_solubility.csv": results, the predicted solubilities for the UNKNOWN dataset
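The input and output files listed above suggest a read-predict-write flow, which could be sketched as follows. This is only an illustration under stated assumptions: it assumes each line of the .smi file holds a SMILES string followed by whitespace and a name, the CSV column names are invented for the example, and the predictor is a placeholder that returns 0.0 rather than a trained model. In-memory streams are used in place of the actual files.

```python
import csv
import io


def read_smi(handle):
    """Yield (smiles, name) pairs from a .smi-style text stream.

    Assumes one molecule per line: SMILES, whitespace, then an optional name.
    """
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)
        smiles = parts[0]
        name = parts[1].strip() if len(parts) > 1 else ""
        yield smiles, name


def write_predictions(handle, records, predict):
    """Write one CSV row per molecule; the column names are hypothetical."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["name", "smiles", "predicted_logS"])
    for smiles, name in records:
        writer.writerow([name, smiles, f"{predict(smiles):.2f}"])


# Two made-up entries standing in for the contents of a .smi file.
sample = io.StringIO("CCO mol_A\nc1ccccc1 mol_B\n")
records = list(read_smi(sample))

# Placeholder predictor: always returns 0.0. A trained model goes here.
out = io.StringIO()
write_predictions(out, records, predict=lambda smiles: 0.0)
print(out.getvalue())
```

In practice the two streams would be opened from the input and output paths, and `predict` would compute descriptors for each SMILES string and pass them to the fitted regressor.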