Project: solubility-predictor (in progress)
According to the article "Predicting Aqueous Solubility of Organic Molecules Using Deep Learning Models with Varied Molecular Representations" by Panapitiya et al., 2021 (https://arxiv.org/pdf/2105.12638v1.pdf), the Molecular Descriptor Model outperforms fully connected neural networks (FCNNs), recurrent neural networks (RNNs), graph neural networks (GNNs), and SchNet.
Training dataset: ESOL (Delaney), water solubility data (LogS, the base-10 logarithm of solubility in moles per litre) for common small organic molecules.
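Because the target is LogS rather than a concentration, it can help to see how a value maps back to physical units. The following is a minimal sketch, not part of the project script; the function names and the example molecular weight are illustrative assumptions.

```python
def logs_to_molar(log_s: float) -> float:
    """Convert LogS (log10 of solubility in mol/L) to solubility in mol/L."""
    return 10.0 ** log_s


def logs_to_mg_per_ml(log_s: float, mol_weight: float) -> float:
    """Convert LogS to mg/mL, given the molecular weight in g/mol.

    mol/L * g/mol = g/L, and 1 g/L is numerically equal to 1 mg/mL.
    """
    return logs_to_molar(log_s) * mol_weight


if __name__ == "__main__":
    # Hypothetical compound: LogS = -2.0, molecular weight = 200.0 g/mol.
    print(logs_to_molar(-2.0))
    print(logs_to_mg_per_ml(-2.0, 200.0))
```

For example, a LogS of -2 corresponds to 0.01 mol/L, which for a compound of 200 g/mol is 2 g/L (2 mg/mL).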
"Input/solubility_delaney_processed.csv" training dataset is obtained from https://moleculenet.org/datasets-1 ,
"Input/solubility_delaney.csv" original data Delaney "ESOL: Estimating Aqueous Solubility Directly from Molecular Structure" J. Chem. Inf. Comput. Sci. 2004, 44, 3, 1000–1005)
"Input/unknown.smi" UNKNOWN dataset: 105 organic molecules from ChEMBL (SMILES and ChEMBL names) with unknown water solubility, is used to predict solubility from structure
"Output/ * . * " Files generated by script
"Output/unknown_solubility.csv" Results: Predicted solubilities for UNKNOWN dataset