This repository contains all the code and datasets required to replicate the experiments and analyses in the article "Geometric deep learning for antibiotic discovery".
The code for this project was written in Jupyter Notebook.
Installation: https://jupyter.org/install
NumPy: https://numpy.org/install/
Pandas: https://pandas.pydata.org/docs/getting_started/install.html
PyTorch: https://pytorch.org/get-started/locally/
PyTorch Geometric: https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html
Scikit-Learn: https://scikit-learn.org/stable/install.html
Chemprop: https://chemprop.readthedocs.io/en/latest/installation.html
RDKit: https://www.rdkit.org/docs/Install.html
Consists of 2335 molecules, each represented by its SMILES string. The Activity column indicates whether a molecule inhibits the growth of E. coli (1) or does not (0).
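As a minimal sketch of how such a table might be inspected with pandas, the snippet below builds a tiny in-memory stand-in for the dataset and counts the two classes. The column names `SMILES` and `Activity`, the three example molecules, and their labels are assumptions for illustration only and do not come from the actual file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the dataset file (the real one has 2335 rows).
# Column names and labels here are assumed and arbitrary.
csv_text = """SMILES,Activity
CCO,0
c1ccccc1,0
CC(=O)Oc1ccccc1C(=O)O,1
"""

df = pd.read_csv(io.StringIO(csv_text))

# Count inactive (0) and active (1) molecules.
class_counts = df["Activity"].value_counts().sort_index()
n_inactive = int(class_counts.get(0, 0))
n_active = int(class_counts.get(1, 0))
print(f"inactive={n_inactive}, active={n_active}")
```

To use it on the real data, replace the `io.StringIO(csv_text)` argument with the path to the dataset file.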
Contains the pretrained word embeddings for the tokens of the SMILES strings of the molecules in our dataset. This file is used in the ablation study experiment.
A folder containing 5 splits (folds) of training and test sets for 5-fold cross-validation. These 5 folds are used to evaluate our proposed model and the competing GNN models.
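The folds in this folder are precomputed, so nothing needs to be regenerated. For readers who want to see how comparable splits could be produced, here is a minimal sketch using scikit-learn's `StratifiedKFold`. Whether the original folds were stratified, and which random seed was used, is not stated here; the toy labels, `shuffle=True`, and `random_state=0` are all assumptions.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels standing in for the Activity column (assumed, not real data).
labels = np.array([0] * 15 + [1] * 5)
features = np.arange(len(labels)).reshape(-1, 1)  # placeholder features

# Stratification keeps the active/inactive ratio similar in every fold.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
folds = [(train_idx, test_idx) for train_idx, test_idx in skf.split(features, labels)]

for i, (train_idx, test_idx) in enumerate(folds):
    n_active = int(labels[test_idx].sum())
    print(f"fold {i}: train={len(train_idx)}, test={len(test_idx)}, test actives={n_active}")
```

Each test fold here holds 4 molecules, one of them active, which mirrors the 3:1 class ratio of the toy labels.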
To replicate the experiment in Section 4.1 of the article, we will need to obtain the performance metrics for the respective models.
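As an illustration of what computing such metrics can look like, the sketch below scores a handful of made-up predictions with scikit-learn. The choice of accuracy, F1, and ROC-AUC, the 0.5 decision threshold, and the toy values are assumptions; the metrics actually reported are defined in the article and notebooks.

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Made-up ground truth and predicted probabilities for six molecules.
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.4, 0.8, 0.35, 0.2, 0.9]

# Threshold the probabilities at 0.5 (assumed) to get hard class labels.
y_pred = [int(p >= 0.5) for p in y_prob]

accuracy = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_prob)  # uses probabilities, not hard labels
print(f"accuracy={accuracy:.3f}, f1={f1:.3f}, roc_auc={auc:.3f}")
```

In a 5-fold setting, the same calls would be repeated per fold and the results averaged.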
A study in Section 4.2 of the article that investigates the contribution of Morgan fingerprints, molecular graph embeddings, and SMILES text embeddings to predicting the molecular property of interest.
To investigate the effect of the class ratio in the dataset on the performance of our proposed model.
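One possible way to vary the class ratio is to keep every active molecule and undersample the inactive ones. The sketch below shows that idea in plain Python. The helper `subsample_to_ratio` and its parameters are hypothetical, and the procedure used in the article may differ.

```python
import random


def subsample_to_ratio(labels, neg_per_pos, seed=0):
    """Hypothetical helper: keep all positives and at most
    `neg_per_pos` randomly chosen negatives per positive."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_neg = min(len(neg), int(neg_per_pos * len(pos)))
    kept_neg = rng.sample(neg, n_neg)
    return sorted(pos + kept_neg)


# Toy labels: 10 active and 90 inactive molecules (assumed, not real data).
labels = [1] * 10 + [0] * 90
idx = subsample_to_ratio(labels, neg_per_pos=3)
kept = [labels[i] for i in idx]
print(f"active={sum(kept)}, inactive={len(kept) - sum(kept)}")
```

Repeating this for several values of `neg_per_pos` gives a series of training sets with different imbalance levels.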
To investigate the effect of the FNN architecture on the performance of our proposed model.