A BERT-based embedding model for SMILES molecule representations, from the paper "Self-Attention Based Molecule Representation for Predicting Drug-Target Interaction" (Shin et al., 2019). The code is implemented in PyTorch.
awk -F'\t' '{print $2}' CID-SMILES > CID-SMILES.txt
python -i CID-SMILES.txt -o CID-SMILES_train.txt -v vocab.voc
python -i CID-SMILES_train.txt -e 5 --lossWeight none -v vocab.voc
usage: [-h] [-i INPUT] [-e EPOCHS] [--lossWeight {none,log,sqrt,raw}]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input training SMILES file.
  -e EPOCHS, --epochs EPOCHS
  --lossWeight {none,log,sqrt,raw}
                        The type of class weights for the cross-entropy loss.
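The help text does not say how each --lossWeight choice turns token frequencies into class weights. The sketch below shows one plausible reading, not the repository's actual code: it assumes "raw" means inverse frequency, "sqrt" inverse square-root frequency, "log" inverse log frequency, and "none" uniform weights. The function name class_weights and the example counts are hypothetical.

```python
import math


def class_weights(counts, mode="none"):
    """Return one loss weight per vocabulary class from raw token counts.

    Assumed meanings (NOT confirmed by the source):
      none -> every class weighted 1.0
      raw  -> 1 / count
      sqrt -> 1 / sqrt(count)
      log  -> 1 / log(1 + count)
    """
    if mode == "none":
        return [1.0] * len(counts)

    transforms = {
        "raw": float,
        "sqrt": math.sqrt,
        "log": math.log1p,
    }
    if mode not in transforms:
        raise ValueError(f"unknown lossWeight mode: {mode!r}")
    transform = transforms[mode]

    # Clamp counts at 1 so tokens that never occur do not divide by zero.
    inverse = [1.0 / transform(max(c, 1)) for c in counts]

    # Rescale so the weights average to 1, which keeps the loss magnitude
    # comparable with the unweighted ("none") setting.
    mean = sum(inverse) / len(inverse)
    return [w / mean for w in inverse]


# Hypothetical token counts for a tiny 4-symbol vocabulary.
counts = [1000, 100, 10, 1]
for mode in ("none", "log", "sqrt", "raw"):
    print(mode, [round(w, 3) for w in class_weights(counts, mode)])
```

In PyTorch, a list like this could be converted to a tensor and passed as the weight argument of torch.nn.CrossEntropyLoss. Under these assumptions, "raw" penalizes errors on rare tokens most strongly, while "sqrt" and "log" dampen that effect.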
- 1 epoch, without loss weights: 0.9471
- 3 epochs, without loss weights: 0.9554