This repository is the official implementation of Energy-Inspired Molecular Conformation Optimization (ICLR 2022). [PDF]
The code has been tested in the following environment:
Package | Version |
---|---|
Python | 3.7.11 |
PyTorch | 1.9.0 |
CUDA | 11.1 |
PyTorch Geometric | 2.0.3 |
DGL | 0.6.1 |
RDKit | 2020.09.1 |
conda create -n confopt python=3.7
conda install rdkit -c rdkit
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.9.0+cu111.html
pip install dgl-cu111 tensorboard easydict ase
pip install lie_learn # for SE(3)-Transformer baseline
You may need to install pytorch, pytorch-geometric and dgl by yourself to make it compatible with your CUDA version.
The processed data and the pre-trained models can be found in this folder. You can also download them via the following command:
# for dataset (10.79 GB)
wget -O data.zip 'https://www.dropbox.com/sh/bcidyj2mbgy5dp2/AAB_lXSjadWI1wUk6WZgLEBGa?dl=1'
unzip data.zip -d data
# for pre-trained models (1.49 GB)
wget -O pretrained_models.zip 'https://www.dropbox.com/sh/cq6ho0imyynkfpg/AACq0GW_auRdLXAIQicnG56wa?dl=1'
unzip pretrained_models.zip -d pretrained_models
If you meet difficulties in downloading the data / pretrained models, please try this Google Drive link.
To train a conformer optimization model:
python train_conf.py --config configs/{qm9, drug}_default.yml --model_type {equi_se3trans, egnn, ours_o2, ours_o3}
To train a conformer generation model:
- with random initial conformers:
python train_sampling.py --config configs/{qm9, drug}_sampling.yml --model_type ours_o2 --propose_net_type random --noise_std 0.028 --eval_propose_net_type random --eval_noise 0.028
- with RDKit initial conformers:
python train_sampling.py --config configs/{qm9, drug}_sampling.yml --model_type ours_o2 --propose_net_type gt --noise_std {0.5, 1.0} --eval_propose_net_type online_rdkit --eval_noise 0.
To train a property prediction model:
python train_prop.py --config configs/qm9_prop_default.yml --model_type ours_o2 --pos_type {gt, rdkit, ours} --target_name homo
To evaluate the conformer optimization model:
python eval_conf.py --ckpt_path <model_path> --test_dataset <dataset_path>
For example, with data and pretrained model prepared (see above), you can run the following command to evaluate our two-atom model on the QM9 dataset:
python eval_conf.py --ckpt_path pretrained_models/conf_opt/qm9_our_o2 --test_dataset data/qm9/qm9_test.pkl
One can also dump optimized conformers and reproduce the results with error bars reported in the paper with the following command. Dumped conformers can also be used in downstream tasks like molecular property prediction.
python dump_confs.py \
--test_dataset data/qm9/qm9_test.pkl \
--ckpt_path_list pretrained_models/conf_opt/qm9_our_o2 \
--dump_dir dump_confopt_results \
--filter_pos False --rdkit_pos_mode all
To evaluate the conformer generation model:
python eval_sampling.py --ckpt_path <model_path> --eval_propose_net_type random --eval_noise 0.028
or:
python eval_sampling.py --ckpt_path <model_path> --eval_propose_net_type online_rdkit --eval_noise 0.
To evaluate the property prediction model:
python eval_prop.py --ckpt_path pretrained_models/prop_pred_with_gt/qm9_homo
The conformation optimization performance of baseline models and our models on the QM9 and GEOM-Drugs datasets:
Model name | QM9 mean RMSD | QM9 median RMSD | Drugs mean RMSD | Drugs median RMSD |
---|---|---|---|---|
RDKit+MMFF | 0.3872 | 0.2756 | 1.7913 | 1.6433 |
SE(3)-Trans. | 0.2471 | 0.1661 | - | - |
EGNN | 0.2104 | 0.1361 | 1.0394 | 0.9604 |
Ours-TwoAtom | 0.1404 | 0.0535 | 0.8815 | 0.7745 |
Ours-Ext_v | 0.1385 | 0.0506 | 0.8699 | 0.7554 |
Ours-ThreeAtom | 0.1374 | 0.0521 | 0.8579 | 0.7240 |
The conformation generation performance of our models on the GEOM-QM9 and GEOM-Drugs datasets:
Model name | mean COV | median COV | mean MIS | median MIS | mean MAT | median MAT |
---|---|---|---|---|---|---|
GEOM-QM9 Ours-Random | 87.10 | 92.62 | 30.21 | 30.74 | 0.3816 | 0.3843 |
GEOM-QM9 Ours-RDKit | 86.54 | 90.33 | 5.44 | 0.00 | 0.2686 | 0.2223 |
GEOM-Drugs Ours-Random | 76.50 | 83.78 | 31.40 | 23.03 | 1.0694 | 1.0583 |
GEOM-Drugs Ours-RDKit | 68.07 | 73.46 | 21.08 | 1.88 | 1.0429 | 0.9736 |
@inproceedings{guan2022energy,
title={Energy-Inspired Molecular Conformation Optimization},
author={Guan, Jiaqi and Qian, Wesley Wei and Liu, Qiang and Ma, Wei-Ying and Ma, Jianzhu and Peng, Jian},
booktitle={International Conference on Learning Representations},
year={2022}
}