This repository includes scripts to retrain and generate the espaloma-0.3.0 force field. espaloma-0.3.0 is a Class I force field in which the valence parameters are assigned and optimized via a machine learning framework.
This repository is part of espaloma-0.3.0-manuscript.
Note that there is a refactored repository to train espaloma (espfit) and an example workspace (espfit_workspace) to use espfit. However, please be aware that these repositories are still under development.
We first convert the HDF5 files obtained from download-qca-dataset to DGL graphs. Here, we compute the AM1BCC-ELF10 partial charges using the OpenEye toolkit as a reference.
Molecules with a gap between minimum and maximum energy larger than 0.1 Hartree (about 62.75 kcal/mol) are excluded from the dataset prior to the refitting experiment, similar to the original paper by Wang et al.
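The energy-gap filter described above can be sketched as follows. This is a minimal illustration in plain Python, not the repository's actual script: the function name `filter_by_energy_gap`, the dictionary layout, and the example energies are all assumptions made for the example.

```python
# Hypothetical helper: name, signature, and data layout are illustrative
# assumptions, not the repository's actual code.
HARTREE_GAP_THRESHOLD = 0.1  # maximum allowed (max - min) energy gap, in Hartree


def filter_by_energy_gap(energies_by_molecule, threshold=HARTREE_GAP_THRESHOLD):
    """Keep molecules whose conformer energies span at most `threshold` Hartree."""
    kept = {}
    for name, energies in energies_by_molecule.items():
        gap = max(energies) - min(energies)
        if gap <= threshold:
            kept[name] = energies
    return kept


# Made-up conformer energies (Hartree) for two molecules.
example = {
    "mol_a": [-154.32, -154.30, -154.29],  # gap of about 0.03 Hartree: kept
    "mol_b": [-230.75, -230.60, -230.50],  # gap of about 0.25 Hartree: excluded
}
filtered = filter_by_energy_gap(example)
print(sorted(filtered))
```

The comparison is done directly in Hartree, so no unit conversion is needed before applying the threshold.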
Since the van der Waals parameters affect physical property prediction, which is computationally challenging to optimize, we focus on optimizing the valence parameters and use the openff-2.0.0 force field (details can be found here) for the van der Waals parameters.
Espaloma was trained to minimize the error in the quantum mechanical energies and forces, with L2 regularization applied to the improper and proper torsion force constants. The electronegativity and hardness of atoms were predicted to determine the atomic partial charges, following the protocol described in the original paper by Wang et al., which used the AM1BCC-ELF10 partial charges as a reference.
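To illustrate how predicted electronegativity and hardness can determine partial charges, the sketch below assumes a charge-equilibration step: minimize the second-order energy sum_i (e_i q_i + 0.5 s_i q_i^2) subject to the charges summing to the total molecular charge Q. Solving with a Lagrange multiplier gives a closed-form expression. The function name and the numerical values are hypothetical, and the exact implementation in espaloma may differ.

```python
def equilibrate_charges(electronegativity, hardness, total_charge=0.0):
    """Closed-form minimizer of sum_i (e_i*q_i + 0.5*s_i*q_i**2)
    subject to sum_i q_i == total_charge.

    Hypothetical helper for illustration; not espaloma's API.
    """
    inv_s = [1.0 / s for s in hardness]
    e_over_s = [e / s for e, s in zip(electronegativity, hardness)]
    # Lagrange multiplier that enforces the total-charge constraint.
    lam = (total_charge + sum(e_over_s)) / sum(inv_s)
    # Stationarity condition e_i + s_i*q_i - lam = 0 gives q_i = (lam - e_i) / s_i.
    return [lam * w - r for w, r in zip(inv_s, e_over_s)]


# Made-up per-atom values for a neutral three-atom molecule.
e = [0.30, -0.10, -0.10]
s = [1.0, 2.0, 2.0]
q = equilibrate_charges(e, s, total_charge=0.0)
print(q, sum(q))
```

Because the charges come from a closed-form expression, they always sum to the requested total charge, which is convenient when fitting against reference charges such as AM1BCC-ELF10.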
- openff-default/
    - 01-create-dataset/ - Convert HDF5 files into DGL graphs
        - script/ - Stores scripts to convert HDF5 files into DGL graphs
        - Dataset/ - Collection of Dataset from QCArchive
            - spice-des-monomers/
            - spice-dipeptide/
            - spice-pubchem/
            - rna-diverse/
            - rna-trincleotide/
            - rna-nucleoside/
        - OptimizationDataset/ - Collection of OptimizationDataset from QCArchive
            - gen2/
            - pepconf-dlc/
        - TorsionDriveDataset/ - Collection of TorsionDriveDataset from QCArchive
            - gen2-torsion/
            - protein-torsion/
    - 02-train/ - Refit and evaluate espaloma
        - baseline/ - Scripts used to calculate baseline energies and forces using other force fields
        - joint-improper-charge/charge-weight-1.0/ - Scripts used to train and evaluate espaloma
        - merge-data/ - Scripts used to preprocess DGL graphs prior to training
- envs/ - Stores conda environment files
    - environment-create-dataset.yaml - Conda environment used to convert HDF5 files into DGL graphs in 01-create-dataset/
    - environment-refit.yaml - Conda environment to train and evaluate espaloma in 02-train/
Please refer here to find more details about the actual origin of the dataset described above.
Espaloma ver. 0.3.0 was used to create the DGL graphs in 01-create-dataset/.
Note that version 0.3.0 is no longer compatible with the 0.2.x models, and vice versa.
A fixed version of 0.3.0 (commit hash: 4c6155b72d00ce0190b3cb551e7e59f0adc33a56), which allows improper torsions to be fit with multiplicities n = 1, 2, was used for the refitting experiment and model evaluation.
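As a rough illustration of what fitting an improper torsion with multiplicities n = 1, 2 means, the sketch below evaluates a sum of two cosine terms. It assumes the conventional periodic torsion form k_n (1 + cos(n φ − φ0)), as used for example by OpenMM's PeriodicTorsionForce, with the phase φ0 set to zero. Whether espaloma uses exactly this functional form is an assumption here, and the function name and force constants are made up.

```python
import math


def improper_torsion_energy(phi, force_constants, phase=0.0):
    """Sum of periodic terms k_n * (1 + cos(n*phi - phase)).

    `force_constants` maps multiplicity n to its force constant k_n.
    Hypothetical helper assuming the conventional periodic torsion form.
    """
    return sum(
        k_n * (1.0 + math.cos(n * phi - phase))
        for n, k_n in force_constants.items()
    )


# Made-up force constants for the n = 1 and n = 2 terms.
k = {1: 1.0, 2: 0.5}
for angle in (0.0, math.pi / 2, math.pi):
    print(angle, improper_torsion_energy(angle, k))
```

With both terms present, the energy profile is no longer restricted to a single periodicity, which gives the fit more flexibility than a single fixed multiplicity.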
For a quick start, the preprocessed data in openff-default/02-train/merge-data/ is available here on Zenodo for training espaloma-0.3.0.
If you find this helpful, please cite the following:
@misc{takaba2023machinelearned,
title={Machine-learned molecular mechanics force field for the simulation of protein-ligand systems and beyond},
author={Kenichiro Takaba and Iván Pulido and Pavan Kumar Behara and Chapin E. Cavender and Anika J. Friedman and Michael M. Henry and Hugo MacDermott Opeskin and Christopher R. Iacovella and Arnav M. Nagle and Alexander Matthew Payne and Michael R. Shirts and David L. Mobley and John D. Chodera and Yuanqing Wang},
year={2023},
eprint={2307.07085},
archivePrefix={arXiv},
primaryClass={physics.chem-ph}
}