Releases: openmm/spice-dataset
SPICE 2.0.1
This is a minor update. It removes about 500 conformations from the PubChem Boron Silicon subset in which bonds broke during conformation generation (#99).
The dataset can be downloaded from
SPICE 2.0.0
This is a major update that roughly doubles the total amount of data. It focuses on increasing chemical diversity and improving the sampling of nonbonded interactions. It contains the following additions.
- #71 and #90: Over 13,000 new PubChem molecules. Among them are about 1500 containing boron and 1900 containing silicon, two elements that were not included in version 1.
- #72: Over 194,000 conformations for dimers consisting of an amino acid and a ligand, giving improved sampling of protein-ligand interactions.
- #70: 1000 water clusters to provide sampling of interactions in bulk water.
- #78: 1397 PubChem molecules solvated with a shell of water molecules.
- #91: Reran bad calculations from SPICE 1. A small fraction of calculations in the original version did not converge properly because of a bug in Psi4, which led to very large forces. In the previous release they had to be excluded by applying a filter based on the force magnitude. These calculations have been rerun with Psi4 1.8.2.
The dataset can be downloaded from
SPICE 1.1.4
This release fixes an issue in the downloader script that caused bond orders to be omitted for some molecules.
SPICE 1.1.3
There are no code changes in this release. The HDF5 file was regenerated with the force filter enabled. This is needed to exclude some bad datapoints that were produced by a version of Psi4 containing a bug. A future release will fix those datapoints by regenerating them with a newer version of Psi4.
SPICE 1.1.2
This minor update modifies the downloader script so that energies are saved in double precision. This has minimal effect on the formation_energy field, but it significantly improves the accuracy of the dft_total_energy field.
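The difference between the two fields comes down to magnitude: a single-precision float carries roughly seven significant digits, so a large absolute energy loses more absolute accuracy than a small one. The sketch below illustrates this using only the standard library. The energy values are made up for illustration and are not taken from the dataset, and `to_float32` is a hypothetical helper, not part of the downloader script.

```python
import struct


def to_float32(x: float) -> float:
    """Round a double-precision Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Illustrative magnitudes only (hartree); not values from the dataset.
total_energy = -1234.567891234    # large absolute DFT energy
formation_energy = -0.567891234   # much smaller in magnitude

# Absolute error introduced by storing each value in single precision.
total_error = abs(to_float32(total_energy) - total_energy)
formation_error = abs(to_float32(formation_energy) - formation_energy)

print(f"total energy rounding error:     {total_error:.2e} hartree")
print(f"formation energy rounding error: {formation_error:.2e} hartree")
```

With these example numbers, the rounding error on the total energy is orders of magnitude larger than the error on the formation energy, which is consistent with the larger benefit reported for dft_total_energy.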
The dataset can be downloaded from
SPICE 1.1.1
This update fixes a single bug in the downloader script. When multiple levels of theory are available, it ensures the default one is always chosen. This is necessary because a second, less accurate level of theory was recently added (see #39).
The full dataset is available at
The attached HDF5 file is a reduced version containing only the most commonly used data fields (forces and energies), with conformations that have very large forces (>1 hartree/bohr) removed.
This update adds minor improvements to the downloader script: filtering of samples by force magnitude, and annotating units in the HDF5 file. The attached SPICE.hdf5 file uses the default cutoff on forces of 1 hartree/bohr.
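A force-magnitude filter of this kind can be sketched as follows. This is a minimal illustration, not the downloader's actual code: the function names are hypothetical, the force values are made up, and it assumes the cutoff is compared against the largest per-atom force norm in each conformation (the real script may define the magnitude differently).

```python
import math

MAX_FORCE = 1.0  # hartree/bohr, the default cutoff mentioned above


def max_force_norm(forces):
    """Largest per-atom force magnitude in one conformation.

    `forces` is a list of (fx, fy, fz) tuples, one per atom.
    """
    return max(math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in forces)


def filter_conformations(all_forces, cutoff=MAX_FORCE):
    """Return indices of conformations whose largest force is within the cutoff."""
    return [
        i for i, forces in enumerate(all_forces)
        if max_force_norm(forces) <= cutoff
    ]


# Made-up forces for three two-atom conformations (hartree/bohr).
all_forces = [
    [(0.01, 0.00, -0.02), (-0.01, 0.00, 0.02)],  # small forces, kept
    [(3.50, 0.00, 0.00), (-3.50, 0.00, 0.00)],   # above the cutoff, dropped
    [(0.30, 0.40, 0.00), (-0.30, -0.40, 0.00)],  # norm 0.5, kept
]

kept = filter_conformations(all_forces)
print(kept)
```

Passing a different `cutoff` changes how aggressively conformations are dropped; the value of 1 hartree/bohr is the only threshold stated in these notes.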
This is the initial release of the SPICE dataset. It contains over 1.1 million conformations for a variety of drug-like molecules, amino acids, dipeptides, and dimers composed of small molecules.
The attached HDF5 file contains the most commonly used data fields: conformations, energies, and forces. For a description of the file format and instructions on how to download additional data fields, see