Charges are missing #42
Comments
Yeah, I agree with @peastman that we have to recompute from scratch. That said, MBIS charges are available on almost all SPICE sets (except the DES370K supplement). Even the psi4 stdout on the QCA records doesn't have Mulliken charges printed, since they're calculated post-SCF (afaik) and we didn't ask for them in our inputs.
Let's go back to the main issue: how to get the molecular charges and, optionally, the partial charges. I want the SPICE loader in TorchMD-NET (https://github.com/torchmd/torchmd-net/blob/main/torchmdnet/datasets/spice.py) to be able to load them.
Having the downloader create a
mol = Chem.MolFromSmiles(smiles)
charges = [atom.GetFormalCharge() for atom in mol.GetAtoms()]
If you want MBIS partial charges, you can already store those by including the
@bennybp: Any chance you have thought about how to represent this information in a common way in your HDF5 files built for machine learning?
I just wanted to point you to this PR that shows how to identify molecules that have changed connectivity as an alternative to using charges to filter "broken" molecules.
I would like to extend the SPICE dataset, and am trying to reproduce some of the QM calculations to ensure my DFT settings are correct. I would really support adding to the dataset the total charge and spin multiplicity of the molecules at the very least to improve reproducibility of the DFT calculations.
[davkovacs], you can easily extract this information from SMILES using RDKit.
For multiplicity: def GetSpinMultiplicity(Mol, CheckMolProp = True):
For charge: charge = Chem.GetFormalCharge(molecule)
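For anyone who would rather not pull in RDKit just to get the total charge, here is a minimal, dependency-free sketch. It assumes every formal charge in the SMILES is written inside a bracket atom (which the SMILES grammar requires) and simply sums them. `total_charge_from_smiles` is a hypothetical helper, not part of RDKit or the SPICE downloader, and `Chem.GetFormalCharge` remains the more robust route.

```python
import re

# Everything between "[" and "]" is one bracket atom, e.g. "NH3+:1" or "O-".
_BRACKET_ATOM = re.compile(r"\[([^\]]+)\]")
# A charge is a run of "+" or "-" signs, optionally followed by a magnitude.
_CHARGE = re.compile(r"(\++|-+)(\d*)")


def total_charge_from_smiles(smiles: str) -> int:
    """Sum the formal charges written in the bracket atoms of a SMILES string.

    Assumption: the input is a valid SMILES, so charges only appear inside
    bracket atoms. This is an illustrative helper, not an RDKit replacement.
    """
    total = 0
    for atom in _BRACKET_ATOM.findall(smiles):
        match = _CHARGE.search(atom)
        if match is None:
            continue  # neutral bracket atom, e.g. "[C@@H]" or "[13CH4]"
        signs, digits = match.groups()
        sign = 1 if signs[0] == "+" else -1
        # "+2" carries an explicit magnitude; "++" is the older repeated-sign form.
        magnitude = int(digits) if digits else len(signs)
        total += sign * magnitude
    return total


if __name__ == "__main__":
    # Atom-mapped zwitterionic glycine: the two charges cancel.
    print(total_charge_from_smiles("[NH3+:1][CH2:2][C:3](=[O:4])[O-:5]"))
    # Acetate anion.
    print(total_charge_from_smiles("CC(=O)[O-]"))
```

This only recovers the total charge; spin multiplicity still needs radical-electron information, which is where the RDKit-based function above comes in.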
The dataset file (https://github.com/openmm/spice-dataset/releases/download/1.0/SPICE.hdf5) doesn't contain the total molecular charge. It could be extracted by parsing the SMILES, but that is inconvenient and puts an extra burden on users.
The dataset should provide the complete QM description of a molecule (i.e. elements, positions, charge, and spin state) in a convenient form. The downloader should be modified to add a field with the total charge (and maybe formal charges) for each molecule.
Also, I would suggest including Mulliken charges (Psi4 computes them by default). They could be used to filter "broken" molecules. In my recent experience, filtering on large forces alone isn't enough to catch them all.
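To make the "complete QM description" above concrete, here is a rough sketch of what a per-molecule record could look like. The class name `QMDescription` and its field names are assumptions for illustration only, not the layout of SPICE.hdf5. The one check it performs relies on a general fact: the electron count (sum of atomic numbers minus total charge) must have the same parity as the number of unpaired electrons (multiplicity minus one).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class QMDescription:
    """Hypothetical record: elements, positions, total charge, spin multiplicity."""

    atomic_numbers: List[int]
    positions: List[Tuple[float, float, float]]
    charge: int = 0
    multiplicity: int = 1

    def __post_init__(self) -> None:
        if len(self.atomic_numbers) != len(self.positions):
            raise ValueError("one position is required per atom")
        if self.multiplicity < 1:
            raise ValueError("multiplicity must be at least 1")
        if self.n_electrons < 0:
            raise ValueError("charge exceeds the total nuclear charge")
        unpaired = self.multiplicity - 1
        # The remaining electrons must pair up, so the difference has to be
        # non-negative and even.
        if unpaired > self.n_electrons or (self.n_electrons - unpaired) % 2:
            raise ValueError("charge and multiplicity are inconsistent")

    @property
    def n_electrons(self) -> int:
        # Electrons = total nuclear charge minus the net molecular charge.
        return sum(self.atomic_numbers) - self.charge


if __name__ == "__main__":
    # Hydroxide anion: O and H, charge -1, closed-shell singlet.
    hydroxide = QMDescription(
        atomic_numbers=[8, 1],
        positions=[(0.0, 0.0, 0.0), (0.0, 0.0, 0.96)],
        charge=-1,
        multiplicity=1,
    )
    print(hydroxide.n_electrons)
```

With charge and multiplicity stored next to the elements and positions, a record like this carries everything needed to set up the same DFT calculation again, and an inconsistent pair is rejected at construction time.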