Aromaticity of Imidizoliums #2

benjamindjensen · 2023-01-06T21:20:40Z

benjamindjensen
Jan 6, 2023

Describe the bug
For a imidazolium molecule I was expecting a molecule with symmetrical atom/bond/... types based on the molecule resonance. However OpenFF perception is assigning different types based on the location of the charged nitrogen in the inputed structure. Maybe this is expected behavior for OpenFF based on my understanding of #1258, but would appreciate confirmation if possible.

To Reproduce

from openff.toolkit import Molecule
from openff.toolkit import ForceField
from openff.units import unit
from openff.interchange import Interchange
import numpy as np
imidazolium = Molecule.from_smiles("CCN1C=C[N+](=C1)CC")
imidazolium.generate_conformers()
sage = ForceField("openff-2.0.0.offxml")
cubic_box = (
    np.array(
        [
            [10.0, 0.0, 0.0],
            [0.0, 10.0, 0.0],
            [0.0, 0.0, 10.0],
        ]
    )
    * unit.angstrom
)
interchange = Interchange.from_smirnoff(topology=[imidazolium], force_field=sage, box=cubic_box)
interchange.handlers['vdW'].slot_map

Output
The hydrogen atoms near the N+ atom are labeled with [#1:1]-[#6X4]~[*+1,*+2], while the hydrogen atoms near the other nitrogen are labeled with [#1:1]-[#6X4]-[#7,#8,#9,#16,#17,#35].

interchange.handlers['vdW'].slot_map
...
TopologyKey with atom indices (12,): PotentialKey associated with handler 'vdW' with id '[#1:1]-[#6X4]-[#7,#8,#9,#16,#17,#35]',
TopologyKey with atom indices (13,): PotentialKey associated with handler 'vdW' with id '[#1:1]-[#6X4]-[#7,#8,#9,#16,#17,#35]',
...
TopologyKey with atom indices (17,): PotentialKey associated with handler 'vdW' with id '[#1:1]-[#6X4]~[*+1,*+2]',
TopologyKey with atom indices (18,): PotentialKey associated with handler 'vdW' with id '[#1:1]-[#6X4]~[*+1,*+2]',
...

Similar behavior for bonds/angle/etc as well.

Computing environment (please complete the following information):

openff.toolkit.__version__
'0.11.0'

openff.interchange.__version__
'v0.2.1'

mattwthompson · 2023-01-06T22:15:45Z

mattwthompson
Jan 6, 2023
Maintainer

I can reproduce this (thanks for the excellent write-up) and agree that one would expect these parameters to be symmetric. Normally slight differences in parametrizing aren't impactful, but it looks like these parameters have significantly different values. It also strikes me that parameter n6 has values that are so nicely rounded to just a few significant figures.

If I understand the background here, which is far from guaranteed, one of the confusing things about this case is that the MDL aromaticity model doesn't recognize five-membered rings as aromatic, even though some (most? all?) are in practice. OpenFF doesn't currently have its own aromaticity model, deferring to RDKit ¹ and OpenEye ² to handle it, although only MDL is implemented now.

In [1]: from openff.toolkit.utils.toolkits import *

In [2]: from openff.toolkit import *

In [3]: {atom.is_aromatic for atom in Molecule.from_smiles("CCN1C=C[N+](=C1)CC", toolkit_registry=RDKitToolkitWrapper()).atoms}
Out[3]: {False}

In [4]: {atom.is_aromatic for atom in Molecule.from_smiles("CCN1C=C[N+](=C1)CC", toolkit_registry=OpenEyeToolkitWrapper()).atoms}
Out[4]: {False}

But that's not necessarily the root of the issue, since the SMIRKS patterns are hung up on the existence of the partial charge, not aromaticity. The parameter with the SMIRKS pattern [#1:1]-[#6X4]~[*+1,*+2] is listed later in the force field the other one and is given higher priority in SMIRKS matching, and because the charge is all on one nitrogen, the matching is asymmetric. The Molecule only stores the partial charge on one nitrogen, even though it's delocalized.

In [1]: from openff.toolkit import *

In [2]: [(atom.atomic_number, atom.formal_charge.m) for atom in Molecule.from_smiles("CCN1C=C[N+](=C1)CC").atoms]
Out[2]:
[(6, 0),
 (6, 0),
 (7, 0),
 (6, 0),
 (6, 0),
 (7, 1),
 (6, 0),
 (6, 0),
 (6, 0),
 (1, 0),
 (1, 0),
 (1, 0),
 (1, 0),
 (1, 0),
 (1, 0),
 (1, 0),
 (1, 0),
 (1, 0),
 (1, 0),
 (1, 0),
 (1, 0),
 (1, 0)]

I'm actually not sure how a delocalized charge like this could be represented on a single molecule with MDL aromaticity - @j-wags would know better whether it's actually impossible.

0 replies

j-wags · 2023-01-06T23:28:32Z

j-wags
Jan 6, 2023
Maintainer

Thanks for the great writeup @benjamindjensen!

Yeah... this is an undesired behavior, but it would be tricky to fix because it walks the jurisdictional line between the infrastructure and the data.

In the infrastructure, formal charges are currently strictly integer. To allow anything else would be really complicated and make interoperability with the rest of the cheminformatics world really hard.

In the data (force fields), parameter SMARTS are free to use formal charge and aromaticity in matching. In this case, the parameter tries to distinguish between chemistries using formal charge, but the fact that it's delocalized means that we probably need to use aromaticity to distinguish (and even then, the MDL aromaticity model in particular wouldn't have caught this anyway)

So, the issue is valid and I think it's slightly distinct from #1258, it'll just be a lot of effort to fix. We should leave this open and tag with high effort and low priority unless we see if causing big problems in conformer/free energy accuracy.

0 replies

benjamindjensen · 2023-01-07T18:07:46Z

benjamindjensen
Jan 7, 2023
Author

@mattwthompson and @j-wags - Thank you for the detailed and quick feedback on the issue. It does sound like a complex situation. Appreciated the clarification!

0 replies

davidlmobley · 2023-01-07T18:40:45Z

davidlmobley
Jan 7, 2023
Maintainer

My take is that this is ... not exactly a bug, but more of a "desired improvement in the force field".

I don't necessarily think this is even that tied to the aromaticiity model. At present, we have to use SOME aromaticity model, and the MDL one is the best fit. Regardless, any aromaticity model is not perfect and has edge cases. But even if aromaticity was handled perfectly here, I'm not sure that would fix the problem.

Here's what I think the issue is: it's essentially a tautomer/resonance issue like, say, carboxylates or nitros or similar. For these smaller groups like nitros, we have substructure searches built in that basically detect these and say, "ah, here you have something you've input as asymmetric (as you should) but the parameters really ought to be symmetric". That's fine for very small groups like nitros or carboxylates, but as the size of the functional group gets bigger it's harder to think of all the possible cases one would want to consider and catch them in the force field.

I THINK that's exactly what's going on here. Naturally, when putting in the molecule, you have to localize the positive charge SOMEWHERE even though it's going to be delocalized across the ring. And the hope, of course, is that the toolkit/force field would realize this and symmetrize everything. However, I think there's only a couple ways to do that, and none are easy:

Have the force field itself symmetrize such cases (as is done for nitros/carboxylates)
Have the toolkit realize there are multiple relevant resonance structures and symmetrize somehow (not sure how to do that)
Reduce reliance in the force field on explicit bond orders and formal charge and lean more on the electronic structure calculation (eg Wiberg bond order-based parameter interpolation)

We've done some work towards (3) but there's a ton of science to be done to make that happen. Really it's probably the best long-term solution since then we don't have to rely on the input resonance structure and get to use the QM to tell us what's really going on and assign parameters from there. Until we have something like that we're stuck dealing with resonance structures in one way or another.

0 replies

benjamindjensen · 2023-01-09T14:07:04Z

benjamindjensen
Jan 9, 2023
Author

@davidlmobley Thank you for the explanation!

0 replies

trevorgokey · 2023-01-12T00:29:49Z

trevorgokey
Jan 12, 2023

I think a good idea here for future parameter assignment is to use something like CanonicalRankAtoms in RDKit to group atoms into equivalence groups, and then assign parameters to the groups rather than individual atom, bonds, angles, etc. This will force symmetric atoms to get the same parameter. This approach is a solution to David's option 2.

0 replies

davidlmobley · 2023-01-16T17:49:13Z

davidlmobley
Jan 16, 2023
Maintainer

This is interesting, @trevorgokey (and there's an equivalent in the OpenEye toolkits). However, this might just shift the problem to the cheminformatics toolkit (which probably in the short term is a good thing, but not entirely a solution). Specifically, anything the cheminformatics toolkit recognizes as equivalent would then get equivalent parameters, but then everything would hinge on how the toolkits assign "equivalence". So we'd still have the same problem (eg for extended conjugated groups which perhaps ought to have equivalent parameters but are too large to be recognized by the toolkits, etc.), just in different places.

0 replies

mattwthompson · 2023-01-17T11:07:34Z

mattwthompson
Jan 17, 2023
Maintainer

I'm not sure what changes would close this issue so I suggest closing this and re-opening a new one (if there is a change that needs to be made) with more discrete and actionable objective, i.e. "reduce the SMARTS matching machinery's dependence on explicit formal charges." This is not to dismiss the report or stifle discussion but to help issue searchability and streamline the process when we inevitably come across this several months or years in the future.

0 replies

davidlmobley · 2023-01-25T00:31:01Z

davidlmobley
Jan 25, 2023
Maintainer

I more or less agree with this. Honestly, this is a science issue, not just a machinery issue... Do we have a best practices for how to handle science issues on the issue tracker?

0 replies

mattwthompson · 2023-01-25T01:33:19Z

mattwthompson
Jan 25, 2023
Maintainer

Nope, at times I've tried to redirect them to https://github.com/openforcefield/openff-forcefields/issues which I don't argue is a sustainable solution. Defer to @lilyminium on how this should be organized (specially here and/or in general).

Closing this now - not because it's a bad question (it's a great one, and I find feedback like this encouraging) but because it does not appear to be something we can resolve with changes to the software. If that's no longer true in the future we can re-open this or start a new issue that includes context presented here.

0 replies

benjamindjensen · 2023-01-25T13:43:00Z

benjamindjensen
Jan 25, 2023
Author

Thank you all for the insightful discussion and comments!

0 replies

lilyminium · 2023-01-27T04:41:01Z

lilyminium
Jan 27, 2023
Maintainer

I think a GitHub Discussions forum would be well-suited for this kind of issue, so I've moved it there. Apologies for the email spam -- I'm still trying to iron out some kinks.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Open Force Field Initiative

Aromaticity of Imidizoliums #2

{{title}}

Replies: 12 comments

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Open Force Field Initiative

Aromaticity of Imidizoliums #2

benjamindjensen Jan 6, 2023

Replies: 12 comments

mattwthompson Jan 6, 2023 Maintainer

Footnotes

j-wags Jan 6, 2023 Maintainer

benjamindjensen Jan 7, 2023 Author

davidlmobley Jan 7, 2023 Maintainer

benjamindjensen Jan 9, 2023 Author

trevorgokey Jan 12, 2023

davidlmobley Jan 16, 2023 Maintainer

mattwthompson Jan 17, 2023 Maintainer

davidlmobley Jan 25, 2023 Maintainer

mattwthompson Jan 25, 2023 Maintainer

benjamindjensen Jan 25, 2023 Author

lilyminium Jan 27, 2023 Maintainer

benjamindjensen
Jan 6, 2023

mattwthompson
Jan 6, 2023
Maintainer

j-wags
Jan 6, 2023
Maintainer

benjamindjensen
Jan 7, 2023
Author

davidlmobley
Jan 7, 2023
Maintainer

benjamindjensen
Jan 9, 2023
Author

trevorgokey
Jan 12, 2023

davidlmobley
Jan 16, 2023
Maintainer

mattwthompson
Jan 17, 2023
Maintainer

davidlmobley
Jan 25, 2023
Maintainer

mattwthompson
Jan 25, 2023
Maintainer

benjamindjensen
Jan 25, 2023
Author

lilyminium
Jan 27, 2023
Maintainer