Aromaticity of Imidizoliums #2
Replies: 12 comments
-
I can reproduce this (thanks for the excellent write-up) and agree that one would expect these parameters to be symmetric. Normally slight differences in parametrizing aren't impactful, but it looks like these parameters have significantly different values. It also strikes me that parameter If I understand the background here, which is far from guaranteed, one of the confusing things about this case is that the MDL aromaticity model doesn't recognize five-membered rings as aromatic, even though some (most? all?) are in practice. OpenFF doesn't currently have its own aromaticity model, deferring to RDKit 1 and OpenEye 2 to handle it, although only MDL is implemented now. In [1]: from openff.toolkit.utils.toolkits import *
In [2]: from openff.toolkit import *
In [3]: {atom.is_aromatic for atom in Molecule.from_smiles("CCN1C=C[N+](=C1)CC", toolkit_registry=RDKitToolkitWrapper()).atoms}
Out[3]: {False}
In [4]: {atom.is_aromatic for atom in Molecule.from_smiles("CCN1C=C[N+](=C1)CC", toolkit_registry=OpenEyeToolkitWrapper()).atoms}
Out[4]: {False} But that's not necessarily the root of the issue, since the SMIRKS patterns are hung up on the existence of the partial charge, not aromaticity. The parameter with the SMIRKS pattern In [1]: from openff.toolkit import *
In [2]: [(atom.atomic_number, atom.formal_charge.m) for atom in Molecule.from_smiles("CCN1C=C[N+](=C1)CC").atoms]
Out[2]:
[(6, 0),
(6, 0),
(7, 0),
(6, 0),
(6, 0),
(7, 1),
(6, 0),
(6, 0),
(6, 0),
(1, 0),
(1, 0),
(1, 0),
(1, 0),
(1, 0),
(1, 0),
(1, 0),
(1, 0),
(1, 0),
(1, 0),
(1, 0),
(1, 0),
(1, 0)] I'm actually not sure how a delocalized charge like this could be represented on a single molecule with MDL aromaticity - @j-wags would know better whether it's actually impossible. Footnotes |
Beta Was this translation helpful? Give feedback.
-
Thanks for the great writeup @benjamindjensen! Yeah... this is an undesired behavior, but it would be tricky to fix because it walks the jurisdictional line between the infrastructure and the data. In the infrastructure, formal charges are currently strictly integer. To allow anything else would be really complicated and make interoperability with the rest of the cheminformatics world really hard. In the data (force fields), parameter SMARTS are free to use formal charge and aromaticity in matching. In this case, the parameter tries to distinguish between chemistries using formal charge, but the fact that it's delocalized means that we probably need to use aromaticity to distinguish (and even then, the MDL aromaticity model in particular wouldn't have caught this anyway) So, the issue is valid and I think it's slightly distinct from #1258, it'll just be a lot of effort to fix. We should leave this open and tag with high effort and low priority unless we see if causing big problems in conformer/free energy accuracy. |
Beta Was this translation helpful? Give feedback.
-
@mattwthompson and @j-wags - Thank you for the detailed and quick feedback on the issue. It does sound like a complex situation. Appreciated the clarification! |
Beta Was this translation helpful? Give feedback.
-
My take is that this is ... not exactly a bug, but more of a "desired improvement in the force field". I don't necessarily think this is even that tied to the aromaticiity model. At present, we have to use SOME aromaticity model, and the MDL one is the best fit. Regardless, any aromaticity model is not perfect and has edge cases. But even if aromaticity was handled perfectly here, I'm not sure that would fix the problem. Here's what I think the issue is: it's essentially a tautomer/resonance issue like, say, carboxylates or nitros or similar. For these smaller groups like nitros, we have substructure searches built in that basically detect these and say, "ah, here you have something you've input as asymmetric (as you should) but the parameters really ought to be symmetric". That's fine for very small groups like nitros or carboxylates, but as the size of the functional group gets bigger it's harder to think of all the possible cases one would want to consider and catch them in the force field. I THINK that's exactly what's going on here. Naturally, when putting in the molecule, you have to localize the positive charge SOMEWHERE even though it's going to be delocalized across the ring. And the hope, of course, is that the toolkit/force field would realize this and symmetrize everything. However, I think there's only a couple ways to do that, and none are easy:
We've done some work towards (3) but there's a ton of science to be done to make that happen. Really it's probably the best long-term solution since then we don't have to rely on the input resonance structure and get to use the QM to tell us what's really going on and assign parameters from there. Until we have something like that we're stuck dealing with resonance structures in one way or another. |
Beta Was this translation helpful? Give feedback.
-
@davidlmobley Thank you for the explanation! |
Beta Was this translation helpful? Give feedback.
-
I think a good idea here for future parameter assignment is to use something like CanonicalRankAtoms in RDKit to group atoms into equivalence groups, and then assign parameters to the groups rather than individual atom, bonds, angles, etc. This will force symmetric atoms to get the same parameter. This approach is a solution to David's option 2. |
Beta Was this translation helpful? Give feedback.
-
This is interesting, @trevorgokey (and there's an equivalent in the OpenEye toolkits). However, this might just shift the problem to the cheminformatics toolkit (which probably in the short term is a good thing, but not entirely a solution). Specifically, anything the cheminformatics toolkit recognizes as equivalent would then get equivalent parameters, but then everything would hinge on how the toolkits assign "equivalence". So we'd still have the same problem (eg for extended conjugated groups which perhaps ought to have equivalent parameters but are too large to be recognized by the toolkits, etc.), just in different places. |
Beta Was this translation helpful? Give feedback.
-
I'm not sure what changes would close this issue so I suggest closing this and re-opening a new one (if there is a change that needs to be made) with more discrete and actionable objective, i.e. "reduce the SMARTS matching machinery's dependence on explicit formal charges." This is not to dismiss the report or stifle discussion but to help issue searchability and streamline the process when we inevitably come across this several months or years in the future. |
Beta Was this translation helpful? Give feedback.
-
I more or less agree with this. Honestly, this is a science issue, not just a machinery issue... Do we have a best practices for how to handle science issues on the issue tracker? |
Beta Was this translation helpful? Give feedback.
-
Nope, at times I've tried to redirect them to https://github.com/openforcefield/openff-forcefields/issues which I don't argue is a sustainable solution. Defer to @lilyminium on how this should be organized (specially here and/or in general). Closing this now - not because it's a bad question (it's a great one, and I find feedback like this encouraging) but because it does not appear to be something we can resolve with changes to the software. If that's no longer true in the future we can re-open this or start a new issue that includes context presented here. |
Beta Was this translation helpful? Give feedback.
-
Thank you all for the insightful discussion and comments! |
Beta Was this translation helpful? Give feedback.
-
I think a GitHub Discussions forum would be well-suited for this kind of issue, so I've moved it there. Apologies for the email spam -- I'm still trying to iron out some kinks. |
Beta Was this translation helpful? Give feedback.
-
Describe the bug
For a imidazolium molecule I was expecting a molecule with symmetrical atom/bond/... types based on the molecule resonance. However OpenFF perception is assigning different types based on the location of the charged nitrogen in the inputed structure. Maybe this is expected behavior for OpenFF based on my understanding of #1258, but would appreciate confirmation if possible.
To Reproduce
Output
The hydrogen atoms near the N+ atom are labeled with
[#1:1]-[#6X4]~[*+1,*+2]
, while the hydrogen atoms near the other nitrogen are labeled with[#1:1]-[#6X4]-[#7,#8,#9,#16,#17,#35]
.Similar behavior for bonds/angle/etc as well.
Computing environment (please complete the following information):
Beta Was this translation helpful? Give feedback.
All reactions