Add another torsion multiplicity torsion drive supplement (#356)
ntBre authored Jun 15, 2024
1 parent fbd649f commit 527af9e
Showing 10 changed files with 1,415 additions and 0 deletions.
@@ -0,0 +1,70 @@
# OpenFF Torsion Multiplicity Torsion Drive Coverage Supplement v1.0

## Description
A torsion drive data set created to improve the coverage of both existing Sage
2.2.0 proper torsion parameters and new parameters added through the torsion
multiplicity project. The molecules in this data set were partly selected from
the ChEMBL 33 database and partly generated manually to match parameters not
covered by ChEMBL.

## General Information

* Date: 2024-06-14
* Class: OpenFF TorsionDrive Dataset
* Purpose: Improve proper torsion coverage in Sage
* Name: OpenFF Torsion Multiplicity Torsion Drive Coverage Supplement v1.0
* Number of unique molecules: 58
* Number of filtered molecules: 0
* Number of conformers: 59
* Number of conformers per molecule (min, mean, max): 1, 3.08, 10
* Mean molecular weight: 174.43
* Max molecular weight: 401.33
* Charges: [0.0, 1.0, 2.0]
* Dataset submitter: Brent Westbrook
* Dataset generator: Brent Westbrook

## QCSubmit Generation Pipeline

* `generate-dataset.ipynb`: This notebook shows how the dataset was prepared
from the input files: `dataset.smi`, `ff.offxml`, and `test.toml`.
* The list of proper torsion parameter ID and SMILES pairs in `dataset.smi` was
collected by searching the ChEMBL database for all of the molecules matching
the parameters of interest. The code used for these steps can be found
[here][frag]. Some of these (those for parameters `t146j`, `t144j`, `t117k`,
`t116i`, and `t142j`) were then truncated manually to remove large functional
groups far from the target dihedral. Finally, the last 20 molecules in
`dataset.smi` were designed by hand to match their corresponding parameter
because these parameters were not matched by any molecules in ChEMBL.

## QCSubmit Manifest

### Input Files
* `generate-dataset.ipynb`: Notebook describing dataset generation and submission
* `dataset.smi`: Sequence of parameter ID, mapped SMILES, dihedral index tuples processed by the notebook
* `ff.offxml`: Draft force field with Sage 2.2.0 proper torsions split to ensure single multiplicities
* `test.toml`: Experimental input file for defining variables used throughout the QCA submission process
* `input-environment.yaml`: Environment file used to create Python environment for the notebook
* `full-environment.yaml`: Fully-resolved environment used to execute the notebook
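
The record layout of `dataset.smi` described above (parameter ID, mapped SMILES, dihedral index tuple) can be read with a few lines of standard-library Python. This is a minimal sketch, not the code used in `generate-dataset.ipynb`: the names `TorsionRecord` and `parse_record` are hypothetical, and it assumes the three fields are whitespace-separated and that lines starting with `#` are comments.

```python
import ast
from typing import NamedTuple, Optional, Tuple


class TorsionRecord(NamedTuple):
    """One line of dataset.smi (hypothetical container, not part of QCSubmit)."""
    parameter_id: str
    mapped_smiles: str
    dihedral: Tuple[int, ...]


def parse_record(line: str) -> Optional[TorsionRecord]:
    """Parse one line; return None for blank lines and '#' comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Assumption: the ID and the SMILES contain no whitespace, so the
    # remainder after two splits is the "(a, b, c, d)" tuple text.
    parameter_id, smiles, tuple_text = line.split(maxsplit=2)
    dihedral = tuple(ast.literal_eval(tuple_text))
    return TorsionRecord(parameter_id, smiles, dihedral)


example = "t148j [O:1]([S:2](=[O:3])[N:4]([H:6])[H:7])[H:5] (2, 1, 3, 5)"
record = parse_record(example)
print(record.parameter_id, record.dihedral)
```

Using `ast.literal_eval` keeps the tuple parsing strict: anything other than a Python literal raises an error instead of being silently accepted.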

### Output Files
* `dataset.json.bz2`: Compressed dataset ready for submission
* `dataset.pdf`: Visualization of dataset molecules
* `output.smi`: SMILES strings for dataset molecules

## Metadata
* Elements: {N, Br, H, P, Cl, O, C, S}
* Spec: default
  * basis: DZVP
  * implicit_solvent: None
  * keywords: {}
  * maxiter: 200
  * method: B3LYP-D3BJ
  * program: psi4
  * SCF properties:
    * dipole
    * quadrupole
    * wiberg_lowdin_indices
    * mayer_indices
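
The `Elements` entry above can be cross-checked against individual mapped SMILES strings. The following is a rough sketch under stated assumptions: the helper `elements_in` and the regular expression are illustrative only, and they assume every atom is written in the bracketed, fully mapped form used in `dataset.smi` (for example `[C@@:2]`, `[N+:4]`, `[c:1]`).

```python
import re

# Matches a bracketed, mapped atom and captures its element symbol.
# Assumption: the symbol comes first, followed by optional chirality or
# charge markers, then ":<map number>".
ATOM = re.compile(r"\[([A-Za-z][a-z]?)[^:\]]*:\d+\]")

# Element set copied from the Metadata section of the README.
DECLARED = {"N", "Br", "H", "P", "Cl", "O", "C", "S"}


def elements_in(mapped_smiles: str) -> set:
    """Return the set of element symbols, treating aromatic 'c' as 'C'."""
    return {symbol.capitalize() for symbol in ATOM.findall(mapped_smiles)}


smiles = (
    "[O:1]([P:2]([O:3][H:8])[C:4]([N:5]([H:10])[H:11])"
    "([C:6]([H:12])([H:13])[H:14])[H:9])[H:7]"
)
found = elements_in(smiles)
print(sorted(found), found <= DECLARED)
```

A cheminformatics toolkit would be the more robust choice for real validation; the regular expression is only meant to illustrate the check.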

<!-- References -->
[frag]: https://github.com/ntBre/valence-fitting/tree/c1e98fb20e7a4c9622ff031d8b23fb0b1846be7d/02_curate-data/frag
Git LFS file not shown
Binary file not shown.
@@ -0,0 +1,69 @@
t61g [O:1]([C:2]([C:3]([N+:4]1([C:5]([C:6]([H:16])([H:17])[H:18])([H:14])[H:15])[C:7]([H:19])([H:20])[C:8]1([H:21])[H:22])([H:12])[H:13])([H:10])[H:11])[H:9] (2, 3, 6, 18)
t120h [c:1]1([H:16])[c:2]([H:17])[c:3]([S:4](=[O:5])(=[O:6])[N:7]=[S:8]([C:9]([H:18])([H:19])[H:20])[C:10]([C:11]([H:23])([H:24])[H:25])([H:21])[H:22])[c:12]([H:26])[c:13]([H:27])[c:14]1[C:15]([H:28])([H:29])[H:30] (8, 7, 9, 10)
t146j [H:28][C:1]([C:2]([C:3](/[N:4]=[S:5](\[N:6]([C:7]([C:8]([C:9]([H:27])([H:22])[H:23])([H:20])[H:21])([H:18])[H:19])[H:17])[C:10]([H:24])([H:25])[H:26])([H:15])[H:16])([H:13])[H:14])([H:11])[H:12] (6, 5, 4, 9)
t147j [c:1]1([H:21])[c:2]([H:22])[c:3]([H:23])[c:4]([S:5](=[O:6])(=[O:7])[N:8]=[S:9]([c:10]2[c:11]([H:24])[c:12]([H:25])[c:13]([C:14]([H:26])([H:27])[H:28])[c:15]([H:29])[c:16]2[H:30])[N:17]([C:18]([H:32])([H:33])[H:34])[H:31])[c:19]([H:35])[c:20]1[H:36] (9, 8, 16, 17)
t150h [C:1](=[S:2]([C:3]1=[C:4]2[C:5](=[O:6])[C:7]([H:27])([H:28])[C:8]([H:29])([H:30])[C:9]([H:31])([H:32])[C:10]2=[C:11]([Cl:12])[S:13]1)[N:14]([C:15](=[O:16])[N:17]([c:18]1[c:19]([H:35])[c:20]([H:36])[c:21]([Cl:22])[c:23]([H:37])[c:24]1[H:38])[H:34])[H:33])([H:25])[H:26] (2, 1, 13, 14)
t159g [O:1]([P:2]([O:3][H:8])[C:4]([N:5]([H:10])[H:11])([C:6]([H:12])([H:13])[H:14])[H:9])[H:7] (2, 1, 0, 6)
t159g [O:1]([P:2]([O:3][H:8])[C:4]([N:5]([H:10])[H:11])([C:6]([H:12])([H:13])[H:14])[H:9])[H:7] (3, 1, 0, 6)
t130h [C:1](=[C:2]([N+:3]([N:4]([C:5]([C:6]([C:7](=[O:8])[O-:9])([H:18])[H:19])([H:16])[H:17])[H:15])([C:10]([H:20])([H:21])[H:22])[C:11]([H:23])([H:24])[H:25])[H:14])([H:12])[H:13] (1, 2, 3, 4)
t149j [c:1]1([H:13])[c:2]([H:14])[c:3]([H:15])[c:4]([N:5]2[S:6](=[O:7])[O:8][C:9]([H:16])([H:17])[C:10]2([H:18])[H:19])[c:11]([H:20])[c:12]1[H:21] (6, 5, 4, 9)
t149j [c:1]1([H:21])[c:2]([H:22])[c:3]([C:4]([O:5][S:6](=[O:7])[N:8]([C:9]([C:10]([Cl:11])([H:27])[H:28])([H:25])[H:26])[C:12]([C:13]([Cl:14])([H:31])[H:32])([H:29])[H:30])([H:23])[H:24])[c:15]([H:33])[c:16]([H:34])[c:17]1[N+:18](=[O:19])[O-:20] (6, 5, 7, 8)
t127i [O:1]([C@@:2]1([H:22])[O:3][N+:4]([C:5]([c:6]2[c:7]([H:25])[c:8]([H:26])[c:9]([H:27])[c:10]([H:28])[c:11]2[H:29])([H:23])[H:24])([C:12]([c:13]2[c:14]([H:32])[c:15]([H:33])[c:16]([H:34])[c:17]([H:35])[c:18]2[H:36])([H:30])[H:31])[C:19]([H:37])([H:38])[C:20]1([H:39])[H:40])[H:21] (1, 2, 3, 11)
t117k [C:1]1([H:15])=[C:2]([Cl:3])[C:4](=[O:5])[C:6]([S+:7]2[C:8]([H:16])=[C:9]([H:17])[C:10]([C:11]([H:18])([H:19])[H:20])=[C:12]2[H:21])=[C:13]([H:22])[O:14]1 (7, 6, 11, 9)
t123ah [c:1]1([H:18])[c:2]([H:19])[c:3]([H:20])[c:4]([P:5]([c:6]2[c:7]([H:21])[c:8]([H:22])[c:9]([H:23])[c:10]([H:24])[c:11]2[H:25])[C:12]([C:13]([C:14]([N:15]([H:32])[H:33])([H:30])[H:31])([H:28])[H:29])([H:26])[H:27])[c:16]([H:34])[c:17]1[H:35] (3, 4, 11, 12)
t123ah [C:1]([P:2]([C:3]([H:8])([H:9])[H:10])[C:4]([H:11])([H:12])[H:13])([H:5])([H:6])[H:7] (0, 1, 2, 7)
t124g [c:1]1([H:20])[n:2][c:3]([P:4]([c:5]2[n:6][c:7]([H:21])[c:8]([H:22])[c:9]([H:23])[c:10]2[H:24])[c:11]2[c:12]([H:25])[c:13]([H:26])[c:14]([H:27])[c:15]([H:28])[c:16]2[H:29])[c:17]([H:30])[c:18]([H:31])[c:19]1[H:32] (2, 3, 4, 5)
t138a [O:1]([c:2]1[c:3]([H:16])[c:4]([H:17])[c:5](/[C:6](=[N:7]\[N+:8]([C:9]([H:19])([H:20])[H:21])([C:10]([H:22])([H:23])[H:24])[C:11]([H:25])([H:26])[H:27])[H:18])[c:12]([H:28])[c:13]1[O:14][H:29])[H:15] (5, 6, 7, 8)
t142j [O:1]([C@@:2]1([H:23])[C@:3]([O:4][H:25])([H:24])[C@@:5]([H:44])([H:26])[O:6][C@@:7]1([C:8]([O:9][S+:10]([C:11]([H:30])([H:31])[H:32])[C:12]([C:13]([c:14]1[c:15]([H:37])[c:16]([H:38])[c:17]([H:39])[c:18]([H:40])[c:19]1[O:20][C:21]([H:41])([H:42])[H:43])([H:35])[H:36])([H:33])[H:34])([H:28])[H:29])[H:27])[H:22] (7, 8, 9, 10)
t142j [O:1]([C@@:2]1([H:21])[C@:3]([O:4][H:23])([H:22])[C@@:5]([H:40])([H:24])[O:6][C@@:7]1([C:8]([O:9][S+:10]([C:11]([H:28])([H:29])[H:30])[C:12]([C:13]([c:14]1[c:15]([H:35])[c:16]([H:36])[c:17]([H:37])[c:18]([H:38])[c:19]1[H:39])([H:33])[H:34])([H:31])[H:32])([H:26])[H:27])[H:25])[H:20] (7, 8, 9, 10)
t144j [H:28][C:1]([C:2]([C:3](/[N:4]=[S:5](\[N:6]([C:7]([C:8]([C:9]([H:27])([H:22])[H:23])([H:20])[H:21])([H:18])[H:19])[H:17])[C:10]([H:24])([H:25])[H:26])([H:15])[H:16])([H:13])[H:14])([H:11])[H:12] (9, 4, 5, 16)
t144j [c:1]1([H:20])[c:2]([H:21])[c:3]([C:4]([H:22])([H:23])[H:24])[c:5]([H:25])[c:6]([H:26])[c:7]1[C:8](=[O:9])[N:10](/[S:11](=[N:12]/[C:13](=[O:14])[C:15]([H:28])([H:29])[H:30])[C:16]([Cl:17])([Cl:18])[Cl:19])[H:27] (15, 10, 9, 26)
t145j [c:1]1([H:21])[c:2]([H:22])[c:3]([H:23])[c:4]([S:5](=[O:6])(=[O:7])[N:8]=[S:9]([c:10]2[c:11]([H:24])[c:12]([H:25])[c:13]([C:14]([H:26])([H:27])[H:28])[c:15]([H:29])[c:16]2[H:30])[N:17]([C:18]([H:32])([H:33])[H:34])[H:31])[c:19]([H:35])[c:20]1[H:36] (9, 8, 16, 30)
t148j [O:1]([S:2](=[O:3])[N:4]([H:6])[H:7])[H:5] (2, 1, 3, 5)
t148j [C:1]1([H:17])=[C:2]([C:3]([C:4]2=[N:5][O:6][S:7](=[O:8])[N:9]2[H:20])([H:18])[H:19])[c:10]2[c:11]([H:21])[c:12]([H:22])[c:13]([H:23])[c:14]([H:24])[c:15]2[O:16]1 (7, 6, 8, 19)
t151h [c:1]1([H:20])[c:2]([H:21])[c:3]([C:4]([H:22])([H:23])[H:24])[c:5]([H:25])[c:6]([H:26])[c:7]1[C:8](=[O:9])[N:10](/[S:11](=[N:12]/[C:13](=[O:14])[C:15]([H:28])([H:29])[H:30])[C:16]([Cl:17])([Cl:18])[Cl:19])[H:27] (7, 9, 10, 15)
t152h [C:1]1([H:18])=[C:2]([C:3]([C:4]2=[N:5][O:6][S:7](=[O:8])[N:9]2[H:21])([H:19])[H:20])[c:10]2[c:11]([H:22])[c:12]([Br:13])[c:14]([H:23])[c:15]([H:24])[c:16]2[O:17]1 (3, 8, 6, 7)
t154h [N:1](=[S:2]([c:3]1[c:4]([H:16])[c:5]([H:17])[c:6]([H:18])[c:7]([H:19])[c:8]1[H:20])[c:9]1[c:10]([H:21])[c:11]([H:22])[c:12]([H:23])[c:13]([H:24])[c:14]1[H:25])[H:15] (2, 1, 0, 14)
t161g [O:1]([P:2](=[O:3])(/[N:4]=[C:5]1\[N:6]([H:14])[C:7](=[O:8])[C:9]([H:15])([H:16])[N:10]1[C:11]([H:17])([H:18])[H:19])[O:12][H:20])[H:13] (0, 1, 3, 4)
t142k [c:1]1([H:22])[c:2]([H:23])[c:3]([H:24])[c:4]([S+:5]2[c:6]3[c:7]([H:25])[c:8]([H:26])[c:9]([H:27])[c:10]([H:28])[c:11]3[C:12](=[O:13])[N:14]2[C:15]([H:29])([H:30])[H:31])[c:16]([C:17](=[O:18])[N:19]([C:20]([H:33])([H:34])[H:35])[H:32])[c:21]1[H:36] (3, 4, 13, 11)
t84h [c:1]1([H:17])[c:2]([H:18])[c:3]([H:19])[c:4]2[c:5]([c:6]1[H:20])[n+:7]([O-:8])[n:9][c:10]([N:11]([C:12]1([H:22])[C:13]([H:23])([H:24])[C:14]1([H:25])[H:26])[H:21])[n+:15]2[O-:16] (3, 14, 9, 10)
t89 [O:1]([C:2](=[O:3])[C:4]([S:5][C:6]1=[N:7][S:8](=[O:9])(=[O:10])[c:11]2[c:12]([H:20])[c:13]([H:21])[c:14]([H:22])[c:15]([H:23])[c:16]21)([H:18])[H:19])[H:17] (4, 5, 6, 7)
t115i [O:1]([C:2]([C@:3]([O:4][S:5](=[O:6])(=[O:7])[O-:8])([C@:9]([O:10][H:26])([C:11]([S+:12]1[C:13]([H:29])([H:30])[C@:14]([O:15][H:32])([H:31])[C@@:16]([O:17][H:34])([H:33])[C@:18]1([C:19]([O:20][H:38])([H:36])[H:37])[H:35])([H:27])[H:28])[H:25])[H:24])([H:22])[H:23])[H:21] (10, 11, 12, 13)
t116i [C:1]1([H:15])=[C:2]([H:16])[S+:3]([C:4](=[O:5])[N:6]2[C:7](=[O:8])/[C:9]([H:20])([H:21])[C:10](=[O:11])[N:12]2[H:17])[C:13]([H:18])=[C:14]1[H:19] (1, 2, 12, 17)
t116i [C:1]1([H:15])=[C:2]([Cl:3])[C:4](=[O:5])[C:6]([S+:7]2[C:8]([H:16])=[C:9]([H:17])[C:10]([C:11]([H:18])([H:19])[H:20])=[C:12]2[H:21])=[C:13]([H:22])[O:14]1 (5, 6, 7, 15)
t116j [O:1]([C:2](=[O:3])[C@@:4]([N:5]([H:13])[H:14])([C:6]([C:7]([S+:8]([C:9]([H:19])([H:20])[H:21])[C:10]([H:22])([H:23])[H:24])([H:17])[H:18])([H:15])[H:16])[H:12])[H:11] (6, 7, 8, 18)
t141 [O:1](/[N:2]=[N+:3](\[O-:4])[N:5]([C:6]([H:11])([H:12])[H:13])[C:7]([C:8]([O:9][H:18])([H:16])[H:17])([H:14])[H:15])[H:10] (0, 1, 2, 3)
t141 [O:1](/[N:2]=[N+:3](\[O-:4])[N:5]([C:6]([H:11])([H:12])[H:13])[C:7]([C:8]([O:9][H:18])([H:16])[H:17])([H:14])[H:15])[H:10] (0, 1, 2, 4)
t141 [C:1]1([H:15])=[C:2]([O-:3])[O:4][N:5]=[N+:6]1[C:7]([C:8]([N+:9]1=[N:10][O:11][C:12]([O-:13])=[C:14]1[H:20])([H:18])[H:19])([H:16])[H:17] (3, 4, 5, 6)
t141ci [O:1]([C:2](=[O:3])[C:4]1=[N:5][N:6]([H:14])[C:7]2=[C:8]1[C:9]([H:15])([H:16])[C@@:10]1([H:17])[C:11]([H:18])([H:19])[C@@:12]21[H:20])[H:13] (7, 6, 11, 19)
t141ci [N:1]1([H:13])[N:2]=[C:3]2[C:4]([H:14])([H:15])[C:5]([H:16])([H:17])[C:6]([H:18])([H:19])[C:7]([H:20])([H:21])[C:8]([H:22])([H:23])[C:9]([H:24])([H:25])[C:10]2([H:26])[C:11]1=[O:12] (3, 2, 9, 25)
t141ch [C:1]1([H:10])=[C:2]2[C:3]([H:11])([H:12])[C:4](=[O:5])[O:6][C:7]2([H:13])[C:8]([H:14])([H:15])[C:9]1([H:16])[H:17] (0, 1, 6, 12)
t141ch [c:1]1([H:11])[c:2]([H:12])[c:3]([H:13])[c:4]2[c:5]([c:6]1[H:14])[C:7]([H:15])([H:16])[C:8]1([H:17])[O:9][C:10]21[H:18] (2, 3, 9, 17)
t141ah [N:1](=[C:2]1[N:3]([H:10])[C:4](=[O:5])[N:6]2[C:7]([H:11])([H:12])[C:8]12[H:13])[H:9] (3, 5, 7, 12)
t141ah [O:1]([C:2](=[O:3])[C:4]1=[C:5]([H:13])[S:6][C:7]2([H:14])[N:8]1[C:9](=[O:10])[C:11]2([H:15])[H:16])[H:12] (8, 7, 6, 13)
t141ah [N:1]1([H:11])[C:2](=[O:3])[N:4]2[C:5]([H:12])([H:13])[S:6][C:7]([H:14])([H:15])[C:8]2([H:16])[C:9]1=[O:10] (4, 3, 7, 15)
t141bg [C:1]1([H:10])=[N:2][C:3]2=[C:4]([H:11])[C:5]([H:12])=[N:6][N:7]2[C:8]([H:13])=[C:9]1[H:14] (4, 5, 6, 7)
t141cg [C:1]1([H:7])([H:8])[O:2][C:3]([H:9])([H:10])[C:4]2([H:11])[O:5][C:6]12[H:12] (0, 5, 3, 10)
t141cg [C:1]1([H:8])=[C:2]([H:9])[C:3]2([H:10])[O:4][C:5]2([H:11])[C:6]([H:12])=[C:7]1[H:13] (9, 2, 4, 10)
t141cj [C:1]1([H:11])=[N:2][C:3]2=[C:4]([H:12])[C:5]([H:13])=[C:6]([H:14])[N:7]([C:8]([H:15])([H:16])[H:17])[C:9]2=[C:10]1[H:18] (3, 2, 8, 9)
# above are from chembl without modification, below are written manually
t158i [C:1]([S+:2]([C:3]([H:8])([H:9])[H:10])[S:4][H:11])([H:5])([H:6])[H:7] (0, 1, 3, 10)
t156h [O:1]=[S:2]([C:3]([H:6])([H:7])[H:8])[N:4]=[C:5]([H:9])[H:10] (0, 1, 3, 4)
t148i [O:1]=[S:2]([C:3]([H:8])([H:9])[H:10])[N+:4]([H:5])([C:6]([H:11])([H:12])[H:13])[C:7]([H:14])([H:15])[H:16] (0, 1, 3, 4)
t145i [C:1](=[C:2]([S:3](=[O:4])[N+:5]([H:6])([C:7]([H:12])([H:13])[H:14])[C:8]([H:15])([H:16])[H:17])[H:11])([H:9])[H:10] (1, 2, 4, 5)
t149i [O:1]=[S:2]([N+:3]([H:4])([H:5])[C:6]([H:8])([H:9])[H:10])[H:7] (0, 1, 2, 5)
t144g [C:1]([S:2]([C:3]([H:12])([H:13])[H:14])([C:4]([H:15])([H:16])[H:17])[N+:5]([H:6])([H:7])[C:8]([H:18])([H:19])[H:20])([H:9])([H:10])[H:11] (3, 1, 4, 6)
t132h [C:1]([N:2]([N+:3]([H:4])([H:5])[C:6]([H:11])([H:12])[H:13])[H:10])([H:7])([H:8])[H:9] (0, 1, 2, 4)
t148g [O:1]=[S:2](=[O:3])([N+:4]([H:5])([H:6])[C:7]([H:9])([H:10])[H:11])[H:8] (2, 1, 3, 5)
t149g [O:1]=[S:2](=[O:3])([N+:4]([H:5])([H:6])[C:7]([H:9])([H:10])[H:11])[H:8] (2, 1, 3, 6)
t146i [C:1]([S:2](=[O:3])[N+:4]([H:5])([H:6])[C:7]([H:11])([H:12])[H:13])([H:8])([H:9])[H:10] (0, 1, 3, 6)
t131g [C:1]([N+:2]([H:3])([H:4])[N+:5]([H:6])([H:7])[C:8]([H:12])([H:13])[H:14])([H:9])([H:10])[H:11] (3, 1, 4, 6)
t145g [C:1](=[C:2]([S:3](=[O:4])(=[O:5])[N+:6]([H:7])([H:8])[C:9]([H:13])([H:14])[H:15])[H:12])([H:10])[H:11] (1, 2, 5, 7)
t158j [C:1]([S+:2]([H:3])[S+:4]([H:5])[C:6]([H:10])([H:11])[H:12])([H:7])([H:8])[H:9] (2, 1, 3, 5)
t60g [C:1]([N+:2]([H:3])([H:4])[C:5]1([H:11])[C:6]([H:12])([H:13])[C:7]1([H:14])[H:15])([H:8])([H:9])[H:10] (3, 1, 4, 6)
t147i [C:1](=[C:2]([S:3](=[O:4])[N+:5]([H:6])([H:7])[C:8]([H:12])([H:13])[H:14])[H:11])([H:9])[H:10] (1, 2, 4, 7)
t144i [C:1]([S:2](=[O:3])[N+:4]([H:5])([H:6])[C:7]([H:11])([H:12])[H:13])([H:8])([H:9])[H:10] (0, 1, 3, 5)
t59g [C:1]([N+:2]([H:3])([H:4])[C:5]1([H:11])[C:6]([H:12])([H:13])[C:7]1([H:14])[H:15])([H:8])([H:9])[H:10] (3, 1, 4, 10)
t153h [O:1]=[S:2]([N:3]([N:4]=[C:5]([H:8])[H:9])[H:7])[H:6] (0, 1, 2, 3)
t146g [C:1]([S:2](=[O:3])(=[O:4])[N+:5]([H:6])([H:7])[C:8]([H:12])([H:13])[H:14])([H:9])([H:10])[H:11] (0, 1, 4, 7)
t147g [C:1](=[C:2]([S:3](=[O:4])(=[O:5])[N+:6]([H:7])([H:8])[C:9]([H:13])([H:14])[H:15])[H:12])([H:10])[H:11] (1, 2, 5, 8)
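
Each record above ends with a four-index tuple naming the driven dihedral. In the records checked by hand, those indices appear to be zero-based positions in atom-map order, that is, index `i` refers to the atom with map number `i + 1`. That reading is an inference from the data, not something the README states. Under that assumption, a short sketch (the helper `dihedral_atoms` is hypothetical) can label the four atoms:

```python
import re

# Captures the element symbol and map number of each bracketed atom.
ATOM = re.compile(r"\[([A-Za-z][a-z]?)[^:\]]*:(\d+)\]")


def dihedral_atoms(mapped_smiles: str, dihedral: tuple) -> list:
    """Label each dihedral atom as <element><map number>.

    Assumption: dihedral indices are zero-based, so index i corresponds
    to atom-map number i + 1.
    """
    by_map = {
        int(number): symbol.capitalize()
        for symbol, number in ATOM.findall(mapped_smiles)
    }
    return [f"{by_map[i + 1]}{i + 1}" for i in dihedral]


# One of the t148j records from this file.
smiles = "[O:1]([S:2](=[O:3])[N:4]([H:6])[H:7])[H:5]"
labels = dihedral_atoms(smiles, (2, 1, 3, 5))
print("-".join(labels))  # O3-S2-N4-H6
```

For this record the labels trace the bonded chain O=S-N-H, which is consistent with the zero-based interpretation.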