Useful datasets
Marcus Wieder edited this page Oct 15, 2024 · 16 revisions
Dataset name | Level of theory | Chemistry covered | Data points | Total number of samples | Number of molecules | Elements | |
---|---|---|---|---|---|---|---|
QM9 | B3LYP/6-31G(2df,p) | small molecules (< 9 heavy atoms) | Energy, additional QM properties | 134,000 | 134,000 | H,C,N,O,F | |
ANI-1 | wB97x/6-31G(d) | small molecules | Energy, Force, additional QM properties | 20,000,000 | 57,462 | H,C,N,O | |
ANI-1x | wB97x/6-31G(d) | small molecules | Energy, Force, additional QM properties | 5,000,000 | 20,854 | H,C,N,O | |
ANI-1ccx | CCSD(T)/CBS | small molecules | Energy, Force, additional QM properties | 489,971 | ? | H,C,N,O | |
ANI-2x | wB97x/6-31G(d), wB97x/def2TZVPP, wB97xMD3BJ/def2TZVP, wB97MV/def2TZVP, PB973c/def2mTZVP | small molecules | Energy, Force, additional QM properties | 8,935,411 | 4,254 | H,C,N,O,S,F,Cl | |
SPICE | ωB97M-D3BJ/def2-TZVPPD | dipeptides, solvated amino acids, small molecules, DES370K dataset, ion pairs | Energy, Force, additional QM properties | 2,008,628 | 113,999 | H,Li,B,C,N,O,F,Na,Mg, Si,P,S,Cl,K,Ca,Br,I | |
QMugs | Geometry: GFN2-xTB; property: ωB97X-D/def2-SVP | biologically and pharmacologically relevant molecules extracted from ChEMBL | Energy, Force, additional QM properties | 2,000,000 | 665,000 | | |
Splinter dataset | SAPT0/aug-cc-pV(D + d)Z, SAPT0/jun-cc-pV(D + d)Z | dimers of commonly found protein-ligand interaction substructures | Energy | 1,700,000 | 9,000 unique dimers, 332 unique small molecules | | |
Alchemy | B3LYP/6-31G(2df,p) | molecules with 9-12 heavy atoms | Energy, additional QM properties | 119,487 | | C,N,O,F,S,Cl | |
ChemSpider data set | | | Energy, Force | 3,000,000 | 15,000 | C,H,N,O | |
ISO-17 | B3LYP/6-31G(2df,p) | isomers with C7O2H10 | Energy, Force | 5000 | 129 | C,H,O | |
tmQM Dataset | TPSSh-D3BJ/def2-SVP | mononuclear complexes, including Werner, bioinorganic, and organometallic complexes, based on a large variety of organic ligands and 30 transition metals | Energy, additional QM properties | 86,665 | | | |
GEOM | GFN2-xTB | | Energy, Force, additional QM properties | 37,000,000 | 450,000 | | |
PhAlkEthOH | B3LYP-D3BJ/DZVP | collection of optimised geometries of alkyl, aryl, and hydroxyl | Energy, Force, additional QM properties | ||||
Aquamarine (AQM) | Geometry: DFTB3+MBD; property: PBE0(tight)+MBD | molecules with total number of atoms ranging from 2 to 92 | Energy, Force, physicochemical properties in vacuum and implicit solvent | 59,783 | 1,653 | C,N,O,H,Cl,S,P,F | |
OE62 | Geometry: PBE(tight)+TS-vdW; property: PBE0(tight) | up to 92 heavy atoms; based on organic crystals extracted from the CSD, equilibrium data only. For a subset of 32K molecules, the energy is also provided under the influence of an implicit solvent model | Energy, physicochemical properties | 30,876 | 30,876 | H,Li,B,C,N,O,F,Si,P,S, Cl,As,Se,Br,Te,I | |
BACE | Geometry: r2scan-3c/mTZVPP; property: r2scan-3c/mTZVPP | up to 61 heavy atoms | | 534 | 455,000 | | |
Vector-QM24 (VQM24) | DMC (diffusion quantum Monte Carlo) @ PBE0 (ccECP/cc-pVQZ) | | | | | C,N,O,F,Si,P,S,Cl,Br | |
COLL | Geometry: GFN2-xTB; property: revPBE/def2-TZVP/D3 | pairs of molecules reacting at high kinetic energies | | 140K | | | |