
Useful datasets

Marcus Wieder edited this page Oct 15, 2024 · 16 revisions
| Dataset name | Level of theory | Chemistry covered | Data points | Total number of samples | Number of molecules | Elements |
| --- | --- | --- | --- | --- | --- | --- |
| QM9 | B3LYP/6-31G(2df,p) | small molecules (≤ 9 heavy atoms) | Energy, additional QM properties | 134,000 | 134,000 | H,C,N,O,F |
| ANI-1 | wB97x/6-31G(d) | small molecules | Energy, Force, additional QM properties | 20,000,000 | 57,462 | H,C,N,O |
| ANI-1x | wB97x/6-31G(d) | small molecules | Energy, Force, additional QM properties | 5,000,000 | 20,854 | H,C,N,O |
| ANI-1ccx | CCSD(T)/CBS | small molecules | Energy, Force, additional QM properties | 489,971 | ? | H,C,N,O |
| ANI-2x | wB97x/6-31G(d), wB97x/def2TZVPP, wB97xMD3BJ/def2TZVP, wB97MV/def2TZVP, PB973c/def2mTZVP | small molecules | Energy, Force, additional QM properties | 8,935,411 | 4,254 | H,C,N,O,S,F,Cl |
| SPICE | ωB97M-D3BJ/def2-TZVPPD | dipeptides, solvated amino acids, small molecules, DES370K dataset, ion pairs | Energy, Force, additional QM properties | 2,008,628 | 113,999 | H,Li,B,C,N,O,F,Na,Mg,Si,P,S,Cl,K,Ca,Br,I |
| QMugs | Geometry: GFN2-xTB; Property: ωB97X-D/def2-SVP | biologically and pharmacologically relevant molecules extracted from ChEMBL | Energy, Force, additional QM properties | 2,000,000 | 665,000 | |
| Splinter dataset | SAPT0/aug-cc-pV(D+d)Z, SAPT0/jun-cc-pV(D+d)Z | dimers of commonly found protein-ligand interaction substructures | Energy | 1,700,000 | 9,000 unique dimers, 332 unique small molecules | |
| Alchemy | B3LYP/6-31G(2df,p) | between 9 and 12 heavy atoms | Energy, additional QM properties | 119,487 | | C,N,O,F,S,Cl |
| ChemSpider data set | | | Energy, Force | 3,000,000 | 15,000 | C,H,N,O |
| ISO-17 | B3LYP/6-31G(2df,p) | isomers with C7O2H10 | Energy, Force | 5000 | 129 | C,H,O |
| tmQM Dataset | TPSSh-D3BJ/def2-SVP | mononuclear complexes, including Werner, bioinorganic, and organometallic complexes, based on a large variety of organic ligands and 30 transition metals | Energy, additional QM properties | 86,665 | | |
| GEOM | GFN2-xTB | | Energy, Force, additional QM properties | 37,000,000 | 450,000 | |
| PhAlkEthOH | B3LYP-D3BJ/DZVP | collection of optimised geometries of alkyl, aryl, and hydroxyl | Energy, Force, additional QM properties | | | |
| Aquamarine (AQM) | Geometry: DFTB3+MBD; Property: PBE0(tight)+MBD | molecules with total number of atoms ranging from 2 to 92 | Energy, Force, physicochemical properties in vacuum and implicit solvent | 1653 | 59783 | C,N,O,H,Cl,S,P,F |
| OE62 | Geometry: PBE(tight)+TS-vdW; Property: PBE0(tight) | up to 92 heavy atoms, based on organic crystals extracted from the CSD; equilibrium data only. For a subset of 32K molecules, the energy is also provided under the influence of an implicit solvent model | Energy, physicochemical properties | 30876 | 30876 | H,Li,B,C,N,O,F,Si,P,S,Cl,As,Se,Br,Te,I |
| BACE | Geometry: r2scan-3c/mTZVPP; Property: r2scan-3c/mTZVPP | up to 61 heavy atoms | | 534 | 455000 | |
| Vector-QM24 (VQM24) | DMC (Diffusion Quantum Monte Carlo) @PBE0(ccECP/cc-pVQZ) | | | | | C,N,O,F,Si,P,S,Cl,Br |
| COLL | Geometry: GFN2-xTB; Property: revPBE/def2-TZVP/D3 | pairs of molecules reacting at high kinetic energies | | 140K | | |