A repository for physico-chemical data extracted from the NIST Chemistry WebBook

IvanChernyshov/NistChemData

NistChemData: Extracted Physico-Chemical Data from NIST Chemistry WebBook

NistChemData is a repository for physico-chemical data extracted from the NIST Chemistry WebBook.

Currently, it includes spectral (IR, THz IR, MS, UV-Vis) and quantum chemical data.

Data extraction was carried out using the NistChemPy package. The scripts used to extract and prepare the data are located in the scripts/ directory.

As NistChemPy's extraction capabilities expand, additional thermodynamic and spectral data will be added to this repository.

Data Cookbook

nist_compounds.csv: a tabular list of 129345 compounds from the NIST Chemistry WebBook with the following columns:

  • ID (str): NIST Chemistry WebBook Compound ID;

  • name (str): chemical name;

  • synonyms (str): alternative chemical names, separated by "\n";

  • formula (str): chemical formula;

  • cas_rn (str): CAS Registry Number;

  • mol_weight (float): molecular weight, g/mol;

  • inchi (str): InChI string;

  • inchi_key (str): InChI Key string.
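The column list above can be read with Python's standard csv module. The following is a minimal sketch, not the repository's own loader: the two sample rows are illustrative stand-ins, the helper name load_compounds is hypothetical, and it assumes the file has a header row with exactly the column names listed above. How the "\n" separator in synonyms is stored on disk is also an assumption, so the sketch accepts both a real newline and an escaped one.

```python
import csv
import io

# Illustrative two-row sample that mimics the documented columns.
# Assumption: the real file has a header row with these exact names.
SAMPLE_CSV = (
    "ID,name,synonyms,formula,cas_rn,mol_weight,inchi,inchi_key\n"
    'C7732185,Water,"Dihydrogen oxide\nDistilled water",H2O,7732-18-5,18.0153,'
    "InChI=1S/H2O/h1H2,XLYOFNOQVPJJNP-UHFFFAOYSA-N\n"
    "C74828,Methane,,CH4,74-82-8,16.0425,"
    "InChI=1S/CH4/h1H4,VNWKTOKETHGBQD-UHFFFAOYSA-N\n"
)


def load_compounds(handle):
    """Read compound rows and convert the non-string columns."""
    rows = []
    for row in csv.DictReader(handle):
        # mol_weight is documented as float; empty cells become None.
        row["mol_weight"] = float(row["mol_weight"]) if row["mol_weight"] else None
        # Assumption: synonyms are separated by a newline character; an
        # escaped backslash-n is normalized first in case it is stored that way.
        raw = row["synonyms"].replace("\\n", "\n")
        row["synonyms"] = [s for s in raw.split("\n") if s]
        rows.append(row)
    return rows


compounds = load_compounds(io.StringIO(SAMPLE_CSV))
by_id = {c["ID"]: c for c in compounds}  # index rows by WebBook compound ID
print(by_id["C7732185"]["synonyms"])
```

To read the actual file, the StringIO object would be replaced by `open(path, newline="", encoding="utf-8")`, where the path and encoding are again assumptions.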

An SDF file containing 3D atomic coordinates for 48325 WebBook compounds, along with the following computed properties:

  • WEBBOOK.ID: NIST Chemistry WebBook Compound ID;

  • METHOD: quantum chemical approximation used for the computations;

  • DIPOLE.MOMENT: dipole moment;

  • ELECTRONIC.ENERGY: absolute electronic energy;

  • IR.FREQUENCIES: computed frequencies and their IR intensities;

  • ROTATIONAL.CONSTANTS: rotational constants.
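The property tags listed above are stored as SDF data items, so they can be collected without a cheminformatics toolkit. The sketch below uses only the standard library and relies on the general SDF layout (a molblock ending in "M  END", data items introduced by "> <TAG>", records terminated by "$$$$"). The sample record, its values, and the helper name read_sdf_properties are hypothetical; the actual value formats in the repository file are not documented here and may differ.

```python
# Hypothetical single-record sample; coordinates and property values are
# placeholders, not data copied from the repository.
SAMPLE_SDF = """C7732185
  placeholder header line

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.1173 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.7572   -0.4692 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000   -0.7572   -0.4692 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  1  3  1  0
M  END
> <WEBBOOK.ID>
C7732185

> <METHOD>
placeholder-method

> <DIPOLE.MOMENT>
1.85

$$$$
"""


def read_sdf_properties(text):
    """Yield one {tag: value} dict per SDF record (data items only)."""
    for record in text.split("$$$$"):
        props, tag, value_lines, in_data = {}, None, [], False
        for line in record.splitlines():
            if not in_data:
                # Skip the molblock; data items start after "M  END".
                in_data = line.startswith("M  END")
                continue
            if tag is None:
                # A data header looks like "> <TAG>".
                if line.startswith(">") and "<" in line:
                    tag = line[line.index("<") + 1 : line.rindex(">")]
                    value_lines = []
            elif line.strip():
                value_lines.append(line)
            else:
                # A blank line closes the current data item.
                props[tag] = "\n".join(value_lines)
                tag = None
        if tag is not None:
            props[tag] = "\n".join(value_lines)
        if props:
            yield props


records = list(read_sdf_properties(SAMPLE_SDF))
print(records[0]["WEBBOOK.ID"], records[0]["METHOD"])
```

All values are returned as strings. Converting fields such as IR.FREQUENCIES into numbers depends on how they are serialized in the file, so that step is left out. For work with the 3D coordinates themselves, a toolkit with an SDF reader would be the more natural choice.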

Spectra

  1. Raw spectra: contains JDX-formatted IR, THz, MS, and UV-Vis spectra. Spectra are organized by type and archived in zip files.

    • 19582 IR spectra for 15890 compounds;

    • 35 THz spectra for 32 compounds;

    • 33285 MS spectra for 33285 compounds;

    • 3063 UV-Vis spectra for 3057 compounds;

    • File naming convention: {NIST Compound ID}_{Spectrum Type}_{Spectrum Index};

    • Note that some spectra (primarily IR) of the same compound may appear identical, differing only in resolution (number of points per micrometer).

  2. Processed MS data: electron ionization mass spectra (MS) with the following fields:

    • ID / name / inchi (str): same as in nist_compounds.csv;

    • mz & intensities (list[int]): lists of m/z values and relative intensities normalized to 9999.
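The file naming convention and the mz / intensities fields described above can be handled with a few small helpers. This is a sketch under stated assumptions rather than code from the repository: the function names are hypothetical, the ".jdx" extension and the "IR" type label in the example are guesses, the spectrum index is assumed to be an integer, and the spectrum type is assumed to contain no underscore. The peak values are made up for illustration.

```python
from pathlib import PurePath
from typing import NamedTuple


class SpectrumName(NamedTuple):
    compound_id: str
    spectrum_type: str
    index: int


def parse_spectrum_name(filename):
    """Split '{NIST Compound ID}_{Spectrum Type}_{Spectrum Index}'."""
    stem = PurePath(filename).stem  # drop the extension, if any
    # Split from the right so the last two fields are type and index.
    compound_id, spectrum_type, index = stem.rsplit("_", 2)
    return SpectrumName(compound_id, spectrum_type, int(index))


def relative_intensities(intensities, scale=9999):
    """Rescale intensities normalized to `scale` into fractions of it."""
    return [value / scale for value in intensities]


def base_peak(mz, intensities):
    """Return the m/z value of the most intense peak (first one on ties)."""
    if len(mz) != len(intensities):
        raise ValueError("mz and intensities must have the same length")
    top = max(range(len(mz)), key=lambda i: intensities[i])
    return mz[top]


# Hypothetical file name and peak list, for illustration only.
name = parse_spectrum_name("C7732185_IR_0.jdx")
mz = [16, 17, 18]
intensities = [90, 2122, 9999]
print(name.compound_id, name.spectrum_type, name.index)
print(base_peak(mz, intensities), relative_intensities(intensities)[-1])
```

Splitting from the right keeps the parser working even if a compound ID were to contain an underscore, at the cost of failing when the spectrum type does.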
