This is the stable version of LibGen, with sample code and data demonstrating the capabilities of the pipeline.
LibGen is dedicated software for building high-quality mass spectral libraries from reference standard mixes.
The code was developed in Python 3.8. Running it in a dedicated conda environment is highly recommended. A YAML environment file will be uploaded soon.
- The current version is only available by cloning the repository from GitHub.
- We plan to publish it on PyPI (installable with pip) in the near future.
There are three main modules in this pipeline: standard list preparation, feature finding, and library curation.
A workspace is simply a folder that holds all of the files needed for the library curation process. Run the following command to create a workspace, then put the corresponding files into its subfolders:
python setup_workspace.py [the_place_you_want_the_workspace_in] [your_library_name]
The script creates a new folder with the following subdirectories:
- standard_list - the standard list you provide, as well as the cleaned standard list produced by the standard list preparation step
- spectra - all raw spectra files acquired from the instrument
- features - peak-picking results from the feature finding step
- curated library - the curated libraries
- figures - saved figures (if any)
It will print the full path to your workspace, [the_director_of_your_workspace]. Use this path in the following sections.
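For readers who want to see what the setup step amounts to, here is a minimal Python sketch that creates the folder layout listed above. It is not the code in setup_workspace.py: the function name `make_workspace` is hypothetical, and the subfolder names are copied from this README, so the real script may spell them differently (for example with underscores).

```python
from pathlib import Path

# Subfolder names as listed in this README (assumption: the real script may differ).
SUBDIRS = ["standard_list", "spectra", "features", "curated library", "figures"]


def make_workspace(parent: str, library_name: str) -> Path:
    """Hypothetical sketch: create <parent>/<library_name> with the README's subfolders."""
    root = Path(parent).expanduser().resolve() / library_name
    for sub in SUBDIRS:
        # parents=True also creates the workspace root; exist_ok=True makes reruns harmless
        (root / sub).mkdir(parents=True, exist_ok=True)
    # Return the absolute path, analogous to the path setup_workspace.py prints
    return root
```

Because `mkdir` is called with `exist_ok=True`, running the sketch twice on the same location leaves existing files untouched.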
The standard list consists of three main columns: name, inchikey, and mix label (the file names of the spectra). Please refer to the sample standard list in the sample_data folder, and do not change the column names. With your standard list in place, run the following command to execute the preparation process:
python standard_list_prep.py [the_director_of_your_workspace]
The prepared standard list is saved in the standard_list directory, with '_cleaned' appended to the file name.
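Before running standard_list_prep.py, it can help to sanity-check the list yourself. The sketch below assumes a CSV file whose headers are literally `name`, `inchikey` and `mix_label`; those header spellings are an assumption, so match them to the sample standard list in sample_data. It flags rows whose InChIKey does not follow the standard 27-character layout and reproduces the '_cleaned' naming rule described above. The helper names `check_standard_list` and `cleaned_name` are hypothetical, and this is not what standard_list_prep.py does internally.

```python
import csv
import re
from pathlib import Path
from typing import List

# Assumed header names; check the sample standard list for the exact spelling.
REQUIRED_COLUMNS = ("name", "inchikey", "mix_label")
# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def check_standard_list(path: str) -> List[str]:
    """Return a list of problems found in a standard list CSV (empty means it looks OK)."""
    problems = []
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
        if missing:
            return ["missing column(s): " + ", ".join(missing)]
        # Line 1 is the header; numbering assumes no quoted fields span several lines.
        for line_no, row in enumerate(reader, start=2):
            key = (row["inchikey"] or "").strip()
            if not INCHIKEY_RE.match(key):
                problems.append(f"line {line_no}: malformed InChIKey {key!r}")
    return problems


def cleaned_name(path: str) -> Path:
    """Append '_cleaned' to the file name, keeping the extension."""
    p = Path(path)
    return p.with_name(f"{p.stem}_cleaned{p.suffix}")
```

A file named `my_standards.csv` would map to `my_standards_cleaned.csv`.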
Feature finding is done by custom code, ff_droup, a specialized feature finding process that starts from MS/MS rather than MS1. The names of the raw spectra files should be similar to the mix labels in the standard list, but they do not need to be identical, since fuzzy matching is enabled. Note also that the files need to be centroided mzML files; if they are not, please convert them with MSConvert. Run the following command for the feature finding process:
python feature_finding.py [the_director_of_your_workspace]
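To illustrate what fuzzy matching between file names and mix labels means, here is a sketch using Python's standard `difflib.get_close_matches`. This is a swapped-in technique for illustration only; ff_droup may use a different matcher and threshold. The function name `match_spectra_to_mixes` and the 0.6 cutoff (difflib's default) are assumptions.

```python
import difflib
from pathlib import Path
from typing import Dict, List, Optional


def match_spectra_to_mixes(spectra_dir: str, mix_labels: List[str],
                           cutoff: float = 0.6) -> Dict[str, Optional[str]]:
    """Hypothetical sketch: map each mix label to the most similar .mzML file name."""
    # Compare labels against file stems (names without the .mzML extension).
    # Note: both the glob and difflib's similarity ratio are case-sensitive.
    stems = {p.stem: p.name for p in Path(spectra_dir).glob("*.mzML")}
    matches = {}
    for label in mix_labels:
        best = difflib.get_close_matches(label, list(stems), n=1, cutoff=cutoff)
        # None means no file was similar enough to this label
        matches[label] = stems[best[0]] if best else None
    return matches
```

For example, a label `Mix01` would pick `Mix01_pos.mzML` over `Mix02_pos.mzML`, while a label with no similar file maps to None, which is a useful hint that a file is missing or badly named.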
Finally, it is time to curate the libraries! Run the following command:
python feature_finding.py -msp [the_director_of_your_workspace]
It will automatically export the curated libraries in both .msp and .csv formats. If you prefer another format to .msp, you can change -msp to:
- -mgf: export to .mgf format
- -ms: export to .ms file (SIRIUS proprietary format)
- -mat: export to .mat file (MS-Finder proprietary format)
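As a rough picture of what an .msp export contains, the sketch below writes a single spectrum in a minimal NIST-style MSP layout: header lines such as `Name:` and `Num Peaks:`, one m/z and intensity pair per line, and a blank line between records. The function name `write_msp_record` and the exact set of header fields are assumptions; the fields LibGen actually writes may differ.

```python
from typing import List, TextIO, Tuple


def write_msp_record(fh: TextIO, name: str, precursor_mz: float, inchikey: str,
                     peaks: List[Tuple[float, float]]) -> None:
    """Hypothetical sketch: write one spectrum as a minimal MSP record."""
    # Header fields (assumed set; real MSP files often carry more metadata)
    fh.write(f"Name: {name}\n")
    fh.write(f"PrecursorMZ: {precursor_mz:.4f}\n")
    fh.write(f"InChIKey: {inchikey}\n")
    fh.write(f"Num Peaks: {len(peaks)}\n")
    # One tab-separated "m/z intensity" pair per line
    for mz, intensity in peaks:
        fh.write(f"{mz:.4f}\t{intensity:.2f}\n")
    # A blank line separates consecutive records
    fh.write("\n")
```

Calling it repeatedly on the same open file handle appends one record per spectrum, which is how a multi-compound .msp library is laid out.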
Please contact me directly if you need help.
Fanzhou Kong
Email: fzkong@ucdavis.edu
- 1.0
- Initial implementation of the project, deprecated.
- 2.0
- Current release of LibGen, fully functional.
This project is licensed under the Apache 2.0 License.
Special thanks to Dr. Yuanyue Li, who developed the entropy similarity/spectral similarity algorithms, which serve as an essential part of this project.
Appreciation also goes to everyone in the Fiehn lab at the UC Davis West Coast Metabolomics Center. This could not have been done without you.