Picky with peakpicking: assessing chromatographic peak quality with simple metrics in metabolomics

Note: this README was significantly changed at the time of manuscript submission to reflect the final state of the project. For the in-progress README, see here.

Repo information

This repository contains data and code necessary for the peakpicking manuscript submitted to BMC Bioinformatics that emerged from the 2023 eScience Winter Incubator. The manuscript has been written as a reproducible R Markdown file available in the /manuscript subdirectory but requires downloading the raw mzML data from Metabolomics Workbench as the files are too large to be hosted on GitHub directly. To recompile the manuscript, clone the repository and copy the positive-mode mzML files over from MW into the associated subdirectory's /mzML folder. The entire document can then be recreated using RStudio's Knit function after launching it from the .Rproj file. This will take a while (~8 hours on my laptop), as the code runs XCMS on ~500 mass-spectrometry files, performs custom feature extraction, performs the statistics and generates the figures for the manuscript. This was tested on 7/25/2023 ahead of manuscript submission.

Authors

Project lead: William Kumler (wkumler@uw.edu)
eScience advisor: Bryna Hazelton
Academic advisor: Anitra Ingalls

Repo structure

manuscript
- Contains the R Markdown document used to create the Word document submitted to BMC Bio via knit and additional necessary data. Everything other than the RMarkdown document, the /IS_integrations directory, the refs.bib, and template.docx are created via knit.
- IS_integrations: folder containing Skyline integrations of the internal standards used to normalize the data via BMIS.
  - The is_combine.R script can be run to convert the *.sky files into clean all_IS.csv format via the SkylineRunner.exe CLI
- refs.bib: References for the manuscript, created by exporting the Mendeley project folder
- template.docx: Template for knitting, enabling e.g. line numbers and double-spacing and header formatting
MW_upload
- Metabolomics Workbench doesn't really like it when you upload projects a little bit at a time but requires the entire project be near completion prior to providing a DOI. Since the files used here are part of both this manuscript and a followup oceanography manuscript, there's some overlap between the two when uploading to MW.
- MW_formatting.R: Accepts as input the manually-copied CSV files and produces the Metabolomics Workbench formatted .txt files.
- *.csv: Files are copied over from a different repository (AllMeso) where the different analyses were collated into single documents and targeted data was used as well as the untargeted approach illustrated here.
- *.txt: Outputs from MW_formatting.R that were opened and copy-pasted into the MW upload forms.
made_data_*
- Contains raw data (mzMLs) and outputs from XCMS used in the manuscript.
  - \mzMLs: Needs to be manually created within each folder by extracting the .mzML files from the Metabolomics Workbench studies.
    - made_data_CultureData/mzMLs: Project ID -> Study ID ST002790 -> Download raw data -> ST002790_culturedata_HILIC_POS.zip
    - made_data_MS3000/mzMLs: Project ID PR001738 -> Study ID ST002789 -> Download raw data -> ST002789_HILIC_POS.zip
    - made_data_MS3000/mzMLs: Project ID PR001738 -> Study ID ST002788 -> Download raw data -> ST002788_HILIC_POS.zip
    - made_data_CultureData/mzMLs: Project ID -> Study ID ST002077 -> Download raw data -> ST002790_Pttime.zip
- classified_feats.csv: Contains manual annotations of peak quality for each feature in the dataset, named by XCMS feature number alongside the m/z/RT bounding box coordinates. Feature classifications can be "Good", "Bad", "Ambiguous", "Stans only", or "Unclassified". Annotations were performed manually using the training_tool.R script found in the root project directory after the peakpicking_and_prep.R script was run.
- features_extracted.csv: Contains the extracted metrics for each mass feature. Rows are individual XCMS features and columns are the metric extracted from each mass feature. NAs have been filled and the feature classification (see above) has been added as the final column.
peakpicking_and_prep.R: Runs XCMS on the mzMLs in each subdirectory. The first step in the process.
feature_extraction.R: Performs feature extraction on the XCMS output from peakpicking_and_prep.R. Still requires access to the raw data, however.
training_tool.R: Interactive annotation tool built to rapidly render a mass feature to the user and capture input using the arrow keys to categorize the feature's quality, one XCMS project at a time. Currently right arrow key is bound to "Bad", left is "Good", up is "Ambiguous", and down is "Stans only".
*_attempts.R and *_validation.R: Various sandboxed attempts to wrap our head around the data. None of these are necessary for the final manuscript, as analyses and figures have been embedded as R chunks.

Name		Name	Last commit message	Last commit date
Latest commit History 259 Commits
MW_upload		MW_upload
made_data_CultureData		made_data_CultureData
made_data_FT2040		made_data_FT2040
made_data_MS3000		made_data_MS3000
made_data_Pttime		made_data_Pttime
manuscript		manuscript
.gitignore		.gitignore
GMM_attempts.R		GMM_attempts.R
MS_metrics.Rproj		MS_metrics.Rproj
README.md		README.md
clustering_attempts.R		clustering_attempts.R
feature_extraction.R		feature_extraction.R
model_validation.R		model_validation.R
peakpicking_and_prep.R		peakpicking_and_prep.R
pixelpicking_attempts.R		pixelpicking_attempts.R
pixelpicking_validation.R		pixelpicking_validation.R
randomforest_attempts.R		randomforest_attempts.R
regression_attempts.R		regression_attempts.R
result_sandbox.Rmd		result_sandbox.Rmd
training_tool.R		training_tool.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Picky with peakpicking: assessing chromatographic peak quality with simple metrics in metabolomics

Repo information

Authors

Repo structure

About

Releases

Packages

Languages

wkumler/MS_metrics

Folders and files

Latest commit

History

Repository files navigation

Picky with peakpicking: assessing chromatographic peak quality with simple metrics in metabolomics

Repo information

Authors

Repo structure

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages