MSblender is a statistical tool for merging database search results from multiple database search engines for peptide identification based on a multivariate modeling approach.

marcottelab/MSblender

MSblender

MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines

See http://www.marcottelab.org/index.php/MSblender for more (somewhat outdated) information. Citation:

T. Kwon*, H. Choi*, C. Vogel, A.I. Nesvizhskii, and E.M. Marcotte, MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines. J. Proteome Research, 10(7): 2949–2958 (2011)

About

This version is derived from the JRH MS1-Quant-Pipeline, with all MS1 quantification removed.

It has been further modified to make parameters more consistent across search algorithms. MS-GF+ options and PTMs are now defined in dedicated parameter files.

This repo contains:

-  msblender MS2 analysis

-  helper scripts

-  accessory files and parameters used by MS interpretation programs

Available search engines:

- X!Tandem

- Comet

- MS-GF+

All search engines except X!Tandem are external programs taken from SearchGUI.

Quick start

# set up directories (**replace "proj" with your project name**)
mkdir -p proj/{mzXML,db,working,output,logs}
# symlink raw data to mzxml directory
ln -s /path/to/mzxmls/*mzXML proj/mzXML
# make fasta database (**replace "proteome" with your fasta name**)
# there is a contam.fasta here: example/fastas/contam.fasta
cat /path/to/proteome.fasta /path/to/contam.fasta > proj/db/proteome_contam.combined.fasta
# template command
/path/to/runMSblender.sh /path/to/mzXML/file /path/to/database/file /path/to/working/dir/ /path/to/output/dir /path/to/logs/dir
# for many mzXMLs (e.g., CFMS data); parallel: "-j2" = 2 commands at a time, "-j4" = 4 commands at a time, etc
for x in mzXML/*mzXML; do echo "/path/to/runMSblender.sh ${x} /path/to/db/proteome_contam.combined.fasta /path/to/working/dir/ /path/to/output/dir/ /path/to/logs/dir/"; done > proj.msblender.cmds
cat proj.msblender.cmds | parallel -j4
# combine .group file for each fraction into one tab-separated output
python /path/to/msblender-scripts/msblender2elution.py \
--prot_count_files /path/to/output/dir/*.group \
--output_filename proj_output_name.prot_count_mFDRpsm001.unique.elut \
--fraction_name_from_filename \
--parse_uniprot_id --remove_zero_unique
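Before launching the searches, it can be worth confirming that the combined database contains every record from both input files. The following is a minimal sketch, not part of this repo: the function names `count_fasta_entries` and `check_combined_fasta` are hypothetical, and the check only counts header lines (lines starting with ">"). One case it catches is an input FASTA without a trailing newline, where `cat` fuses the first header of the next file onto the preceding sequence line.

```shell
# Count FASTA records by counting header lines (those starting with ">").
count_fasta_entries() {
    grep -c '^>' "$1"
}

# Hypothetical helper: verify that the combined database holds as many
# records as the proteome and contaminant files together.
# $1 = proteome FASTA, $2 = contaminant FASTA, $3 = combined FASTA
check_combined_fasta() {
    local expected actual
    expected=$(( $(count_fasta_entries "$1") + $(count_fasta_entries "$2") ))
    actual=$(count_fasta_entries "$3")
    if [ "$expected" -eq "$actual" ]; then
        echo "OK: $actual entries"
    else
        echo "MISMATCH: expected $expected, found $actual" >&2
        return 1
    fi
}
```

For the layout above, it would be called as `check_combined_fasta /path/to/proteome.fasta /path/to/contam.fasta proj/db/proteome_contam.combined.fasta`.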

Running the example

This repo contains an "example" folder with the recommended directory structure provided above.

# get repo
git clone https://github.com/marcottelab/MSblender.git
# switch to the example directory
cd MSblender/example/

# create directory skeleton (mzXML and db already exist)
mkdir {working,output,logs}
# make the database
cat db/caeel.fasta db/contam.fasta > db/caeel.contam.fasta
# generate commands
for x in mzXML/*mzXML; do echo "../runMSblender.sh ${x} db/caeel.contam.fasta working output logs"; done > example.msblender.cmds
# run commands in parallel ("-j2" = 2 commands at a time, "-j4" = 4 commands at a time, etc)
cat example.msblender.cmds | parallel -j4
# combine results into a table
python ../msblender-scripts/msblender2elution.py \
 --prot_count_files output/*.group \
 --output_filename example.prot_count_mFDRpsm001.unique.elut \
 --fraction_name_from_filename \
 --parse_uniprot_id --remove_zero_unique
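When many fractions run in parallel, a failed search is easy to miss. A simple sanity check before combining results is to compare the number of input files with the number of .group files. This sketch assumes one .group file is written per mzXML input; the function names `count_files` and `check_run_complete` are hypothetical and not part of this repo.

```shell
# Count entries directly inside directory $1 whose names match glob $2.
# Symlinked mzXML files are counted too.
count_files() {
    find "$1" -maxdepth 1 -name "$2" | wc -l | tr -d ' '
}

# Compare the number of mzXML inputs with the number of .group outputs.
# $1 = mzXML directory, $2 = output directory
# Returns non-zero when the counts differ.
check_run_complete() {
    local n_in n_out
    n_in=$(count_files "$1" '*.mzXML')
    n_out=$(count_files "$2" '*.group')
    echo "inputs: $n_in, group files: $n_out"
    [ "$n_in" -eq "$n_out" ]
}
```

From the example directory this would be `check_run_complete mzXML output`. If the counts differ, the files in logs are the place to look for the failed fraction.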

Search parameter configuration

Search engine parameter docs: X!Tandem, Comet-2013020, and MS-GF+.

Search parameters can be modified as necessary, but try to keep parameters consistent across search algorithms.

The defaults were selected with our standard MS experiments in mind:

  • high-res MS (10ppm precursor tolerance)

  • low-res MS/MS (ion trap) *

  • tryptic digestion, no non-enzymatic termini

  • fixed cysteine carbamidomethylation (+57.021464, from iodoacetamide alkylation)

  • optional methionine oxidation (+15.9949)

* X!Tandem purportedly ignores fragment mass tolerance settings when using k-scoring and/or no "spectrum conditioning". (And it recommends turning conditioning off when using k-score.)
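To judge whether the 10 ppm precursor tolerance suits a different instrument, it helps to see it in absolute units. A ppm tolerance scales with m/z: tolerance = m/z × ppm / 1,000,000. The helper below is an illustrative sketch (the name `ppm_to_da` is hypothetical and not part of this repo).

```shell
# Convert a ppm tolerance into an absolute tolerance at a given m/z.
# $1 = m/z value, $2 = tolerance in ppm
# LC_ALL=C keeps the decimal separator a dot regardless of locale.
ppm_to_da() {
    LC_ALL=C awk -v mz="$1" -v ppm="$2" 'BEGIN { printf "%.4f\n", mz * ppm / 1e6 }'
}
```

For example, `ppm_to_da 1000 10` prints 0.0100, so at m/z 1000 the default setting accepts precursors within 0.01 m/z units of the theoretical value.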

Tips on changing search parameters

Comet parameter files and MS-GF+ parameter and modification files are found in ./params

Comments within each should offer sufficient documentation.

X!Tandem parameters are found in ./search/tmpl/tandemK.high.xml

MSblender Docker

A full docker image with MSblender installed is available here: https://hub.docker.com/r/kdrew/msblender

To run:

docker pull kdrew/msblender

docker run -v /test_data/:/data kdrew/msblender /data/xl_animalcaps_SEC_Control_20a_20181121.mzXML /data/combined_contam_rev_file.fasta /data/working /data/output /searchgui
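For several fractions, the command-file pattern from the quick start can be reused with Docker. The sketch below only prints commands and does not run them. It assumes the image name from the `docker pull` line (kdrew/msblender), keeps the argument order and the trailing /searchgui argument exactly as in the single-file command, and expects the mzXML files and the FASTA file to sit in the same host directory. The function name `make_docker_cmds` is hypothetical.

```shell
# Print one "docker run" command per mzXML file in a host directory.
# $1 = absolute host data directory (mounted at /data in the container)
# $2 = FASTA file name inside that directory
make_docker_cmds() {
    local host_dir=${1%/} fasta=$2 f
    for f in "$host_dir"/*.mzXML; do
        [ -e "$f" ] || continue   # skip when the glob matches nothing
        echo "docker run -v $host_dir/:/data kdrew/msblender /data/$(basename "$f") /data/$fasta /data/working /data/output /searchgui"
    done
}
```

The output can be saved and run in the same way as the other command files, for example `make_docker_cmds /test_data combined_contam_rev_file.fasta > docker.msblender.cmds` followed by `cat docker.msblender.cmds | parallel -j2`.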

To do list

  • MyriMatch is currently not working because of a library issue

  • Separate X!Tandem from this repo

  • Add on MS1 quantification
