PitKubi edited this page Jul 18, 2024 · 27 revisions

mhc-validator

MHC-validator is a machine learning tool for rescoring immunopeptidomics data acquired with mass spectrometers. The input must be the results of a database search engine such as Comet. MHC-validator learns from peptide spectrum match features reported by the search engine, from MHC binding affinities predicted by NetMHCpan4.1 and MHCflurry, and from the peptide sequences themselves. This lets it better assess whether a potential immunopeptide in your mass spectrometry run is actually present.

MHC-validator can be built into commonly used immunopeptidomics pipelines. Adding MHC-validator to your pipeline can significantly increase the number of confidently identified immunopeptides. Depending on sample quality, we report 1.5- to 10-fold more peptide spectrum matches (PSMs) with mhc-validator than with commonly used rescoring tools (e.g., Percolator, DeepRescore). MHC-validator not only increases the number of immunopeptides found, it is also highly specific in finding low-abundance immunopeptides in your samples.

Below is a brief, high-level description of how mhc-validator works:

  1. First, mhc-validator loads the peptide sequences and their features. Based on these features and on whether each peptide comes from the target or the decoy search, mhc-validator learns how likely a peptide spectrum match is to be correct. MHC-validator uses three types of features: a) features reported by the search engine (target vs. decoy, mass, peptide length, charge, Xcorr, etc.), b) immunopeptide binding affinities reported by NetMHCpan4.1 and/or MHCflurry, and c) the peptide amino acid sequences themselves. The base algorithm learns from the search engine results only (termed MV); immunopeptide binding affinity assessment (MHC) and peptide sequence encoding (PE) can be added using the available options. In this example, let's assume we want to use MHC-validator to its full potential and set the options 'sequence_encoding' (PE), 'netmhcpan' and 'mhcflurry' (MHC) all to 'True'.

  2. Once the sequences have been loaded, mhc-validator uses NetMHCpan4.1 and MHCflurry to generate MHC binding affinities/elution scores and adds the results to the features taken from the database search results. Based on these features, MHC-validator trains a neural network and finally assigns each peptide a probability of being a true hit. If sequence encoding is enabled (as in our example), this first neural network is connected to a second neural network that takes the amino acid sequences into account.

  3. Results are reported as q-values. Accepting all peptides with a q-value below 0.01 yields a set in which fewer than 1% of the identifications are expected to be false positives (i.e., the accepted peptides pass a 1% FDR cutoff).

  4. You can now use these peptides for further analysis.
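Step 2 mentions a second neural network that takes the amino acid sequences into account. This page does not describe how mhc-validator encodes sequences, so the sketch below assumes one common scheme, one-hot encoding: each residue becomes a row of 20 zeros with a single 1 marking the amino acid, and shorter peptides are zero-padded to a fixed length. The alphabet, the maximum length of 15 and the function name `one_hot_encode` are illustrative assumptions, not mhc-validator's API.

```python
# Hypothetical sketch of one-hot peptide encoding as neural network input.
# Alphabet, padding and max_length are assumptions, not mhc-validator's actual encoding.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(peptide: str, max_length: int = 15) -> list[list[int]]:
    """Encode a peptide as a max_length x 20 matrix of 0/1 values.

    Row i marks the amino acid at position i; rows past the end of the
    peptide stay all-zero (padding), so every peptide yields the same shape.
    """
    if len(peptide) > max_length:
        raise ValueError(f"peptide {peptide!r} is longer than max_length={max_length}")
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_length)]
    for position, residue in enumerate(peptide.upper()):
        if residue not in AA_INDEX:
            # Modified residues (e.g. "M[15.99]") would need their own symbols.
            raise ValueError(f"unsupported residue {residue!r} in {peptide!r}")
        matrix[position][AA_INDEX[residue]] = 1
    return matrix


if __name__ == "__main__":
    encoded = one_hot_encode("SIINFEKL")
    print(len(encoded), len(encoded[0]))  # 15 20
    print(encoded[0].index(1))            # 15 ('S' is index 15 in AMINO_ACIDS)
```

A fixed-shape matrix like this can be fed to a dense or convolutional layer; the real tool may use a different alphabet, handle modifications, or use learned embeddings instead.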
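Step 3 reports results as q-values. The sketch below shows the standard target-decoy way of deriving q-values from a per-PSM score: rank PSMs by score, estimate the FDR at each threshold as decoys divided by targets above it, then take the running minimum from the bottom so each q-value is the lowest FDR at which that PSM would still be accepted. The `PSM` class and `assign_q_values` function are hypothetical, ties are not grouped, and mhc-validator's own estimator may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    peptide: str
    score: float    # higher = more confident (e.g. a rescoring model's output)
    is_decoy: bool


def assign_q_values(psms: list[PSM]) -> list[tuple[PSM, float]]:
    """Return (PSM, q-value) pairs, ranked from best to worst score.

    FDR at a threshold is estimated as (#decoys at or above) / (#targets at or above).
    The q-value is the minimum FDR over all thresholds that still accept the PSM.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs = []
    targets = decoys = 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / targets if targets else 1.0)

    # Walk from the lowest score upward, keeping the running minimum (capped at 1).
    q_values = [0.0] * len(fdrs)
    running_min = 1.0
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        q_values[i] = running_min
    return list(zip(ranked, q_values))


def accepted_targets(psms: list[PSM], cutoff: float = 0.01) -> list[PSM]:
    """Keep target PSMs whose q-value is below the cutoff (1% FDR by default)."""
    return [psm for psm, q in assign_q_values(psms) if q < cutoff and not psm.is_decoy]
```

For example, with scores T 10, T 9, D 8, T 7, T 6, D 5 (T = target, D = decoy), the q-values come out as 0, 0, 0.25, 0.25, 0.25, 0.5, so only the top two targets pass a 1% cutoff.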

Setting up a pipeline with mhc-validator

Now that we understand how mhc-validator works at a high level, here is some more information on setting up a pipeline that uses mhc-validator. If you already have your own pipeline and know how to work with mass spectrometry data, it is easiest to install mhc-validator and set it up as described in the README file.

Let's get started:

  1. Gather the raw immunopeptidomics files as .raw, .mgf, .mzXML or .mzML files.
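If your runs are spread across folders, a small script can collect them before the database search. The sketch below is a hypothetical helper (`gather_ms_files` is not part of mhc-validator); it recursively finds files with the extensions listed in step 1, matching them case-insensitively so that, for example, ".mzML" and ".mzml" are both picked up.

```python
from __future__ import annotations

from pathlib import Path

# Extensions from step 1, lower-cased for case-insensitive matching.
SUPPORTED_EXTENSIONS = {".raw", ".mgf", ".mzxml", ".mzml"}


def gather_ms_files(directory: str | Path) -> list[Path]:
    """Return all files under `directory` (recursively) with a supported extension, sorted by path."""
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

The returned list can then be passed to whatever step in your pipeline runs the search engine.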