Objects and interfaces to support reading and writing of PSI/HUPO standard formats.
mzIdentML can be read and written.
mzML can be read; writing is low priority and has not yet been implemented in a way that minimizes memory usage.
Available as the NuGet package PSI_Interface.
The readers support reading gzipped mzid and mzML files without needing to extract them.
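As a rough illustration of what reading a gzipped file without extracting it can look like in .NET, the sketch below opens a file and decompresses it on the fly when the name ends in .gz. This is not the library's own code: `GzipAwareOpen`, `OpenForReading`, and `ReadFirstLine` are hypothetical names introduced only for this example, and the extension check is an assumption about how gzipped input might be detected.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of PSI_Interface
public static class GzipAwareOpen
{
    // Returns a readable stream; decompresses on the fly when the path ends in ".gz"
    public static Stream OpenForReading(string path)
    {
        Stream file = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
        if (path.EndsWith(".gz", StringComparison.OrdinalIgnoreCase))
        {
            // Disposing the GZipStream also disposes the underlying FileStream
            return new GZipStream(file, CompressionMode.Decompress);
        }
        return file;
    }

    // Reads the first line of a (possibly gzipped) text file
    public static string ReadFirstLine(string path)
    {
        using (var reader = new StreamReader(OpenForReading(path)))
        {
            return reader.ReadLine();
        }
    }
}
```

A caller never needs to know whether the input was compressed; the same stream-based XML parsing can run on either kind of file as long as it only reads forward.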
To read an mzIdentML file with the simple reader, use var results = PSI_Interface.IdentData.SimpleMZIdentMLReader.Read(filePath);
To read the complete mzIdentML structure, use PSI_Interface.IdentData.mzIdentML.MzIdentMlReaderWriter.Read(filePath)
For easier interaction with the data, use the following example line:
var identData = new PSI_Interface.IdentData.IdentDataObj(MzIdentMlReaderWriter.Read(filePath));
There are three ways to write an mzIdentML file:
The hardest path: Populate PSI_Interface.IdentData.mzIdentML.MzIdentMLType, and write with PSI_Interface.IdentData.mzIdentML.MzIdentMlReaderWriter.Write(mzIdentMLType, filePath)
A slightly easier path: Populate PSI_Interface.IdentData.IdentDataObj, and write with PSI_Interface.IdentData.mzIdentML.MzIdentMlReaderWriter.Write(new MzIdentMLType(identDataObj), filePath)
Easiest path: Use PSI_Interface.IdentData.IdentDataCreator, calling the following functions in the order shown and providing extra data as available/specified:
var creator = new PSI_Interface.IdentData.IdentDataCreator("id", "name");
var software = creator.AddAnalysisSoftware("Software_ID", "Software_Name", "Software_Version", CV.CVID.MS_Software_Name /*if available, or CV.CVID.CVID_Unknown*/, "Software_Name_for_param");
var settings = creator.AddAnalysisSettings(software, "Settings_ID", CV.CVID.MS_ms_ms_search);
var searchDb = creator.AddSearchDatabase("Database_name", "Number_of_entries_in_database", "Database_location", CV.CVID.CVID_Unknown /*or published database if in CV*/, CV.CVID.MS_FASTA_format /*or other database format*/);
settings.AdditionalSearchParams.Items.Add(new CVParamObj(CV.CVID.MS_parent_mass_type_mono /*or other CV term*/)); // Add all search parameters that have a CV term
settings.AdditionalSearchParams.Items.Add(new UserParamObj("name_of_parameter", "value_of_parameter")); // Add all other search parameters
var mod = new SearchModificationObj(); // set FixedMod, MassDelta, Residues, and add CVParamObjs to CVParams
settings.ModificationParams.Add(mod); // Repeat with a new SearchModificationObj() for each modification in the search
settings.Enzymes.Enzymes.Add(new EnzymeObj() /* populate data */); // Repeat if there are multiple enzymes, exclude if there are none
settings.ParentTolerances.AddRange(new CVParamObj[]
{
    new CVParamObj(CV.CVID.MS_search_tolerance_plus_value, "tolerance value") { UnitCvid = CV.CVID.UO_parts_per_million },
    new CVParamObj(CV.CVID.MS_search_tolerance_minus_value, "tolerance value") { UnitCvid = CV.CVID.UO_parts_per_million },
});
settings.FragmentTolerances.AddRange(new CVParamObj[]
{
    new CVParamObj(CV.CVID.MS_search_tolerance_plus_value, "tolerance value") { UnitCvid = CV.CVID.UO_parts_per_million },
    new CVParamObj(CV.CVID.MS_search_tolerance_minus_value, "tolerance value") { UnitCvid = CV.CVID.UO_parts_per_million },
});
settings.Threshold.Items.Add(new CVParamObj(CV.CVID.MS_no_threshold)); // Or choose an appropriate CV term and value, if a threshold is used
var specData = creator.AddSpectraData("path_to_spectrum_file", "Name_of_dataset", CV.CVID.MS_Thermo_nativeID_format /*or other appropriate format type*/, CV.CVID.MS_Thermo_RAW_format /*whatever format the file actually is*/);
foreach (var match in searchResults)
{
    var specIdent = creator.AddSpectrumIdentification(specData, "spectrum_native_id", "spectrum_elution_time", experimentalMz, charge);
    // Add all of the necessary information to the identification
    var pep = new PeptideObj(match.Sequence); // add a ModificationObj() to Modifications for each modification in the peptide
    specIdent.Peptide = pep;
    var dbSeq = new DbSequenceObj(searchDb, proteinLength, "proteinName", "proteinDescription");
    var pepEv = new PeptideEvidenceObj(dbSeq, pep, peptide_start_location, peptide_end_location, "prefix_residue", "suffix_residue", false /* use 'true' if this is a decoy hit */);
    specIdent.AddPeptideEvidence(pepEv); // Repeat with a new PeptideEvidenceObj() for every distinct peptide/protein/location match
    specIdent.CVParams.Add(new CVParamObj() { Cvid = CV.CVID.MS_chemical_compound_formula, Value = match.Composition, }); // Repeat with a different CV term and value for each score item
}
// Tie all of the information together
var identData = creator.GetIdentData();
// Write out to file
MzIdentMlReaderWriter.Write(new MzIdentMLType(identData), outputFilePath);
Use PSI_Interface.MSData.SimpleMzMLReader to read an mzML file.
Several runtime options exist:
- Using random access reading (i.e., allowing non-sequential reading of spectra; default false; setting to true on a gzipped file means it will be extracted to a temp directory for reading)
- Using reduced memory (i.e., not reading the entire file into memory; default true)
- Reading spectra without the binary data (i.e., all the metadata, but not peak m/zs or intensities)
- Reading spectra with the binary data (i.e., all the metadata, with peak m/zs and intensities)
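The random access option extracts gzipped input to a temp directory; one plausible reason is that .NET's `GZipStream` is forward-only (`CanSeek` is false), so jumping to an arbitrary spectrum offset needs an uncompressed copy. The sketch below shows that general pattern under stated assumptions: `GzipRandomAccess`, `ExtractToTemp`, and `ReadAt` are hypothetical names for this example only, and the library's actual extraction logic is not shown in this document and may differ.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of PSI_Interface
public static class GzipRandomAccess
{
    // Decompresses a .gz file into a new temp file and returns the temp file's path
    public static string ExtractToTemp(string gzPath)
    {
        var tempPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".tmp");
        using (var source = new GZipStream(File.OpenRead(gzPath), CompressionMode.Decompress))
        using (var target = File.Create(tempPath))
        {
            source.CopyTo(target);
        }
        return tempPath;
    }

    // Reads up to 'count' bytes starting at 'offset' from a seekable (uncompressed) file
    public static byte[] ReadAt(string path, long offset, int count)
    {
        using (var stream = File.OpenRead(path))
        {
            stream.Seek(offset, SeekOrigin.Begin);
            var buffer = new byte[count];
            var read = 0;
            while (read < count)
            {
                var n = stream.Read(buffer, read, count - read);
                if (n == 0) break; // reached end of file
                read += n;
            }
            Array.Resize(ref buffer, read);
            return buffer;
        }
    }
}
```

The trade-off is extra disk space and a one-time extraction cost in exchange for seeking, which is why sequential reading is the cheaper choice when spectra are only needed in file order.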
Written by Bryson Gibbons for the Department of Energy (PNNL, Richland, WA)
Copyright 2017, Battelle Memorial Institute. All Rights Reserved.
E-mail: proteomics@pnnl.gov
Website: https://github.com/PNNL-Comp-Mass-Spec/ or https://www.pnnl.gov/integrative-omics
The PSI Interface is licensed under the 2-Clause BSD License; you may not use this file except in compliance with the License. You may obtain a copy of the License at https://opensource.org/licenses/BSD-2-Clause
Copyright 2018 Battelle Memorial Institute