update QC metrics in mzML files example #228

Merged
merged 15 commits into from
Jul 29, 2024
9 changes: 6 additions & 3 deletions docs/pages/examples.md
@@ -4,11 +4,14 @@ title: "mzQC Examples"
permalink: /examples/
---

Here are a number of worked examples, that, each for its own use-case, go step-by-step through the different parts of a mzQC.
The following use cases provide several hands-on examples of how mzQC files are structured and can be used:

- [Representing QC data for an individual mass spectrometry run](intro_run/)
- [Deriving QC data from multiple related mass spectrometry runs](intro_set/)
- [QC sample mzQC](QC2-sample-example/)
- [in mzML](mzml-mzqc-example/)
- [Using USI with mzQC](USI-example/)
- [Batch correction](metabo-batches/)

Additionally, for more advanced usage, mzQC can closely interoperate with several other file formats developed by the Proteomics Standards Initiative:

- [Using USI with mzQC](USI-example/)
- [Incorporating QC metrics in mzML files](adv_mzqc_in_mzml/)
76 changes: 76 additions & 0 deletions docs/pages/worked-examples/adv_mzqc_in_mzml.md
@@ -0,0 +1,76 @@
---
layout: page
title: "Incorporating QC Metrics in mzML Files"
permalink: /examples/adv_mzqc_in_mzml/
---

While QC metrics in the PSI-MS controlled vocabulary are primarily intended for use in mzQC files, they can also be embedded directly within other file formats developed by the Proteomics Standards Initiative, such as [mzML](https://github.com/HUPO-PSI/mzML) and [mzIdentML](https://github.com/HUPO-PSI/mzIdentML) files.
This is particularly useful when a limited set of QC metrics should be stored alongside the data they describe, so that the metrics and the data travel together in a single file.

A complete example of an mzML file incorporating QC metrics is available in the [mzQC repository](https://github.com/HUPO-PSI/mzQC/tree/main/specification_documents/examples/adv_mzqc_in_mzml.mzml).
Below, we detail the steps and elements involved in this process.

1. **Source file specification**

Define the source of the QC metrics using a `sourceFile` element.
This declares the mzQC file as an input file, in the same way that other input files are declared within mzML:

```
<sourceFile id="QC1" name="BSA1_F1.mzQC" location="file:///examples/">
  <cvParam cvRef="MS" accession="MS:1003160" name="mzQC format" />
</sourceFile>
```
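For readers who generate mzML programmatically, the same element can be produced with a few lines of code. The following is a minimal Python sketch using the standard library's `xml.etree.ElementTree`; the helper name `make_mzqc_source_file` is hypothetical, and the sketch ignores the mzML XML namespace for brevity.

```python
import xml.etree.ElementTree as ET


def make_mzqc_source_file(file_id, name, location):
    """Build a <sourceFile> element that declares an mzQC file as an input.

    Hypothetical helper for illustration; not part of any mzML library.
    """
    source_file = ET.Element(
        "sourceFile", {"id": file_id, "name": name, "location": location}
    )
    # MS:1003160 is the "mzQC format" term used in the snippet above.
    ET.SubElement(
        source_file,
        "cvParam",
        {"cvRef": "MS", "accession": "MS:1003160", "name": "mzQC format"},
    )
    return source_file


element = make_mzqc_source_file("QC1", "BSA1_F1.mzQC", "file:///examples/")
print(ET.tostring(element, encoding="unicode"))
```

The resulting element would still need to be attached to the `sourceFileList` of an actual mzML document.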

2. **Software and data processing**

Document the software and data processing steps utilized to generate the mzQC file and compute the QC metrics:

```
<software id="qc_0" version="0">
  <cvParam cvRef="MS" accession="MS:1000799" name="custom unreleased software tool" value="https://hupo-psi.github.io/mzQC/" />
</software>
```

And:

```
<dataProcessing id="dp_sp_2">
  <processingMethod order="0" softwareRef="qc_0">
    <cvParam cvRef="MS" accession="MS:1000543" name="data processing action" value="QC metrics calculation" />
  </processingMethod>
</dataProcessing>
```

3. **Inclusion of QC metrics**

Include the QC metrics at appropriate levels within the mzML structure:

- **Run-level metrics**

Metrics that relate to all spectra in the file are embedded at the `run` level using a `cvParam`:

```
<run id="ru_0" defaultInstrumentConfigurationRef="ic_0" sampleRef="sa_0" startTimeStamp="2009-08-09T22:32:31" defaultSourceFileRef="sf_ru_0">
  <cvParam cvRef="MS" accession="MS:4000063" name="MS2 known precursor charges fractions" value="{'MS:1000041': [1, 2, 3, 4], 'UO:0000191': [0.0000, 0.5721, 0.3535, 0.0743]}" />
  ...
</run>
```

- **Individual spectrum metrics**

Metrics that relate to individual spectra are embedded at the `spectrum` level using a `cvParam`:

```
<spectrum id="spectrum=1011" index="0" defaultArrayLength="467" dataProcessingRef="dp_sp_0">
  ...
  <cvParam cvRef="MS" accession="MS:4000068" name="spectra half-TIC" value="{'MS:1000767': ['spectrum=1011'], 'UO:0000191': [0.0235]}"/>
  ...
</spectrum>
```

Repeat for each spectrum as necessary, adjusting the spectrum ID and corresponding values.

Note that because QC metrics in mzQC files are typically encoded at the level of runs rather than individual spectra, most spectrum-level QC metrics are defined in the PSI-MS controlled vocabulary as tabular metrics with rows for all spectra.
Therefore, when such a metric is attached directly to a specific spectrum, its table should contain only a single entry, namely the one for that spectrum.
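Reducing a run-wide table to the single entry for one spectrum can be sketched as follows. This is an illustrative Python sketch, not a prescribed implementation: the helper `single_spectrum_value` is hypothetical, the second row of `full_table` is made-up data, and the sketch assumes the `value` attribute is written as a Python-literal-style dictionary string, as in the snippets above.

```python
def single_spectrum_value(table, id_column, spectrum_id):
    """Return the cvParam value string holding only the row for one spectrum.

    `table` maps column accessions to equally long lists of values.
    Hypothetical helper for illustration.
    """
    row = table[id_column].index(spectrum_id)
    # Keep the tabular layout, but with exactly one entry per column.
    single = {column: [values[row]] for column, values in table.items()}
    return str(single)


# Tabular metric as it might appear in an mzQC file (second row is invented).
full_table = {
    "MS:1000767": ["spectrum=1011", "spectrum=1012"],
    "UO:0000191": [0.0235, 0.0412],
}

value = single_spectrum_value(full_table, "MS:1000767", "spectrum=1011")
print(value)
```

The returned string can then be used as the `value` attribute of the `cvParam` inside the matching `spectrum` element.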

The key insight for embedding QC metrics in alternative file formats is that because they are backed by terms in the PSI-MS controlled vocabulary, they can be directly included using the respective functionalities for CV terms, such as `cvParam`.
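Because the metrics are ordinary `cvParam` elements, reading them back requires no special tooling. The following minimal Python sketch looks up the run-level metric from the example by its accession and decodes its table. It assumes the `value` attribute holds a Python-literal-style dictionary string as shown above, and it uses a namespace-free XML fragment for brevity (real mzML files declare an XML namespace that the lookup would need to account for). The helper name `read_table_metric` is hypothetical.

```python
import ast
import xml.etree.ElementTree as ET

# Trimmed, namespace-free fragment based on the run-level example above.
RUN_XML = """<run id="ru_0">
  <cvParam cvRef="MS" accession="MS:4000063" name="MS2 known precursor charges fractions" value="{'MS:1000041': [1, 2, 3, 4], 'UO:0000191': [0.0000, 0.5721, 0.3535, 0.0743]}" />
</run>"""


def read_table_metric(parent, accession):
    """Return the decoded table of the first matching cvParam, or None."""
    for param in parent.findall("cvParam"):
        if param.get("accession") == accession:
            # literal_eval parses the dictionary string without executing code.
            return ast.literal_eval(param.get("value"))
    return None


run = ET.fromstring(RUN_XML)
table = read_table_metric(run, "MS:4000063")
# Pair each charge state (MS:1000041) with its fraction (UO:0000191).
charge_fractions = dict(zip(table["MS:1000041"], table["UO:0000191"]))
print(charge_fractions)
```

`ast.literal_eval` is used here rather than a JSON parser because the example values use single-quoted keys, which are not valid JSON.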
87 changes: 0 additions & 87 deletions docs/pages/worked-examples/mzml-mzqc-example.md

This file was deleted.
