updated parser to support generic QC
Michele Berselli authored and Michele Berselli committed May 9, 2023
1 parent 3511e64 commit 77e6fa2
Showing 5 changed files with 98 additions and 66 deletions.
8 changes: 8 additions & 0 deletions LOG.md
@@ -0,0 +1,8 @@
+### Version Updates
+
+#### v2.1.0
+* Added support for updated QCs, to enable the new generic schema ``quality_metric_generic``
+
+
+#### v2.0.0
+* Initial release after major changes to support the new YAML format for portal objects
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -22,7 +22,7 @@
 author = 'Michele Berselli, CGAP & SMaHT Team'

 # The full version, including alpha/beta/rc tags
-release = '2.0.0'
+release = '2.1.0'


 # -- General configuration ---------------------------------------------------
15 changes: 12 additions & 3 deletions docs/yaml_workflow.rst
@@ -63,6 +63,8 @@ Template
 argument_type: qc.<type> # qc_type, e.g. quality_metric_vcfcheck
 # none can be used as <type>
 # if a qc_type is not defined
+# quality_metric_generic can be used as <type>
+# to use the general qc_type instead of a custom one
 argument_to_be_attached_to: <file_output_name>
 # All the following fields are optional and provided as example,
 # can be expanded to anything accepted by the schema
@@ -164,11 +166,18 @@ Definition of the type of the output.
 For a **file** output, the argument type is defined as ``file.<format>``, where ``<format>`` is the format used by the file.
 ``<format>`` needs to match a file format that has been previously defined, see :ref:`File Format <file_format>`.

-For a **QC** (Quality Control) output, the argument type is defined as ``qc.<type>``, where ``<type>`` is a a ``qc_type`` defined in the the schema, see `schemas <https://github.com/dbmi-bgm/cgap-portal/tree/master/src/encoded/schemas>`__.

 For a **report** output, the argument type is defined as ``report.<type>``, where ``<type>`` is the type of the report (e.g., file).

-*Note*: We are currently re-thinking how QC and report outputs work, the current definitions are temporary solutions that may change soon.
+For a **QC** (Quality Control) output, the argument type is defined as ``qc.<type>``, where ``<type>`` is a ``qc_type`` defined in the schema, see `schemas <https://github.com/dbmi-bgm/cgap-portal/tree/master/src/encoded/schemas>`__.
+While custom ``qc_type`` schemas are still supported for compatibility, we introduced a new generic type ``quality_metric_generic``.
+We recommend using this new type to implement QCs.
+
+When using ``quality_metric_generic`` as a ``qc_type``, it is possible to generate two different types of output: a JSON file of key-value pairs and a compressed file.
+The JSON file can be used to create a summary report of the quality metrics generated by the QC process.
+The compressed file can be used to store the original output for the QC, including additional data or graphs.
+Both the JSON file and compressed file will be attached to the file specified as target by ``argument_to_be_attached_to``.
+The content of the JSON file will be patched directly on the target file, while the compressed file will be made available for download on the file via a link.
+The output type can be specified by setting ``json: True`` or ``zipped: True`` in the QC output definition.

 secondary_files
 ^^^^^^^^^^^^^^^
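
The documentation change above can be illustrated with a minimal Python sketch. It assumes two generic QC outputs as they might look once the YAML has been loaded into plain dictionaries. The output names (`qc_summary`, `qc_archive`), the target name `output_vcf`, and the helper `describe_generic_qc` are hypothetical and not part of `pipeline_utils`; only the keys `argument_type`, `argument_to_be_attached_to`, `json`, and `zipped` come from the documentation.

```python
# Hypothetical illustration of the documented behaviour; this helper is
# not part of pipeline_utils.

# Two generic QC outputs, as they might look after loading the YAML.
outputs = {
    "qc_summary": {
        "argument_type": "qc.quality_metric_generic",
        "argument_to_be_attached_to": "output_vcf",
        "json": True,
    },
    "qc_archive": {
        "argument_type": "qc.quality_metric_generic",
        "argument_to_be_attached_to": "output_vcf",
        "zipped": True,
    },
}


def describe_generic_qc(definition):
    """Return (target file, output kind) for a generic QC definition."""
    prefix, _, qc_type = definition["argument_type"].partition(".")
    if prefix != "qc" or qc_type != "quality_metric_generic":
        raise ValueError("not a generic QC output")
    # JSON content is patched on the target; anything else is treated as
    # the compressed archive exposed as a download link.
    kind = "json" if definition.get("json", False) else "zip"
    return definition["argument_to_be_attached_to"], kind


for name, definition in outputs.items():
    print(name, describe_generic_qc(definition))
```

Both outputs point at the same target, which matches the statement that the JSON file and the compressed file are attached to the file named by ``argument_to_be_attached_to``.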
21 changes: 17 additions & 4 deletions pipeline_utils/lib/yaml_parser.py
@@ -178,8 +178,10 @@ class YAMLWorkflow(YAMLTemplate):
     INPUT_FILE_SCHEMA = 'Input file'
     OUTPUT_PROCESSED_FILE_SCHEMA = 'Output processed file'
     OUTPUT_QC_FILE_SCHEMA = 'Output QC file'
+    GENERIC_QC_FILE_SCHEMA = 'Generic QC file'
     OUTPUT_REPORT_FILE_SCHEMA = 'Output report file'
     QC_SCHEMA = 'qc'
+    QUALITY_METRIC_GENERIC_SCHEMA = 'quality_metric_generic'
     REPORT_SCHEMA = 'report'
     ARGUMENT_TO_BE_ATTACHED_TO_SCHEMA = 'argument_to_be_attached_to'
     ZIPPED_SCHEMA = 'zipped'
@@ -253,7 +255,12 @@ def _arguments_output(self):
                     self.SECONDARY_FILE_FORMATS_SCHEMA: values.get(self.SECONDARY_FILES_SCHEMA, [])
                 }
             elif type == self.QC_SCHEMA:
-                argument_type = self.OUTPUT_QC_FILE_SCHEMA
+                # handle generic vs specific QC schema
+                if format == self.QUALITY_METRIC_GENERIC_SCHEMA:
+                    argument_type = self.GENERIC_QC_FILE_SCHEMA
+                else:
+                    argument_type = self.OUTPUT_QC_FILE_SCHEMA
+                # create base QC argument
                 argument_ = {
                     self.ARGUMENT_TYPE_SCHEMA: argument_type,
                     self.WORKFLOW_ARGUMENT_NAME_SCHEMA: name,
@@ -263,9 +270,15 @@ def _arguments_output(self):
                     self.QC_JSON_SCHEMA: values.get(self.JSON_SCHEMA, False),
                     self.QC_TABLE_SCHEMA: values.get(self.TABLE_SCHEMA, False)
                 }
-                # handle edge case for missing QC type
-                if format not in ['none']:
+                # handle edge case for missing or generic QC type
+                if format not in ['none', self.QUALITY_METRIC_GENERIC_SCHEMA]:
                     argument_[self.QC_TYPE_SCHEMA] = format
+                # create argument format for generic QCs (JSON or ZIP)
+                elif format == self.QUALITY_METRIC_GENERIC_SCHEMA:
+                    if argument_[self.QC_JSON_SCHEMA]:
+                        argument_[self.ARGUMENT_FORMAT_SCHEMA] = 'json'
+                    else:
+                        argument_[self.ARGUMENT_FORMAT_SCHEMA] = 'zip'
                 # quality controls, TODO
                 # these fields are bad, need to rework how QCs work
                 if values.get(self.HTML_IN_ZIPPED_SCHEMA):
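
The QC branch changed in the two hunks above can be read in isolation as the following sketch. It is a simplified standalone function, not the actual method: the function name `build_qc_argument` is hypothetical, and the dictionary keys (`argument_type`, `workflow_argument_name`, `qc_json`, `qc_zipped`, `qc_type`, `argument_format`) are assumed stand-ins for the class constants, whose real values are not shown in this diff. Only the strings `'Generic QC file'`, `'Output QC file'`, `'quality_metric_generic'`, `'none'`, `'json'`, and `'zip'` are taken from the commit.

```python
QUALITY_METRIC_GENERIC = "quality_metric_generic"


def build_qc_argument(name, qc_type, values):
    """Sketch of the QC branch; string keys are assumed placeholders."""
    # Generic vs. specific QC schema.
    if qc_type == QUALITY_METRIC_GENERIC:
        argument_type = "Generic QC file"
    else:
        argument_type = "Output QC file"
    # Base QC argument.
    argument = {
        "argument_type": argument_type,
        "workflow_argument_name": name,
        "qc_json": values.get("json", False),
        "qc_zipped": values.get("zipped", False),
    }
    # A custom qc_type is recorded; 'none' and the generic type are not.
    if qc_type not in ["none", QUALITY_METRIC_GENERIC]:
        argument["qc_type"] = qc_type
    # Generic QCs get an argument format instead (JSON or ZIP).
    elif qc_type == QUALITY_METRIC_GENERIC:
        argument["argument_format"] = "json" if argument["qc_json"] else "zip"
    return argument


print(build_qc_argument("qc_out", QUALITY_METRIC_GENERIC, {"json": True}))
print(build_qc_argument("qc_out", "quality_metric_vcfcheck", {"zipped": True}))
```

Note that, as written in the commit, a generic QC falls back to `'zip'` whenever the JSON flag is false, regardless of whether `zipped` was set explicitly.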
@@ -371,7 +384,7 @@ def _arguments(self, input, project):
                 self.ARGUMENT_TYPE_SCHEMA: type
             }
             if type == self.PARAMETER_SCHEMA:
-                argument_.setdefault(self.VALUE_TYPE_SCHEMA, format)
+                argument_[self.VALUE_TYPE_SCHEMA] = format
             for k, v in values.items():
                 if k != self.ARGUMENT_TYPE_SCHEMA:
                     # handle files specifications, TODO
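
The last hunk swaps `dict.setdefault` for plain item assignment. The general difference between the two can be sketched as follows. Whether the key can already be present at that point in `_arguments` is not visible in this diff, so the example only shows standard Python dictionary behaviour; the key name `value_type` is a hypothetical placeholder for `VALUE_TYPE_SCHEMA`.

```python
VALUE_TYPE = "value_type"  # hypothetical stand-in for VALUE_TYPE_SCHEMA

# Case 1: the key is absent, so both forms store the new value.
fresh_a, fresh_b = {}, {}
fresh_a.setdefault(VALUE_TYPE, "integer")
fresh_b[VALUE_TYPE] = "integer"

# Case 2: the key already exists. setdefault keeps the existing value,
# while plain assignment overwrites it.
kept = {VALUE_TYPE: "string"}
kept.setdefault(VALUE_TYPE, "integer")
replaced = {VALUE_TYPE: "string"}
replaced[VALUE_TYPE] = "integer"

print(fresh_a == fresh_b)
print(kept[VALUE_TYPE], replaced[VALUE_TYPE])
```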