Commit

added shards and gather_input
B3rse committed Sep 25, 2023
1 parent 77e6fa2 commit 411244c
Showing 4 changed files with 46 additions and 2 deletions.
18 changes: 18 additions & 0 deletions docs/yaml_metaworkflow.rst
@@ -62,6 +62,12 @@ Template
dependencies:
- <workflow_name>[@<tag>]
## Fixed shards ####################
# Forces a fixed shard structure, ignoring
# the input structure and the scatter and gather dimensions
####################################
shards: [[<string>], ..] # e.g., [['0'], ['1'], ['2']]
## Lock version ####################
# Specific version to use
# for the workflow
@@ -96,6 +102,7 @@ Template
gather: <integer>
input_dimension: <integer>
extra_dimension: <integer>
gather_input: <integer>
# All the following fields are optional and provided as example,
# can be expanded to anything accepted by the schema
mount: <boolean>
@@ -188,6 +195,12 @@ dependencies
Workflows that must complete before the current step starts.
List of workflows in the format ``<workflow_name>[@<tag>]``.

shards
^^^^^^
Forces a fixed shard structure for the current step.
This overrides the input structure and the scatter and gather dimensions.
The shard structure is given as a list of lists, e.g., ``[['0'], ['1'], ['2']]``.

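As a rough illustration of the ``[[<string>], ..]`` shape described above, the sketch below checks a candidate value before it is placed in the YAML. The helper ``is_valid_shards`` is hypothetical and not part of pipeline_utils, and the requirement that the lists be non-empty is an assumption; the schema added in this commit only requires ``shards`` to be an array.

```python
def is_valid_shards(shards):
    """Return True if `shards` is a non-empty list of non-empty lists of strings.

    Hypothetical helper for illustration only; not part of pipeline_utils.
    The non-empty requirement is an assumption, not something the schema enforces.
    """
    if not isinstance(shards, list) or not shards:
        return False
    return all(
        isinstance(shard, list)
        and len(shard) > 0
        and all(isinstance(index, str) for index in shard)
        for shard in shards
    )


# The example from the documentation passes the check.
example_ok = is_valid_shards([['0'], ['1'], ['2']])
# Integer indices do not match the documented string form.
example_bad = is_valid_shards([[0], [1]])
```

Note that the indices are strings (``'0'``), not integers, so a value such as ``[[0], [1]]`` would not match the documented form.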
version
^^^^^^^
Version to use for the corresponding workflow instead of the default specified for the repository.
@@ -305,3 +318,8 @@ extra_dimension
Additional increment to dimension used when creating the specific input for the step.
This will be applied on top of ``gather``, if any, and will only affect the input.
This will not affect gather dimension in building the pipeline structure.

gather_input
------------
Equivalent to ``gather`` when collecting output from previous shards.
This will not affect the scatter or gather dimensions used in building the pipeline structure.
19 changes: 17 additions & 2 deletions docs/yaml_workflow.rst
@@ -175,10 +175,25 @@ We recommend to use this new type to implement QCs.
When using ``quality_metric_generic`` as a ``qc_type``, it is possible to generate two different types of output: a key-value pairs JSON file and a compressed file.
The JSON file can be used to create a summary report of the quality metrics generated by the QC process.
The compressed file can be used to store the original output for the QC, including additional data or graphs.
Both the JSON file and compressed file will be attached to the file specified as target by ``argument_to_be_attached_to``.
The content of the JSON file will be patched directly on the target file, while the compressed file will be made available for download on the file via a link.
Both the JSON file and compressed file will be attached to the file specified as target by ``argument_to_be_attached_to`` with a ``QualityMetricGeneric`` object.
The content of the JSON file will be patched directly on the object, while the compressed file will be made available for download via a link.
The output type can be specified by setting ``json: True`` or ``zipped: True`` in the QC output definition.

Template for ``quality_metric_generic``:

.. code-block:: python

    {
        "name": "Quality metric name",
        "qc_values": [
            {
                "key": "Name of the key",
                "tooltip": "Tooltip for the key",
                "value": "Value for the key"
            }
        ]
    }

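A QC step could produce a file matching this template along the following lines. This is only a sketch: the function ``build_qc_json`` and its arguments are hypothetical, and only the field names (``name``, ``qc_values``, ``key``, ``tooltip``, ``value``) come from the template.

```python
import json


def build_qc_json(name, metrics):
    """Build a dict following the quality_metric_generic template.

    `metrics` is an iterable of (key, tooltip, value) tuples.
    Hypothetical helper for illustration; not part of pipeline_utils.
    """
    return {
        "name": name,
        "qc_values": [
            {"key": key, "tooltip": tooltip, "value": value}
            for key, tooltip, value in metrics
        ],
    }


qc = build_qc_json(
    "Quality metric name",
    [("Name of the key", "Tooltip for the key", "Value for the key")],
)
# Serialize to the text that would be written to the JSON output file.
qc_text = json.dumps(qc, indent=2)
```
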
secondary_files
^^^^^^^^^^^^^^^
This field can be used for output **files**.
4 changes: 4 additions & 0 deletions pipeline_utils/lib/yaml_parser.py
@@ -359,6 +359,7 @@ class YAMLMetaWorkflow(YAMLTemplate):
OUTPUT_SCHEMA = 'output'
CONFIG_SCHEMA = 'config'
DEPENDENCIES_SCHEMA = 'dependencies'
SHARDS_SCHEMA = 'shards'
PROBAND_ONLY_SCHEMA = 'proband_only'

def __init__(self, data):
@@ -446,6 +447,9 @@ def _workflows(self, version, project):
            # hard dependencies
            if values.get(self.DEPENDENCIES_SCHEMA):
                workflow_[self.DEPENDENCIES_SCHEMA] = values[self.DEPENDENCIES_SCHEMA]
            # fixed shards
            if values.get(self.SHARDS_SCHEMA):
                workflow_[self.SHARDS_SCHEMA] = values[self.SHARDS_SCHEMA]
            workflows.append(workflow_)

        return workflows
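
The hunk above copies an optional key into the workflow only when it is present and truthy. A standalone sketch of that pattern follows; the function ``copy_optional_keys`` and the sample ``step`` dict are hypothetical, and only the key names come from the diff.

```python
DEPENDENCIES_SCHEMA = 'dependencies'
SHARDS_SCHEMA = 'shards'


def copy_optional_keys(values, keys=(DEPENDENCIES_SCHEMA, SHARDS_SCHEMA)):
    """Copy each optional key from `values` only if it is set and truthy.

    Hypothetical stand-in for the conditional copies in `_workflows`.
    """
    workflow_ = {}
    for key in keys:
        # `values.get(key)` is falsy for missing keys, None, and empty lists,
        # so an empty `shards: []` is skipped just like an absent key.
        if values.get(key):
            workflow_[key] = values[key]
    return workflow_


step = {'shards': [['0'], ['1']], 'dependencies': []}
result = copy_optional_keys(step)
```

One consequence of testing truthiness rather than key presence is that an empty ``shards`` list behaves the same as omitting the field.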
7 changes: 7 additions & 0 deletions pipeline_utils/schemas/yaml_metaworkflow.py
@@ -65,6 +65,10 @@
schema.ITEMS: {
schema.TYPE: schema.STRING
}
},
'shards': {
schema.DESCRIPTION: 'Shards structure to create for the step',
schema.TYPE: schema.ARRAY
}
},
schema.REQUIRED: ['input', 'config']
@@ -107,6 +111,9 @@
'gather': {
schema.TYPE: schema.NUMBER
},
'gather_input': {
schema.TYPE: schema.NUMBER
},
'input_dimension': {
schema.TYPE: schema.NUMBER
},