Hi CWL community,

I'm reaching out to share some thoughts that emerged from a recent discussion within the Japan community (@inutano, @tom-tan) about developing JSON schemas for workflow parameter files.
Specifically, we're looking at creating JSON schemas that correspond to the YAML templates generated by `cwltool --make-template`.
While the templates created by `cwltool --make-template` are incredibly useful, I believe a JSON schema would be better suited for generating forms for a workflow's expected inputs and for representing workflow parameters in Workflow Execution Services (WES). (Ref.: nf-core - rnaseq - schema_input.json)
To address this, I have drafted a preliminary Python function snippet:
```python
import traceback
from json import dumps
from typing import Any

from cwl_utils.parser import load_document_by_uri, save
from jsonschema.validators import validator_for


def parse_inputs(cwl_url: str) -> Any:
    cwl_obj = load_document_by_uri(cwl_url)
    saved_obj = save(cwl_obj)
    if "inputs" not in saved_obj:
        raise ValueError("Inputs are missing in the provided object.")
    return saved_obj["inputs"]


def inputs_to_jsonschema(inputs: Any) -> Any:
    """
    Converts a CWL inputs object into a jsonschema object.

    Args:
        inputs: CWL inputs object.

    Returns:
        A jsonschema object.
    """
    schema = {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {},
        "required": [],
        "additionalProperties": False,
    }

    # Refer to https://www.commonwl.org/v1.2/Workflow.html#WorkflowInputParameter for more details
    for input_item in inputs:
        input_id = input_item.get("id")
        input_type = input_item.get("type")
        if input_id is None or input_type is None:
            raise ValueError(
                "Each item in the 'inputs' object must include 'id' and 'type' fields.")

        property_schema = _input_type_to_property_schema(input_type)

        if "secondaryFiles" in input_item:
            # TODO: do nothing?
            # secondaryFiles does not seem to affect the --make-template output.
            # For example, refer to $ cwltool --make-template https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/main/tests/stage-array.cwl
            pass

        if "default" in input_item:
            property_schema["default"] = input_item["default"]

        schema["properties"][input_id] = property_schema  # type: ignore
        # Inputs without a default whose type does not include "null" are required.
        if "default" not in input_item and "null" not in input_type:
            schema["required"].append(input_id)

    return schema


def _input_type_to_property_schema(input_type: Any) -> Any:
    if isinstance(input_type, dict):
        nested_type = input_type.get("type")
        if nested_type is None:
            raise ValueError("The 'inputs.[].type' nested type object must contain a 'type' field.")

        if nested_type == "enum":
            enum = input_type.get("symbols")
            if enum is None:
                raise ValueError("The 'inputs.[].type' nested type object must contain a 'symbols' field.")
            return {
                "type": "string",
                "enum": enum,
            }
        elif nested_type == "record":
            schema = {
                "type": "object",
                "properties": {},
                "required": [],
                "additionalProperties": False,
            }
            fields = input_type.get("fields")
            if fields is None:
                raise ValueError("The 'inputs.[].type' nested type object must contain a 'fields' field.")
            for field in fields:
                field_name = field.get("name")
                field_type = field.get("type")
                if field_name is None or field_type is None:
                    raise ValueError("The 'inputs.[].type.[].fields' object must contain 'name' and 'type' fields.")
                # Field names may be full URIs; keep only the last path segment.
                field_id = field_name.split("#")[-1].split("/")[-1]
                schema["properties"][field_id] = _input_type_to_property_schema(field_type)  # type: ignore
                if "default" not in field:
                    schema["required"].append(field_id)
            return schema
        elif nested_type == "array":
            item_type = input_type.get("items")
            if item_type is None:
                raise ValueError("If 'inputs.[].type.type' is 'array', 'inputs.[].type' must contain an 'items' field.")
            return {
                "type": "array",
                "items": _input_type_to_property_schema(item_type),
                "additionalItems": False,
            }
        else:
            raise ValueError(f"Unexpected type encountered: {input_type}.")

    elif isinstance(input_type, list):
        # Only optional types of the form ["null", T] are supported.
        if len(input_type) != 2 or "null" not in input_type:
            raise ValueError(f"Unexpected type encountered: {input_type}.")
        original_type = [t for t in input_type if t != "null"][0]
        # JSON Schema draft-07 has no "nullable" keyword (it comes from OpenAPI),
        # so an optional type is expressed as "the original schema or null".
        return {
            "anyOf": [
                _input_type_to_property_schema(original_type),
                {"type": "null"},
            ]
        }

    elif input_type == "File":
        return {
            "type": "object",
            "properties": {
                "class": {"type": "string", "const": "File"},
                "path": {"type": "string"},
                "location": {"type": "string"},
            },
            "required": ["class"],
            "oneOf": [
                {"required": ["path"]},
                {"required": ["location"]},
            ],
            "additionalProperties": False,
        }
    elif input_type == "Directory":
        return {
            "type": "object",
            "properties": {
                "class": {"type": "string", "const": "Directory"},
                "path": {"type": "string"},
                "location": {"type": "string"},
            },
            "required": ["class"],
            "oneOf": [
                {"required": ["path"]},
                {"required": ["location"]},
            ],
            "additionalProperties": False,
        }
    elif input_type == "Any":
        # CWL "Any" matches any non-null value.
        return {
            "anyOf": [
                {"type": "boolean"},
                {"type": "integer"},
                {"type": "number"},
                {"type": "string"},
                {"type": "array"},
                {"type": "object"},
            ]
        }
    elif input_type == "null":
        return {"type": "null"}
    elif input_type in ["float", "double"]:
        return {"type": "number"}
    elif input_type in ["int", "long"]:
        # CWL "long" is a 64-bit integer, so it maps to "integer", not "number".
        return {"type": "integer"}
    else:
        # "string" and "boolean" share their names with JSON Schema types.
        return {"type": input_type}


def validate_jsonschema_itself(jsonschema: Any) -> None:
    # Check that the generated schema is itself a valid JSON Schema.
    validator = validator_for(jsonschema)
    validator.check_schema(jsonschema)


def main() -> None:
    test_urls = [
        # Sapporo example workflow.
        "https://raw.githubusercontent.com/sapporo-wes/sapporo-service/main/tests/resources/cwltool/trimming_and_qc.cwl",
        # Packed documents: the definition itself is a tricky case.
        "https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/main/tests/echo-tool-packed.cwl",
        "https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/main/tests/revsort-packed.cwl",
        # A tricky type.
        "https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/main/tests/anon_enum_inside_array.cwl",
        # Somewhat many parameters, but a straightforward definition.
        "https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/main/tests/bwa-mem-tool.cwl",
        # Shorthand CommandInputParameter syntax (e.g., param: string).
        "https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/main/tests/env-tool1.cwl",
        # No input parameters.
        "https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/main/tests/envvar3.cwl",
        # Any.
        "https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/main/tests/params.cwl",
        # Directory.
        "https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/main/tests/dir.cwl",
        # secondaryFiles.
        "https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/main/tests/secondaryfiles/rename-inputs.cwl",
        "https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/main/tests/stage-array.cwl",
    ]

    for url in test_urls:
        try:
            print(f"{'-' * 3} Test URL: {url} {'-' * 10}")
            print("\n")
            inputs = parse_inputs(url)
            print("Inputs object: \n")
            print(dumps(inputs, indent=2))
            print("\n")
            print("JSON Schema: \n")
            jsonschema = inputs_to_jsonschema(inputs)
            validate_jsonschema_itself(jsonschema)
            print(dumps(jsonschema, indent=2))
            print("\n")
        except Exception as e:
            print(f"Failed to parse: {url}")
            print(e)
            traceback.print_exc()


if __name__ == "__main__":
    main()
```
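One detail worth isolating is how optional CWL types translate. JSON Schema draft-07 has no `nullable` keyword (that keyword comes from OpenAPI), so a CWL optional type such as `["null", "int"]` has to be written as a union. The following is a minimal, self-contained sketch of that mapping, assuming only primitives, unions, and arrays need handling; the names `cwl_type_to_schema`, `is_optional`, and `PRIMITIVES` are hypothetical and not part of cwltool or cwl-utils.

```python
from typing import Any, Dict

# Assumed mapping of CWL primitive type names to JSON Schema draft-07 types.
PRIMITIVES: Dict[str, str] = {
    "null": "null",
    "boolean": "boolean",
    "int": "integer",
    "long": "integer",   # CWL long is a 64-bit integer
    "float": "number",
    "double": "number",
    "string": "string",
}


def cwl_type_to_schema(cwl_type: Any) -> Dict[str, Any]:
    """Map a small subset of CWL type expressions to JSON Schema fragments."""
    if isinstance(cwl_type, str):
        if cwl_type not in PRIMITIVES:
            raise ValueError(f"Unsupported type in this sketch: {cwl_type}")
        return {"type": PRIMITIVES[cwl_type]}
    if isinstance(cwl_type, list):
        # A union such as ["null", "int"]: the value may match any member.
        return {"anyOf": [cwl_type_to_schema(t) for t in cwl_type]}
    if isinstance(cwl_type, dict) and cwl_type.get("type") == "array":
        return {"type": "array", "items": cwl_type_to_schema(cwl_type["items"])}
    raise ValueError(f"Unsupported type in this sketch: {cwl_type!r}")


def is_optional(cwl_type: Any) -> bool:
    """Treat an input as optional when its type is a union containing "null"."""
    return isinstance(cwl_type, list) and "null" in cwl_type


print(cwl_type_to_schema(["null", "int"]))
# {'anyOf': [{'type': 'null'}, {'type': 'integer'}]}
print(cwl_type_to_schema({"type": "array", "items": "string"}))
# {'type': 'array', 'items': {'type': 'string'}}
```

An input for which `is_optional` returns true would then be left out of the top-level `required` list, in the same spirit as the `"null" not in input_type` check in the snippet.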
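For example, the `inputs_to_jsonschema` function can generate a JSON schema for the inputs of https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/main/tests/bwa-mem-tool.cwl.
I am aware that there may be deficiencies, such as a lack of comprehensive test cases, so I would welcome feedback on this implementation approach and any other suggestions you may have.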
My first comment: why are you converting a typed CWL object back into an untyped Python dictionary with `save`? Wouldn't it be safer and more reliable to work directly with the CWL Python objects?
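To illustrate what working directly with typed objects might look like, here is a minimal sketch that dispatches on the class of each type object with `isinstance` instead of indexing dictionaries, so a type checker can follow every branch. The dataclasses are hypothetical stand-ins modelled loosely on the generated classes in `cwl_utils.parser` (such as `InputArraySchema`, `InputEnumSchema`, and `InputRecordSchema`); the real classes' attribute names (for example `type_`) may differ between cwl-utils versions, so treat them as assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


# Hypothetical stand-ins for cwl_utils' generated schema classes.
@dataclass
class InputEnumSchema:
    symbols: List[str]


@dataclass
class InputArraySchema:
    items: "CwlType"


@dataclass
class InputRecordField:
    name: str
    type_: "CwlType"


@dataclass
class InputRecordSchema:
    fields: List[InputRecordField] = field(default_factory=list)


CwlType = Union[str, InputEnumSchema, InputArraySchema, InputRecordSchema]

# Assumed primitive mapping (CWL name -> JSON Schema draft-07 type).
PRIMITIVES = {"boolean": "boolean", "int": "integer", "long": "integer",
              "float": "number", "double": "number", "string": "string"}


def to_schema(cwl_type: CwlType) -> Dict[str, Any]:
    """Dispatch on the class of a typed CWL type object instead of dict keys."""
    if isinstance(cwl_type, InputEnumSchema):
        return {"type": "string", "enum": list(cwl_type.symbols)}
    if isinstance(cwl_type, InputArraySchema):
        return {"type": "array", "items": to_schema(cwl_type.items)}
    if isinstance(cwl_type, InputRecordSchema):
        return {
            "type": "object",
            "properties": {f.name: to_schema(f.type_) for f in cwl_type.fields},
            "required": [f.name for f in cwl_type.fields],
            "additionalProperties": False,
        }
    if isinstance(cwl_type, str) and cwl_type in PRIMITIVES:
        return {"type": PRIMITIVES[cwl_type]}
    raise ValueError(f"Unsupported type in this sketch: {cwl_type!r}")


print(to_schema(InputArraySchema(items=InputEnumSchema(symbols=["a", "b"]))))
# {'type': 'array', 'items': {'type': 'string', 'enum': ['a', 'b']}}
```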
I somewhat understand that the inputs are `cwl.InputParameter` objects. However, `load_document_by_uri` is annotated as returning `Any`, which makes me wonder whether it is difficult to build something on a safer, more reliable type. I know I could use casting, but that seems counterproductive.
If there's a better approach, I would appreciate your guidance.
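One possible approach, sketched here under stated assumptions: instead of `typing.cast`, narrow the `Any` returned by the loader with a runtime `isinstance` check. A type checker such as mypy treats the checked value as the concrete type afterwards, and unexpected documents fail loudly rather than silently. The classes and `fake_load` below are hypothetical stand-ins for the `cwl_utils.parser` process classes and for `load_document_by_uri`.

```python
from dataclasses import dataclass, field
from typing import Any, List, Union


# Hypothetical stand-ins for cwl_utils' process classes.
@dataclass
class InputParameter:
    id: str
    type_: Any


@dataclass
class Workflow:
    inputs: List[InputParameter] = field(default_factory=list)


@dataclass
class CommandLineTool:
    inputs: List[InputParameter] = field(default_factory=list)


Process = Union[Workflow, CommandLineTool]


def fake_load(uri: str) -> Any:
    """Stand-in for a loader that is annotated as returning Any."""
    return CommandLineTool(inputs=[InputParameter(id="reference", type_="File")])


def load_process(uri: str) -> Process:
    obj = fake_load(uri)
    # isinstance narrows Any to a concrete type and is checked at runtime,
    # whereas cast only silences the type checker.
    if not isinstance(obj, (Workflow, CommandLineTool)):
        raise TypeError(f"Expected a Workflow or CommandLineTool, got {type(obj).__name__}")
    return obj


tool = load_process("example.cwl")
print([p.id for p in tool.inputs])  # ['reference']
```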