-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce workflow run crate #32
Conversation
solved by removing |
🤔 |
todo: add an example by a workflow using FileStats objects recording format specific statistics of output files |
Now need to re-add the sapporo terms like You can do it like this: EXTRA_TERMS = {
"ParameterConnection": "https://w3id.org/ro/terms/workflow-run#ParameterConnection",
"connection": "https://w3id.org/ro/terms/workflow-run#connection",
"sourceParameter": "https://w3id.org/ro/terms/workflow-run#sourceParameter",
"targetParameter": "https://w3id.org/ro/terms/workflow-run#targetParameter"
}
crate = ROCrate()
crate.metadata.extra_terms.update(EXTRA_TERMS)
# ... |
The property is called {
"@id": "#in_file",
"@type": "FormalParameter",
"additionalType": "File",
"encodingFormat" : "http://edamontology.org/format_1929",
"name": "in_file"
} |
Looking at ro-crate-metadata-example.json: the Formal Parameter: {
"@id": "#fastq_out_1",
"@type": "FormalParameter",
"additionalType": "File",
"encodingFormat": "http://edamontology.org/format_1930"
} Actual file representation: {
"@id": "outputs/SRR1274307_1.fastq",
"@type": "File",
"exampleOfWork": {
"@id": "#fastq_out_1"
},
...
} By the way, determining the workflow's inputs and outputs does not seem straightforward here since the Sapporo engine acts, in a way, as a wrapper around the CWL engine with different input / output slots. The CWL workflow has three inputs ( |
Regarding the other custom terms:
|
Thank you very much @stain and @simleo !! Sorry for not replying earlier.
Yes, I think this is a general concern for a WES implementation that aims to generate RO-crate. With the current one that Sapporo generates for each workflow engine, it's only reusable with Sapporo to run it again. My idea is to let Sapporo use engine's feature to generate RO-crate if possible (like cwltool), but generates a generic one (the current form) for the engines without the RO-crate feature (like the other engines). It'd be hard to parse the input/output file for each engine and continue supporting all of them. Ideally, all engines should support RO-crate and a WES just uses the features. Thanks @simleo for the advice on the additional terms (thanks for finding the |
b6df700
to
a5f8ebe
Compare
@simleo Thank you for reviewing! Hiro has fixed most of the issues, but we still have some points which need to be discussed. The But I suspect that the initial one was also valid:
looks like a valid JSON-LD context. Or am I missing something in the spec? cc. @stain The log files Three The root data entity now For the I assume the other engines might have the same situation - do you have any suggestions or guidelines for this?
The main workflow file is currently missing in the crate because the request has been made by the workflow URL. But it should include the main workflow description file should be included. This requires the update of Sapporo's workflow runner, so we will work on this.
This is because the parameter is not specified in the run request sent to Sapporo. The parameter exists in the workflow definition file but is not specified in the request, so the run was made with the default value specified in the workflow definition. This is another thing that WES cannot find.
This is also another WES-specific issue. A WES can inspect the run request (workflow definition to run, job parameters to be given), but cannot inspect the workflow definition content itself (in other words, nextflow/snakemake files cannot be parsable). Therefore, a WES just can list the files generated by the run without mapping them to the defined (expected) outputs. I understand this is not ideal, but I think this is the limitation when we implement the ro-run-crate function in a WES. Thank you very much again for your huge effort in helping us! |
@inutano Regarding the "@context": [
"https://w3id.org/ro/crate/1.1/context",
"https://w3id.org/ro/terms/sapporo",
"https://w3id.org/ro/terms/workflow-run"
] Regarding the license, it's normal for a workflow runner not to know the workflow's license in the general case. Specifying a license is required by Workflow RO-Crate though, and you also need to specify an open license to upload the final example to Zenodo. In the case of Sapporo, maybe the license could be added by the user through an option. There's a similar situation in runcrate: the As for the workflow's |
I think it's almost ready. I would remove the workflow's |
Manually removed the The directory: |
I suggested to remove only the workflow's Also, this should not be a manual change: the RO-Crate is meant to give an example of the implementation, so it's Sapporo that has to be changed in order to change the RO-Crate. |
@simleo Thank you for your reviewing and explanation at the call yesterday! @suecharo fixed the code to generate the JSON file without One question, is it okay to keep the files not mentioned in the JSON, e.g. |
@inutano @suecharo the changes in 0b7e990 also moved the Regarding
The crate directory can contain files that are not mentioned in the metadata file. |
@simleo Thanks for the comments!
My apologies, I had to explain/ask about this. Another WES issue here is that a WES does not know which files are workflow dependencies and which are input parameters, as it only can see that the workflow engine downloaded the files into the working directory. In this case, it may look obvious from the file extensions that the
This makes sense. This is just inherited from our WES implementation and the internal namespace usage went out for the ro-crate. We will discuss if we can change it.
Cool, thanks! |
If the distinction is unknown to Sapporo, I guess adding them to
If it's complicated, leave that be. Better not delay the PR merge and example publication. |
Okay, I'll add it to the manuscript. for the |
This pytest error will be fixed in the next release. |
@suecharo @inutano congrats on merging the PR! I've noticed the crate at |
Some missing features:
mentions
in the top-level object to link to a runinput
andoutput
in the workflow objectConcerns:
a property can be an object or an array of objects depending on the number of appended objectsCreativeWork
? -> a log file issubjectOf
a workflow run