Added initial WfExS-backend examples based on toy workflow. #53

Merged: 20 commits, Oct 26, 2023

Commits:
357ed0d
Added initial WfExS-backend examples based on toy workflow.
jmfernandez Mar 16, 2023
c23aaf5
Added WOMBAT-pipelines provenance
jmfernandez Aug 29, 2023
f6b70ad
Update RO-Crate metadata
jmfernandez Aug 31, 2023
d670ea2
Input parameter which is telling an output directory should never tel…
jmfernandez Aug 31, 2023
b39b803
Added nf-core/rnaseq and Wetlab2Variations RO-Crate examples
jmfernandez Aug 31, 2023
0ea7962
Updated COSIFER toy workflows in cwl and Nextflow RO-Crate examples.
jmfernandez Aug 31, 2023
4510e84
Updated cosifer examples using latest code
jmfernandez Sep 12, 2023
819e95b
Additional update as I forgot to explicitly add licence and "operator…
jmfernandez Sep 12, 2023
a85de6f
Updated WOMBAT-Pipelines example
jmfernandez Sep 12, 2023
09267b0
Updated nf-core/rnaseq example
jmfernandez Sep 12, 2023
e9602b0
Updated Wetlab2Variations example
jmfernandez Sep 12, 2023
cac9c5f
Updated examples, so containers and container engines are better desc…
jmfernandez Sep 13, 2023
37f33e4
Updated COSIFER based WfExS-backend examples
jmfernandez Sep 19, 2023
fdf4a18
Updated examples from complex workflows
jmfernandez Sep 19, 2023
f49d597
Minor fixes in conformsTo sentences
jmfernandez Sep 19, 2023
c9d2ddf
Updated examples after the next bunch of updates in WfExS code.
jmfernandez Sep 25, 2023
fdeb6fd
Updated COSIFER examples using latest code (which uses ContainerImage)
jmfernandez Oct 9, 2023
577ab30
Updated Wetlab2Variations example, so its RO-Crate has new ContainerI…
jmfernandez Oct 9, 2023
547a5f1
Updated WOMBAT-P example, with ContainerImage declarations
jmfernandez Oct 9, 2023
f185e2a
Updated example of failed nf-core/rnaseq failed execution
jmfernandez Oct 10, 2023
92 changes: 92 additions & 0 deletions docs/examples/WfExS-backend/README.md
@@ -0,0 +1,92 @@
# WfExS-backend examples

These RO-Crate Workflow Run examples were generated by running WfExS-backend.

There can be two RO-Crates for each execution. One contains only the
details gathered after staging the execution scenario but *before* the
execution; the other contains those details plus all the provenance
gathered during and *after* the execution.

As WfExS-backend can run a staged workflow more than once, each
execution is represented inside the generated RO-Crates as a separate
`CreateAction`. In addition, CWL workflows are packed before they are run, so their
RO-Crates contain an additional `CreateAction` describing the packing step from
the original workflow to the packed one.
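
To see how many executions (and, for CWL, the packing step) a given crate records, one can list its `CreateAction` entities. The following is a minimal sketch, assuming the crate has been unpacked to a directory and that its metadata lives in the standard `ro-crate-metadata.json` file with a top-level `@graph` array; the function names are illustrative and not part of WfExS-backend.

```python
import json
from pathlib import Path


def create_actions_in_graph(graph):
    """Return (id, name) pairs for every CreateAction entity in an RO-Crate graph."""
    actions = []
    for entity in graph:
        types = entity.get("@type", [])
        # "@type" may be a single string or a list of strings
        if isinstance(types, str):
            types = [types]
        if "CreateAction" in types:
            actions.append((entity.get("@id"), entity.get("name")))
    return actions


def list_create_actions(crate_dir):
    """Load ro-crate-metadata.json from an unpacked crate and list its CreateActions."""
    metadata_path = Path(crate_dir) / "ro-crate-metadata.json"
    with metadata_path.open(encoding="utf-8") as handle:
        metadata = json.load(handle)
    return create_actions_in_graph(metadata.get("@graph", []))
```

For instance, calling `list_create_actions("cosifer-cwl_provenance")` from this directory would be expected to return one entry per recorded action, although the exact identifiers and names depend on the crate.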

Workflow execution error messages are also included, as is a graphical representation
of the executed workflow. Depending on the workflow engine, this representation
is computed either before the execution or after it, so a copy of the
representation is provided for each execution.


## WOMBAT-Pipelines

* Nextflow workflow is available at https://github.com/wombat-p/WOMBAT-Pipelines

* Provenance RO-Crate from an execution using Docker containers. It includes a snapshot of the workflow: [wombat-pipelines_provenance](wombat-pipelines_provenance).


```bash
# Example of command line to generate this RO-Crate
python WfExS-backend.py -L workflow_examples/montblanc_config_gocryptfs.yaml staged-workdir create-prov-crate 047b6dfc-3547-4e09-92f8-df7143038ff4 /tmp/wombat-pipelines_provenance.zip --workflow --orcid 0000-0002-4806-5140 --licence https://spdx.org/licenses/CC-BY-4.0.html
```

## Wetlab2Variations (CWL flavor)

* CWL workflow is available at https://github.com/inab/Wetlab2Variations/blob/eosc-life/cwl-workflows/workflows/workflow.cwl

* Provenance RO-Crate from an execution using Singularity containers. It includes a snapshot of the consolidated workflow: [Wetlab2Variations_CWL_provenance](Wetlab2Variations_CWL_provenance).


```bash
# Example of command line to generate this RO-Crate
python WfExS-backend.py -L workflow_examples/local_config_gocryptfs.yaml staged-workdir create-prov-crate a37fee9e-4288-4a9e-b493-993a867207d0 /tmp/Wetlab2Variations_CWL_provenance.zip --orcid 0000-0002-4806-5140 --licence https://spdx.org/licenses/CC-BY-4.0.html
```

## nf-core RNASeq

* Nextflow workflow is available at https://github.com/nf-core/rnaseq/

* Provenance RO-Crate from an execution using Singularity containers. It includes a snapshot of the consolidated workflow: [nfcore-rnaseq_provenance](nfcore-rnaseq_provenance).


```bash
# Example of command line to generate this RO-Crate
python WfExS-backend.py -L workflow_examples/bsclife002_config_docker_gocryptfs_20.yaml staged-workdir create-prov-crate 'sex-linked aortectasis' /tmp/example_nfcore_rnaseq_1.zip --orcid 0000-0002-4806-5140 --licence https://spdx.org/licenses/CC-BY-4.0.html
```

## COSIFER CWL workflow (using Singularity)

* Generated RO-Crates contain a copy of the inputs, outputs and workflow.

* WfExS configuration file: [local_config_gocryptfs.yaml](https://github.com/inab/WfExS-backend/blob/b058b538f3334a4b8c657a541dc9b9fb40434f55/workflow_examples/local_config_gocryptfs.yaml)

* Stage description: [cosifer_test1_cwl_implicit_outputs_github.wfex.stage](https://github.com/inab/WfExS-backend/blob/b058b538f3334a4b8c657a541dc9b9fb40434f55/workflow_examples/ipc/cosifer_test1_cwl_implicit_outputs_github.wfex.stage)

* Staged RO-Crate: [cosifer-cwl_staged](cosifer-cwl_staged)

* Provenance RO-Crate: [cosifer-cwl_provenance](cosifer-cwl_provenance)


```bash
# Example of command line to generate this RO-Crate
python WfExS-backend.py -L workflow_examples/local_config_gocryptfs.yaml staged-workdir create-prov-crate 2400c32e-f875-4cd4-9d41-be6da8224c67 /tmp/cosifer-cwl_provenance.zip --inputs --outputs --workflow --orcid 0000-0002-4806-5140 --orcid 0000-0003-4929-1219 --licence https://spdx.org/licenses/CC-BY-4.0.html
```

## COSIFER Nextflow workflow (using Singularity)

* Generated RO-Crates contain a copy of the inputs, outputs and workflow.

* WfExS configuration file: [local_config_gocryptfs.yaml](https://github.com/inab/WfExS-backend/blob/b058b538f3334a4b8c657a541dc9b9fb40434f55/workflow_examples/local_config_gocryptfs.yaml)

* Stage description: [cosifer_test1_nxf.wfex.stage](https://github.com/inab/WfExS-backend/blob/b058b538f3334a4b8c657a541dc9b9fb40434f55/workflow_examples/ipc/cosifer_test1_nxf.wfex.stage)

* Staged RO-Crate: [cosifer-nxf_staged](cosifer-nxf_staged)

* Provenance RO-Crate: [cosifer-nxf_provenance](cosifer-nxf_provenance)


```bash
# Example of command line to generate this RO-Crate
python WfExS-backend.py -L workflow_examples/local_config_gocryptfs.yaml staged-workdir create-prov-crate 597708f2-952e-47c7-9b86-dbe3a9e5f651 /tmp/cosifer-nxf_provenance.zip --inputs --outputs --workflow --orcid 0000-0002-4806-5140 --orcid 0000-0003-4929-1219 --licence https://spdx.org/licenses/CC-BY-4.0.html
```
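
The command lines in this README all share one shape: a configuration file passed with `-L`, the `staged-workdir create-prov-crate` subcommand, the staged working directory (UUID or nickname), the output archive, optional `--inputs`, `--outputs` and `--workflow` flags, and one or more `--orcid` values plus a `--licence`. The sketch below assembles such a command from its parts. It uses only the flags that appear in the examples above; the helper name and its parameters are hypothetical, and it makes no claim about other options WfExS-backend may accept.

```python
import shlex

# Only the optional flags shown in the README examples are accepted here.
OPTIONAL_FLAGS = ("--inputs", "--outputs", "--workflow")


def build_create_prov_crate_argv(config, workdir, output_zip, orcids, licence, include=()):
    """Assemble the argument list for a create-prov-crate call (illustrative helper)."""
    for flag in include:
        if flag not in OPTIONAL_FLAGS:
            raise ValueError(f"flag not shown in the examples: {flag}")
    argv = ["python", "WfExS-backend.py", "-L", config,
            "staged-workdir", "create-prov-crate", workdir, output_zip]
    argv.extend(include)
    for orcid in orcids:
        # --orcid is repeated once per contributor, as in the COSIFER examples
        argv.extend(["--orcid", orcid])
    argv.extend(["--licence", licence])
    return argv


def render_command(argv):
    """Render the argument list as a shell-safe string (quotes nicknames with spaces)."""
    return shlex.join(argv)
```

Building the list first and quoting at the end means a nickname such as `sex-linked aortectasis` is passed as a single argument, which matches the quoting used in the nf-core example.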
@@ -0,0 +1,79 @@
# Notes about this generated RO-Crate

RO-Crate from staged WfExS working directory a37fee9e-4288-4a9e-b493-993a867207d0 (meer oxometalate)

This RO-Crate has been generated by WfExS-backend 0.9.3-192-g93901ff (93901ff9f5e88903b917d62cd0302454ae52364c, branch main),
whose sources are available at https://github.com/inab/WfExS-backend.

## Software containers and metadata

This RO-Crate also includes the metadata files which WfExS-backend produces
and consumes in order to detect when a locally cached copy of a software
container is stale. These files are in JSON format.

In case this RO-Crate also contains a copy of the software containers,
their format will depend on whether they are going to be consumed by
Singularity / Apptainer, or they are going to be consumed by Docker or Podman.

Singularity / Apptainer images usually use the Singularity Image Format (SIF).

Both Docker and Podman images are compressed tar archives obtained through
either `docker save` or `podman save` commands. These archives have all
the layers needed to restore the container image in a local registry
through either `docker load` or `podman load`.
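
Before loading such an archive it can be useful to check which image it holds. The sketch below reads the `manifest.json` entry that `docker save` archives carry and returns the recorded repository tags. It assumes the archive follows the docker-archive layout (Podman's default `save` format is expected to match, but this has not been checked against the archives in these crates); the function name is illustrative.

```python
import json
import os
import tarfile


def saved_image_tags(source):
    """List the repository tags recorded in a docker-archive style tarball.

    `source` is either a filesystem path or a seekable binary file object.
    """
    # Mode "r:*" lets tarfile detect gzip, bzip2 or xz compression transparently.
    if isinstance(source, (str, os.PathLike)):
        archive = tarfile.open(name=source, mode="r:*")
    else:
        archive = tarfile.open(fileobj=source, mode="r:*")
    with archive:
        # Raises KeyError when the archive has no manifest.json entry.
        member = archive.extractfile("manifest.json")
        if member is None:
            raise ValueError("manifest.json is not a regular file")
        manifest = json.load(member)
    tags = []
    for image in manifest:
        # "RepoTags" may be missing or null for untagged images.
        tags.extend(image.get("RepoTags") or [])
    return tags
```

If the tags match the expected container, the archive can then be restored with `docker load` or `podman load` as described above.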

## Possibly used URI schemes

As WfExS-backend is able to manage several exotic CURIEs and schemes,
you can find here an almost complete list of the possible ones:

* `pride.project`: PRIDE dataset metadata is fetched using the APIs described at https://www.ebi.ac.uk/pride/ws/archive/v2/swagger-ui.html#/projects . Contents are downloaded by delegating their associated URIs to other fetchers

* `drs`: GA4GH DRS dataset metadata is fetched using the APIs described at https://ga4gh.github.io/data-repository-service-schemas/. Contents are downloaded by delegating their associated URIs to other fetchers

* `wfexs.trs.files`: WfExS internal pseudo-scheme used to materialize files from pure TRS servers

* `trs`: GA4GH TRS metadata is fetched using the APIs described at https://ga4gh.github.io/tool-registry-service-schemas/. Contents are downloaded by delegating their associated URIs to other fetchers

* `s3`: Amazon S3 resource path scheme, whose downloads are delegated to libraries implementing its support

* `gs`: Google Cloud Storage resource path scheme, whose downloads are delegated to Google Cloud Storage libraries

* `fasp`: This pseudo-scheme, which mimics the ssh scheme, represents datasets behind IBM Aspera servers (quite common in life sciences infrastructures), which follow the FASP protocol. Materialization of these datasets is delegated to the ascp command line

* `doi`: DOIs resolve to web sites. A subset of the DOI providers also point to datasets, like the ones from Zenodo, B2SHARE or osf.io. The fetcher implementing DOI support either delegates to other specialized fetchers or delegates the download of the resolved URL.

* `zenodo`: CURIEs following this scheme can be translated to a downloadable dataset, using APIs described at https://developers.zenodo.org/

* `b2share`: CURIEs following this scheme can be translated to a downloadable dataset, using APIs described at https://eudat.eu/services/userdoc/b2share-http-rest-api#get-specific-record

* `osf.io`: CURIEs following this scheme can be translated to a downloadable dataset, using APIs described at https://developer.osf.io/

* `git`: 'git' scheme and pseudo-schemes 'git+file', 'git+https', 'git+ssh' (based on https://pip.pypa.io/en/stable/topics/vcs-support/) and 'github' are managed by using git command line, applying minimal transformations in the URI.

* `git+file`: 'git' scheme and pseudo-schemes 'git+file', 'git+https', 'git+ssh' (based on https://pip.pypa.io/en/stable/topics/vcs-support/) and 'github' are managed by using git command line, applying minimal transformations in the URI.

* `git+https`: 'git' scheme and pseudo-schemes 'git+file', 'git+https', 'git+ssh' (based on https://pip.pypa.io/en/stable/topics/vcs-support/) and 'github' are managed by using git command line, applying minimal transformations in the URI.

* `git+http`: 'git' scheme and pseudo-schemes 'git+file', 'git+https', 'git+ssh' (based on https://pip.pypa.io/en/stable/topics/vcs-support/) and 'github' are managed by using git command line, applying minimal transformations in the URI.

* `git+ssh`: 'git' scheme and pseudo-schemes 'git+file', 'git+https', 'git+ssh' (based on https://pip.pypa.io/en/stable/topics/vcs-support/) and 'github' are managed by using git command line, applying minimal transformations in the URI.

* `github`: 'git' scheme and pseudo-schemes 'git+file', 'git+https', 'git+ssh' (based on https://pip.pypa.io/en/stable/topics/vcs-support/) and 'github' are managed by using git command line, applying minimal transformations in the URI.

* `swh`: Permanent identifiers of files, directories and repos at SoftwareHeritage. These URIs follow what is described at https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html

* `http`: HTTP download URLs

* `https`: HTTPS download URLs

* `ftp`: File Transfer Protocol (see https://www.iana.org/assignments/ftp-commands-extensions/ftp-commands-extensions.xhtml)

* `sftp`: 'sftp' scheme represents contents behind an SSH server

* `ssh`: 'ssh' scheme represents contents behind an SSH server

* `file`: 'file' scheme is used to represent local files and directories. It should only be used for development or for very isolated environments where paths are stable.

* `data`: 'data' scheme is used to embed very small payloads, as described at https://datatracker.ietf.org/doc/html/rfc2397
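
When inspecting the inputs recorded in a crate, it can help to check which of the schemes above a given URI or CURIE uses. The following is a minimal sketch, not WfExS-backend's own dispatch logic: it takes everything before the first colon as the scheme and compares it against the names listed in this section. The function names and the sample URIs in the usage note are illustrative only.

```python
# Scheme names copied from the list above.
LISTED_SCHEMES = frozenset({
    "pride.project", "drs", "wfexs.trs.files", "trs", "s3", "gs", "fasp",
    "doi", "zenodo", "b2share", "osf.io",
    "git", "git+file", "git+https", "git+http", "git+ssh", "github",
    "swh", "http", "https", "ftp", "sftp", "ssh", "file", "data",
})


def scheme_of(uri):
    """Return the lower-cased text before the first ':' or None when there is no colon."""
    scheme, separator, _rest = uri.partition(":")
    if not separator:
        return None
    return scheme.lower()


def is_listed_scheme(uri):
    """Tell whether the URI or CURIE uses one of the schemes listed in this section."""
    return scheme_of(uri) in LISTED_SCHEMES
```

For example, `scheme_of("git+https://github.com/inab/WfExS-backend.git")` yields `git+https`, which is in the list, whereas a `mailto:` address would not be. Splitting on the first colon rather than using a full URL parser keeps dotted pseudo-schemes such as `pride.project` and `osf.io` intact.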
