This tutorial explains how to adapt nf-core pipelines to accept sample metadata in PEP format.
An example implementation can be found in the taxprofiler pipeline.
A pull request with all the changes needed can be found here.
The steps to accomplish that are as follows:
- Rewrite all pipeline input checks to PEP schema.
- If the script to check input does something more than input validation, then decouple the logic.
- Add --pep input parameter for the pipeline.
- Adjust the nextflow_schema.json to accept --pep parameter.
- Install eido/validate and eido/convert modules from nf-core modules.
- Adjust the workflow responsible for input check.
- Create test_pep config so that users can run simple PEP input example.
Below is a detailed explanation of these tasks, along with additional resources that may be useful during implementation.
In general, nf-core pipelines include a check_samplesheet.py (or similarly named) Python script that validates the samplesheet.csv file. This validation checks whether all mandatory columns are present in the file, whether all required columns have data, whether the file extensions are correct, etc.
Here, we propose replacing this approach with a PEP schema, so that the PEP validator (eido) can perform all checks formerly performed by check_samplesheet.py. An example PEP schema for the taxprofiler pipeline can be found here.
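To make the idea concrete, here is a minimal Python sketch of how declarative rules can stand in for hand-written checks. The dictionary follows the general layout of a PEP schema (a JSON Schema describing a samples array), but the column names (sample, fastq_1), the patterns, and the check_samples helper are assumptions made for illustration. They are not taken from the taxprofiler schema, and the helper is a toy stand-in, not how eido is implemented.

```python
import re

# Hypothetical PEP-style schema: attribute names and patterns are
# illustrative assumptions, not the real taxprofiler schema.
SCHEMA = {
    "description": "Example sample schema (illustrative only)",
    "properties": {
        "samples": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "sample": {"type": "string", "pattern": r"^\S+$"},
                    "fastq_1": {"type": "string", "pattern": r"^\S+\.f(ast)?q\.gz$"},
                },
                "required": ["sample", "fastq_1"],
            },
        }
    },
    "required": ["samples"],
}


def check_samples(samples, schema=SCHEMA):
    """Toy stand-in for a schema validator: returns a list of error strings."""
    rules = schema["properties"]["samples"]["items"]
    errors = []
    for index, row in enumerate(samples):
        # Mandatory columns must be present and non-empty.
        for column in rules["required"]:
            if not row.get(column):
                errors.append(f"row {index}: missing value for '{column}'")
        # Values that are present must match the declared pattern.
        for column, spec in rules["properties"].items():
            value = row.get(column)
            if value and not re.search(spec["pattern"], value):
                errors.append(f"row {index}: '{column}' has invalid value '{value}'")
    return errors


rows = [
    {"sample": "s1", "fastq_1": "s1_R1.fastq.gz"},
    {"sample": "s2", "fastq_1": "s2_R1.txt"},
]
for problem in check_samples(rows):
    print(problem)
```

The point of the sketch is that the rules live in data rather than in code: changing what counts as a valid samplesheet means editing the schema, not the validator.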
In some cases the check_samplesheet.py script not only validates the input files but also adds a column describing what type of reads a given row contains.
Since eido is a validation-only tool, eido/validate cannot add columns.
The best option here is to identify (within check_samplesheet.py) the logic responsible for modifying the input file and move it to a separate Python script (bin/place_the_script_here.py). That way one can still remove all the validation logic and replace it with eido, and modify the input samplesheet.csv using the newly extracted Python script.
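A minimal sketch of what such an extracted script might contain is shown below. It performs no validation at all and only appends one column. The column names (fastq_2, read_type) and the rule for deciding the read type are assumptions for illustration; the real logic has to be taken from the pipeline's own check_samplesheet.py.

```python
import csv
import io


def add_read_type(rows):
    """Yield each row with an extra 'read_type' column (hypothetical name)."""
    for row in rows:
        row = dict(row)
        # Assumption: a non-empty 'fastq_2' value means paired-end reads.
        row["read_type"] = "paired" if row.get("fastq_2") else "single"
        yield row


def annotate_samplesheet(in_handle, out_handle):
    """Read a CSV samplesheet, append 'read_type', and write the result."""
    reader = csv.DictReader(in_handle)
    fieldnames = list(reader.fieldnames) + ["read_type"]
    writer = csv.DictWriter(out_handle, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in add_read_type(reader):
        writer.writerow(row)


# Example run on an in-memory samplesheet.
demo_in = io.StringIO("sample,fastq_1,fastq_2\ns1,a.fq.gz,b.fq.gz\ns2,c.fq.gz,\n")
demo_out = io.StringIO()
annotate_samplesheet(demo_in, demo_out)
print(demo_out.getvalue())
```

Keeping the script free of checks means it can assume its input has already passed eido validation.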
All pipelines should share a common interface, so that users can run any of them with a PEP in the same way. To accomplish that, the --pep parameter should be added to the pipeline.
The developer should allow the pipeline to consume the --pep parameter and make it mandatory to provide either --input or --pep when running the pipeline (by default the user must always pass --input). In the case of the taxprofiler pipeline, two files had to be edited: lib/WorkflowMain.groovy and workflows/taxprofiler.nf.
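The rule itself is small. The sketch below expresses it in Python purely to illustrate the logic; in a real pipeline the change is made in the Groovy and Nextflow files named above. The function name and the choice to prefer --pep when both parameters are given are assumptions, not behaviour described by the taxprofiler pull request.

```python
def resolve_samplesheet_source(params):
    """Return ('pep', path) or ('input', path); raise if neither is set."""
    pep = params.get("pep")
    samplesheet = params.get("input")
    if pep:
        # Assumption: when both are supplied, the PEP takes precedence.
        return ("pep", pep)
    if samplesheet:
        return ("input", samplesheet)
    raise ValueError("Please provide either --input or --pep")


print(resolve_samplesheet_source({"input": "samplesheet.csv"}))
print(resolve_samplesheet_source({"pep": "project_config.yaml"}))
```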
This step is strongly coupled with 3. Add PEP as input parameter. When adding a new parameter to the pipeline, one must adjust nextflow_schema.json to avoid validation errors. The only change needed here is to state that instead of one mandatory argument (--input), one of [--input, --pep] is now mandatory.
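In plain JSON Schema, an "at least one of" requirement is commonly written with anyOf over two required lists. The sketch below builds such a fragment and evaluates it with a few lines of Python. Where the fragment belongs inside nextflow_schema.json, and whether the pipeline's schema tooling accepts it unchanged, are assumptions to verify against the taxprofiler pull request.

```python
import json

# Generic JSON Schema fragment: at least one of the two parameters is required.
# Its placement inside nextflow_schema.json is an assumption.
ONE_OF_INPUTS = {
    "anyOf": [
        {"required": ["input"]},
        {"required": ["pep"]},
    ]
}


def satisfies(params, fragment=ONE_OF_INPUTS):
    """Evaluate only the 'anyOf' and 'required' keywords used above."""
    return any(
        all(name in params for name in option["required"])
        for option in fragment["anyOf"]
    )


print(json.dumps(ONE_OF_INPUTS, indent=2))
print(satisfies({"pep": "project_config.yaml"}))
print(satisfies({"outdir": "results"}))
```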
Eido is available as a module in nf-core modules, so it can be shared across all pipelines.
To use the EIDO_VALIDATE and EIDO_CONVERT commands in the pipeline, the developer must first install the modules for the current pipeline. A tutorial on how to do this can be found here.
Incorporating new modules changes the workflow. In my case, changes were needed in modules/local/samplesheet_check.nf and subworkflows/local/input_check.nf.
The developer should create a test config so that users can run the pipeline with a PEP as input with minimal effort.
To do this, a new config profile should be added, as shown in the taxprofiler pull request.
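A test profile needs a small PEP to point at. The sketch below writes one: a project config file that references a sample table. The keys pep_version and sample_table come from the PEP specification; the file names, the fastq_1 column, and the sample values are assumptions for illustration and would need to match the pipeline's own schema.

```python
import csv
import tempfile
from pathlib import Path


def write_minimal_pep(directory):
    """Write a tiny PEP (config + sample table) and return the config path."""
    directory = Path(directory)

    # Sample table: the columns other than 'sample_name' are hypothetical.
    sample_table = directory / "samplesheet.csv"
    with sample_table.open("w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["sample_name", "fastq_1"])
        writer.writerow(["s1", "s1_R1.fastq.gz"])

    # Project config: points at the sample table by relative path.
    config = directory / "project_config.yaml"
    config.write_text("pep_version: 2.1.0\nsample_table: samplesheet.csv\n")
    return config


with tempfile.TemporaryDirectory() as tmp:
    config_path = write_minimal_pep(tmp)
    print(config_path.read_text())
```

The resulting config file is what a user would pass to the pipeline via --pep.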
In general, all necessary modules (eido/validate and eido/convert) are already available in nf-core modules, but the developer may need to add other tools. To do so, it helps to know how this works in nf-core. For a container to be usable in nf-core pipelines, it should be hosted on the biocontainers registry. Let's say that we want to add peppy as a tool and use it within a pipeline.
There are two ways to accomplish that:
- Add peppy to bioconda. This is the easiest way: once peppy is available in bioconda, biocontainers automatically creates a container for the tool.
- Manually add peppy to biocontainers. A detailed tutorial on how to do this is available here.