This document describes the output produced by the pipeline. The directories described below will be created in the output directory after the pipeline has finished. All paths are relative to the top-level output directory.
The pipeline output is saved step by step in the output directory as each step completes. Below, we describe the output folders corresponding to the main steps, as well as the pipeline_info folder, which contains details about the submitted job.
- Directory Structure
- Pipeline Information: pipeline_info
- Normalization Step: splitmultiallelics
- Vep Step: ensemblvep
- Exomiser Step: exomiser/results
- Other Steps
The output directory structure is as follows:
|_ pipeline_info/
|_ splitmultiallelics/
|_ ensemblvep/
|_ exomiser/results/
...
The pipeline_info subdirectory contains details about the pipeline execution and metadata relevant to reproducibility, performance optimization and troubleshooting.
The splitmultiallelics subdirectory contains the output of the pipeline after completing the normalization step, just before running the vep or exomiser tools.
The ensemblvep subdirectory contains the output after running vep and will appear only if vep is specified in the tools parameter.
The exomiser/results subdirectory contains the output after running exomiser and will appear only if exomiser is specified in the tools parameter.
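As a quick check after a run, the layout above can be inspected with a few lines of Python. This is a minimal sketch, not part of the pipeline: the function name present_step_dirs is hypothetical, and the list of subdirectories is simply copied from this document.

```python
from pathlib import Path

# Top-level subdirectories named in this document. ensemblvep and
# exomiser/results only appear when the matching tool was requested.
STEP_DIRS = ["pipeline_info", "splitmultiallelics", "ensemblvep", "exomiser/results"]


def present_step_dirs(outdir):
    """Map each documented subdirectory to True if it exists under outdir."""
    root = Path(outdir)
    return {name: (root / name).is_dir() for name in STEP_DIRS}
```

A missing ensemblvep or exomiser/results entry is not necessarily an error, since those folders depend on the tools parameter.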
Here we describe in more detail the content of the pipeline_info subdirectory. It should contain the following:
|_ pipeline_info
   |_ configs
      |_ nextflow.config
      ...
   |_ execution_report_2024-12-09_12-03-20.html
   |_ execution_timeline_2024-12-09_12-03-20.html
   |_ execution_trace_2024-12-09_12-03-20.txt
   |_ params_2024-12-09_12-03-23.json
   |_ pipeline_dag_2024-12-09_12-03-20.html
   |_ metadata.txt
   |_ nextflow.log
The timestamps that appear in some file names are in the user's timezone.
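If you need the launch time of a run in a script, the timestamp can be read back from a file name. The sketch below assumes the pattern seen in the example listing (a trailing _YYYY-MM-DD_HH-MM-SS before the extension); the helper name report_timestamp is hypothetical. The returned datetime carries no timezone information, consistent with the note above.

```python
import re
from datetime import datetime

# Pattern assumed from the example file names: <prefix>_YYYY-MM-DD_HH-MM-SS.<ext>
_STAMP = re.compile(r"_(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})\.[A-Za-z]+$")


def report_timestamp(filename):
    """Return the naive local datetime embedded in a file name, or None."""
    match = _STAMP.search(filename)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d_%H-%M-%S")
```

Files without a timestamp, such as metadata.txt and nextflow.log, yield None.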
The configs folder contains copies of the configuration files used. This includes the default nextflow.config file as well as any additional configuration files passed as parameters.
The files prefixed by execution_ are reports automatically generated by nextflow. These reports help you troubleshoot errors in the pipeline execution and provide information such as launch commands, run times and resource usage. You can refer to the nextflow documentation for more details about these reports.
The file prefixed by params contains the parameters used by the pipeline.
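Because the parameters are stored as JSON, they are easy to reload when comparing runs. The following is a minimal sketch: the function name load_latest_params is hypothetical, and the possibility that one pipeline_info folder holds several params files (for example after rerunning into the same output directory) is an assumption, not something this document states.

```python
import json
from pathlib import Path


def load_latest_params(pipeline_info_dir):
    """Load the most recent params_*.json file, or return None if none exists."""
    candidates = sorted(Path(pipeline_info_dir).glob("params_*.json"))
    if not candidates:
        return None
    # The YYYY-MM-DD_HH-MM-SS stamp sorts chronologically as plain text,
    # so the last entry is the newest file.
    with candidates[-1].open() as handle:
        return json.load(handle)
```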
The file prefixed by pipeline_dag contains a diagram of the pipeline steps.
The metadata.txt file contains information relevant for reproducibility, such as the original command line, the name of the branch / revision used, the username associated with the command, a list of configuration files passed, the nextflow work directory, etc.
The nextflow.log file is a copy of the nextflow log file. Note that it will be missing any log entries written after the workflow.onComplete handler has run.
The splitmultiallelics subdirectory contains the output of the pipeline after the normalization step, just before running vep and exomiser.
|_ splitmultiallelics/
   |_ family1.splitted.vcf.gz
   |_ family1.splitted.vcf.gz.tbi
   ...
It contains one pair of vcf.gz, vcf.gz.tbi files per family. Specifically, we use the following naming scheme:
<FAMILY_ID>.splitted.vcf.gz
<FAMILY_ID>.splitted.vcf.gz.tbi
The family ID should match the family ID in the input sample sheet.
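For downstream scripts, the naming scheme above can be turned into paths directly. This is an illustrative sketch only; split_vcf_paths is a hypothetical helper, and outdir stands for the top-level output directory.

```python
from pathlib import Path


def split_vcf_paths(outdir, family_id):
    """Return the (vcf, index) paths expected for one family after normalization."""
    base = Path(outdir) / "splitmultiallelics"
    vcf = base / f"{family_id}.splitted.vcf.gz"
    index = base / f"{family_id}.splitted.vcf.gz.tbi"
    return vcf, index
```

For example, split_vcf_paths("results", "family1") points at the two files shown in the listing above, under a results folder.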
The ensemblvep subdirectory contains the output of the pipeline after the vep step, if vep was specified in the tools parameter.
|_ ensemblvep/
   |_ variants.family1.vep.vcf.gz
   |_ variants.family1.vep.vcf.gz.tbi
   ...
It contains one pair of vcf.gz, vcf.gz.tbi files per family. Specifically, we use the following naming scheme:
variants.<FAMILY_ID>.vep.vcf.gz
variants.<FAMILY_ID>.vep.vcf.gz.tbi
The family ID should match the family ID in the input sample sheet.
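Going the other way, the family IDs can be recovered from the file names, which is handy for cross-checking against the sample sheet. The sketch below assumes only the naming scheme above; vep_family_ids is a hypothetical helper that takes plain file names (for instance from os.listdir).

```python
import re

# Matches variants.<FAMILY_ID>.vep.vcf.gz and captures the family ID.
# The .tbi index files are deliberately not matched, so each family counts once.
_VEP_NAME = re.compile(r"^variants\.(?P<family>.+)\.vep\.vcf\.gz$")


def vep_family_ids(filenames):
    """Return the sorted, de-duplicated family IDs found among vep output names."""
    found = set()
    for name in filenames:
        match = _VEP_NAME.match(name)
        if match:
            found.add(match.group("family"))
    return sorted(found)
```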
The exomiser/results subdirectory contains the output of the pipeline after the exomiser step, if exomiser was specified in the tools parameter.
|_ exomiser/results
   |_ family1.exomiser.genes.tsv
   |_ family1.exomiser.html
   |_ family1.exomiser.json
   |_ family1.exomiser.variants.tsv
   |_ family1.exomiser.vcf.gz
   |_ family1.exomiser.vcf.gz.tbi
   ...
It should contain a set of 6 files per family. Specifically, we use the following naming scheme:
<FAMILY_ID>.exomiser.genes.tsv
<FAMILY_ID>.exomiser.html
<FAMILY_ID>.exomiser.json
<FAMILY_ID>.exomiser.variants.tsv
<FAMILY_ID>.exomiser.vcf.gz
<FAMILY_ID>.exomiser.vcf.gz.tbi
The family ID should match the family ID in the input sample sheet.
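Since each family should have all 6 files, a completeness check is straightforward to script. This is a minimal sketch under the naming scheme above; missing_exomiser_files is a hypothetical helper, and the suffix list is copied from this document.

```python
from pathlib import Path

# The six per-family suffixes listed in the naming scheme above.
EXOMISER_SUFFIXES = [
    "genes.tsv",
    "html",
    "json",
    "variants.tsv",
    "vcf.gz",
    "vcf.gz.tbi",
]


def missing_exomiser_files(outdir, family_id):
    """Return the expected exomiser file names that are absent for one family."""
    results = Path(outdir) / "exomiser" / "results"
    expected = [f"{family_id}.exomiser.{suffix}" for suffix in EXOMISER_SUFFIXES]
    return [name for name in expected if not (results / name).is_file()]
```

An empty list means the full set is present for that family.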
For more details about the content of each of these files, refer to the exomiser documentation.
If needed, you can set the parameter publish_all to true, and the output from all pipeline steps will be published.
The names of the subdirectories will match the nextflow process names.
We don't recommend using this in production. This is primarily useful for testing, debugging or troubleshooting.