ferlab/postprocessing: Output

Introduction

This document describes the output produced by the pipeline. The directories described below will be created in the output directory after the pipeline has finished. All paths are relative to the top-level output directory.

Overview

The pipeline output is saved step-by-step in the output directory as each step is completed. Below, we provide a description of the output folders corresponding to the main steps, as well as the pipeline_info folder, which contains details about the submitted job.

Directory Structure

The output directory structure is as follows:

|_ pipeline_info/
|_ splitmultiallelics/
|_ ensemblvep/
|_ exomiser/results/
...

The pipeline_info subdirectory contains details about the pipeline execution and metadata relevant to reproducibility, performance optimization and troubleshooting.

The splitmultiallelics subdirectory contains the output of the pipeline after completing the normalization step, just before running the vep or exomiser tools.

The ensemblvep subdirectory contains the output after running vep and will appear only if vep is specified in the tools parameter.

The exomiser/results subdirectory contains the output after running exomiser and will appear only if exomiser is specified in the tools parameter.
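
As a rough illustration of the rules above, the following Python sketch lists the documented subdirectories one would expect for a given set of tools. The function name `expected_output_dirs` is hypothetical and not part of the pipeline. It covers only the directories described in this document, and it assumes the tools are available as a list of strings.

```python
def expected_output_dirs(tools):
    """Return the documented output subdirectories expected for a run.

    `tools` is assumed to be an iterable of tool names such as
    ["vep", "exomiser"]; how the pipeline parses its tools parameter
    is not covered here.
    """
    # These two directories are described as always present.
    dirs = ["pipeline_info", "splitmultiallelics"]
    # The remaining ones appear only when the matching tool is requested.
    if "vep" in tools:
        dirs.append("ensemblvep")
    if "exomiser" in tools:
        dirs.append("exomiser/results")
    return dirs
```

For example, `expected_output_dirs(["vep"])` omits `exomiser/results`, which matches the description above.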

Pipeline Information: pipeline_info

Here we describe in more detail the content of the pipeline_info subdirectory. It should contain the following:

|_ pipeline_info
   |_ configs
      |_ nextflow.config
          ... 
   |_ execution_report_2024-12-09_12-03-20.html
   |_ execution_timeline_2024-12-09_12-03-20.html
   |_ execution_trace_2024-12-09_12-03-20.txt
   |_ params_2024-12-09_12-03-23.json
   |_ pipeline_dag_2024-12-09_12-03-20.html
   |_ metadata.txt
   |_ nextflow.log

The timestamps that appear in some files are in the user's timezone.

The configs folder contains copies of the configuration files used. This includes the default nextflow.config file as well as any additional configuration files passed as parameters.

The files prefixed by execution_ are reports automatically generated by nextflow. These reports allow you to troubleshoot errors with the pipeline execution and provide information such as launch commands, run times and resource usage. You can refer to the nextflow documentation for more details about these reports.

The file prefixed by params contains the parameters used by the pipeline.

The file prefixed by pipeline_dag contains a diagram of the pipeline steps.

The metadata.txt file contains various information relevant for reproducibility, such as the original command line, the name of the branch / revision used, the username associated with the command, a list of configuration files passed, the nextflow work directory, etc.

The nextflow.log file is a copy of the nextflow log file. Note that it will miss logs written after the workflow.onComplete handler is run.
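
When an output directory holds reports from several runs, the timestamp embedded in file names such as execution_report_2024-12-09_12-03-20.html can help tell them apart. The sketch below is a minimal example, assuming the `YYYY-MM-DD_HH-MM-SS` pattern seen in the listing above. The helper names `parse_run_timestamp` and `latest` are hypothetical. The returned datetime is naive, since the file name carries no timezone information.

```python
import re
from datetime import datetime

# Pattern assumed from the example file names in this document.
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})")


def parse_run_timestamp(file_name):
    """Return the naive datetime embedded in a file name, or None if absent."""
    match = TIMESTAMP.search(file_name)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d_%H-%M-%S")


def latest(file_names):
    """Return the file name with the most recent embedded timestamp, or None."""
    stamped = [(parse_run_timestamp(name), name) for name in file_names]
    stamped = [(ts, name) for ts, name in stamped if ts is not None]
    return max(stamped)[1] if stamped else None
```

Files without a timestamp, such as metadata.txt and nextflow.log, are simply ignored by `latest`.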

Normalization Step: splitmultiallelics

The splitmultiallelics subdirectory contains the output of the pipeline after the normalization step, just before running vep and exomiser.

|_ splitmultiallelics/
   |_ family1.splitted.vcf.gz
   |_ family1.splitted.vcf.gz.tbi
   ... 

It contains one pair of vcf.gz, vcf.gz.tbi files per family. Specifically, we use the following naming scheme:

  • <FAMILY_ID>.splitted.vcf.gz
  • <FAMILY_ID>.splitted.vcf.gz.tbi

The family ID should match the family ID in the input sample sheet.
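
The naming scheme above lends itself to a quick sanity check. The following Python sketch, with the hypothetical helper `families_missing_index`, scans a splitmultiallelics directory and reports the family IDs whose vcf.gz file has no matching vcf.gz.tbi index. It is an illustration based only on the file names documented here, not a check performed by the pipeline.

```python
from pathlib import Path

SPLIT_SUFFIX = ".splitted.vcf.gz"


def families_missing_index(split_dir):
    """Return family IDs whose .splitted.vcf.gz file lacks a .tbi index."""
    missing = []
    for vcf in sorted(Path(split_dir).glob("*" + SPLIT_SUFFIX)):
        # Strip the suffix to recover <FAMILY_ID> from the file name.
        family_id = vcf.name[: -len(SPLIT_SUFFIX)]
        index = vcf.with_name(vcf.name + ".tbi")
        if not index.exists():
            missing.append(family_id)
    return missing
```

An empty list means every family has both files of the pair.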

VEP Step: ensemblvep

The ensemblvep subdirectory contains the output of the pipeline after the vep step, if vep was specified in the tools parameter.

|_ ensemblvep/
  |_ variants.family1.vep.vcf.gz
  |_ variants.family1.vep.vcf.gz.tbi
  ...

It contains one pair of vcf.gz, vcf.gz.tbi files per family. Specifically, we use the following naming scheme:

  • variants.<FAMILY_ID>.vep.vcf.gz
  • variants.<FAMILY_ID>.vep.vcf.gz.tbi

The family ID should match the family ID in the input sample sheet.
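
To map a VEP output file back to its row in the sample sheet, the family ID can be recovered from the file name. The sketch below assumes the naming scheme listed above. The name `family_id_from_vep_name` is hypothetical, and the regular expression is an illustration rather than something taken from the pipeline code.

```python
import re

# Matches variants.<FAMILY_ID>.vep.vcf.gz but not the .tbi index.
VEP_NAME = re.compile(r"^variants\.(?P<family_id>.+)\.vep\.vcf\.gz$")


def family_id_from_vep_name(file_name):
    """Return the family ID embedded in a VEP output file name, or None."""
    match = VEP_NAME.match(file_name)
    return match.group("family_id") if match else None
```

Index files and files from other steps return None, so they can be filtered out when listing the directory.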

Exomiser Step: exomiser/results

The exomiser/results subdirectory contains the output of the pipeline after the exomiser step, if exomiser was specified in the tools parameter.

|_ exomiser/results
   |_ family1.exomiser.genes.tsv
   |_ family1.exomiser.html
   |_ family1.exomiser.json
   |_ family1.exomiser.variants.tsv
   |_ family1.exomiser.vcf.gz
   |_ family1.exomiser.vcf.gz.tbi
  ...   

It should contain a set of 6 files per family. Specifically, we use the following naming scheme:

  • <FAMILY_ID>.exomiser.genes.tsv
  • <FAMILY_ID>.exomiser.html
  • <FAMILY_ID>.exomiser.json
  • <FAMILY_ID>.exomiser.variants.tsv
  • <FAMILY_ID>.exomiser.vcf.gz
  • <FAMILY_ID>.exomiser.vcf.gz.tbi

The family ID should match the family ID in the input sample sheet.

For more details about the content of each of these files, refer to the exomiser documentation.
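
Since each family is expected to produce the same six files, a short script can confirm that nothing is missing. The following Python sketch uses a hypothetical helper, `missing_exomiser_files`, and relies only on the file names listed above. It checks that the files exist and does not inspect their content.

```python
from pathlib import Path

# The six per-family suffixes documented for exomiser/results.
EXOMISER_SUFFIXES = (
    "genes.tsv",
    "html",
    "json",
    "variants.tsv",
    "vcf.gz",
    "vcf.gz.tbi",
)


def missing_exomiser_files(results_dir, family_id):
    """Return the expected exomiser file names that are absent for a family."""
    results_dir = Path(results_dir)
    expected = [f"{family_id}.exomiser.{suffix}" for suffix in EXOMISER_SUFFIXES]
    return [name for name in expected if not (results_dir / name).exists()]
```

An empty list indicates that all six files are present for the given family ID.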

Other Steps

If needed, you can set the parameter publish_all to true, and the output from all pipeline steps will be published. The names of the subdirectories will match the nextflow process names.

We don't recommend using this in production. This is primarily useful for testing, debugging or troubleshooting.