martinghunt edited this page Dec 13, 2021 · 24 revisions

Installation

Clone this repository and then from its root, build either a Singularity or Docker container.

To build a Singularity container:

singularity build viridian_workflow.img Singularity.def

To build a Docker container:

docker build --network=host .

Without --network=host, pip install is likely to time out, causing the build to fail.

Both the Docker and Singularity containers will have the main script viridian_workflow installed.

Basic Usage

The examples below will run the default pipeline, using the built-in SARS-CoV-2 amplicon schemes ARTIC V3, ARTIC V4, and Midnight-1200. The pipeline automatically detects the scheme that best matches the input reads. To use your own amplicon scheme and/or force the choice of scheme, please read the amplicon schemes page. For a more detailed description of the pipeline options, please read the workflow usage page.

To run on paired Illumina reads:

viridian_workflow run_one_sample \
  --tech illumina \
  --ref_fasta data/MN908947.fasta \
  --reads1 reads_1.fastq.gz \
  --reads2 reads_2.fastq.gz \
  --outdir OUT

To run on unpaired nanopore reads:

viridian_workflow run_one_sample \
  --tech ont \
  --ref_fasta data/MN908947.fasta \
  --reads reads.fastq.gz \
  --outdir OUT

The reference FASTA file used in those commands (MN908947.fasta) can be found in the viridian_workflow/amplicon_scheme_data/ directory of this repository. Adjust the --ref_fasta path to match its location on your system.
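The two commands above differ only in the --tech value and in how the reads are passed. As a minimal sketch of that difference, the helper below prints the matching command without running it. The function name viridian_cmd is a hypothetical illustration and not part of viridian_workflow; the flags and file names are the ones used on this page, and only the two --tech values shown here are handled.

```shell
# Hypothetical helper (not part of viridian_workflow): print the
# run_one_sample command for one sample without running it.
# Usage: viridian_cmd illumina OUT reads_1.fastq.gz reads_2.fastq.gz
#        viridian_cmd ont OUT reads.fastq.gz
viridian_cmd() {
  tech="$1"
  outdir="$2"
  case "$tech" in
    illumina)
      # Paired reads: two files, passed with --reads1 and --reads2.
      reads_args="--reads1 $3 --reads2 $4" ;;
    ont)
      # Unpaired reads: one file, passed with --reads.
      reads_args="--reads $3" ;;
    *)
      # Only the two --tech values shown on this page are handled.
      echo "unknown tech: $tech" >&2
      return 1 ;;
  esac
  echo "viridian_workflow run_one_sample --tech $tech" \
    "--ref_fasta data/MN908947.fasta $reads_args --outdir $outdir"
}

viridian_cmd illumina OUT reads_1.fastq.gz reads_2.fastq.gz
viridian_cmd ont OUT reads.fastq.gz
```

Printing the command first makes it easy to check the flags before launching a run. File names containing spaces would need extra quoting that this sketch does not handle.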

Other options:

  • --sample_name MY_NAME: sets the sample name written to the final FASTA, BAM, and VCF files (default is "sample").
  • --keep_bam: use this option to keep the BAM file of original input reads mapped to the reference genome.
  • --force: use with caution, because it overwrites the output directory if it already exists.
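Because --force overwrites an existing output directory, a script may prefer to check for the directory first instead of passing the flag. The check below is a minimal sketch of that idea; require_new_outdir is a hypothetical name and not part of viridian_workflow.

```shell
# Hypothetical pre-flight check (not part of viridian_workflow): refuse to
# continue when the output directory already exists.
require_new_outdir() {
  if [ -e "$1" ]; then
    echo "output directory already exists: $1" >&2
    return 1
  fi
}
```

Calling require_new_outdir OUT before the run_one_sample command stops a script early, so an earlier result is never overwritten by accident.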

Output files

The default files in the output directory are:

  • consensus.fa: a FASTA file of the consensus sequence.
  • variants.vcf: a VCF file of the identified variants between the consensus sequence and the reference genome.
  • log.json: contains logging information for the viridian workflow run. This is described in detail in the JSON output file page.

If the option --keep_bam is used, the output directory will also contain reference_mapped.bam, a sorted BAM file of the reads mapped to the reference, and its index file reference_mapped.bam.bai.
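The file list above can be turned into a quick check after a run. This is a minimal sketch that only tests whether each file exists; check_viridian_outputs is a hypothetical helper name and not part of viridian_workflow, and the file names are those listed on this page.

```shell
# Hypothetical helper (not part of viridian_workflow): report any expected
# output file that is missing from a finished run directory.
# Usage: check_viridian_outputs OUT        (default outputs only)
#        check_viridian_outputs OUT yes    (run used --keep_bam)
check_viridian_outputs() {
  outdir="$1"
  keep_bam="${2:-no}"
  missing=0
  files="consensus.fa variants.vcf log.json"
  if [ "$keep_bam" = "yes" ]; then
    # --keep_bam adds the sorted BAM file and its index.
    files="$files reference_mapped.bam reference_mapped.bam.bai"
  fi
  for f in $files; do
    if [ ! -f "$outdir/$f" ]; then
      echo "missing: $outdir/$f" >&2
      missing=1
    fi
  done
  # Returns 0 when every expected file exists, 1 otherwise.
  return "$missing"
}
```

For example, check_viridian_outputs OUT yes returns a non-zero status and names each missing file on standard error. It does not validate the contents of any file.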
