Snakemake pipeline for running IRMA. Designed for Influenza and RSV Illumina Sequencing.
Note: This pipeline is not ready for general use. It generally works, but it is very brittle.
- Linux distribution (or another Unix-like system such as macOS)
- Conda
- Snakemake
- Cutadapt
- IRMA
- R
- R packages: ggplot2, dplyr, stringr, tidyr, cowplot, gridExtra, furrr
- Install miniconda: https://docs.conda.io/en/latest/miniconda.html
- Install the latest mamba:
    conda install -n base -c conda-forge mamba
- Git clone this repo:
    git clone --depth 1 https://github.com/ammaraziz/wfi
- Install dependencies:
    cd wfi
    mamba env create -f conda.yaml
- Activate the wfi environment:
    conda activate wfi
Alternatively, install each dependency manually:
- Install miniconda: https://docs.conda.io/en/latest/miniconda.html
- Install snakemake (see https://snakemake.readthedocs.io/en/stable/getting_started/installation.html):
    conda install -n base -c conda-forge mamba
    mamba install -c bioconda -c conda-forge snakemake-minimal python=3.9
- Install cutadapt, biopython and bbmap:
    mamba install -c bioconda -c conda-forge cutadapt biopython bbmap
- Install R (>3.6 should work) and R packages:
    mamba install -c conda-forge r-base
    mamba install -c r r-ggplot2 r-dplyr r-tidyr r-cowplot r-gridExtra r-optparse r-furrr
  Note: installing R packages through conda is troublesome for some. If so, install them manually in R.
- Install the custom version of IRMA, which contains the RSV module:
    mamba install -c ammaraziz irma
- Finally, download this repository and store it in your /bin/
To use the pipeline, follow these steps:
- Navigate to config.yaml and modify as appropriate:

Params | Values | Information |
---|---|---|
input_dir | path | Input directory: location of the raw fastq files |
output_dir | path | Output directory: location to write results (same dir where the config sits) |
second_assembly | True/False | Set to True if you suspect mixtures. This will increase run time substantially |
subset | True/False | Set to True if you are only sequencing HA/NA/MP, otherwise leave as False |
trim_prog | standard/tile | Trimming program to use: tile (bbduk) or standard (cutadapt) |
trim_org | h1/h3 | Influenza only: flu subtype |
technology | illumina/ont/pgm | Sequencing technology used. This changes the module used by IRMA |

- Check that snakemake is installed. If an error is produced, snakemake was not found or is not installed.
    % snakemake --version
    5.10.0
- Test the pipeline with a dry run. This will output all the commands that will be run. Look for errors (red).
    % snakemake -nq
- Run the pipeline, using the -j option to specify the number of cores to use.
    % snakemake -j 8
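
Before running the steps above, the values in config.yaml can be sanity-checked against the Params table. The following is a minimal sketch, not part of the pipeline: `check_config` is a hypothetical helper, the example paths are made up, and it assumes the config has already been loaded into a plain dictionary. The trailing '/' rule comes from the first troubleshooting item further down.

```python
# Allowed values are taken from the Params table. check_config is a
# hypothetical helper for illustration and is not part of the pipeline.
ALLOWED = {
    "second_assembly": (True, False),
    "subset": (True, False),
    "trim_prog": ("standard", "tile"),
    "trim_org": ("h1", "h3"),
    "technology": ("illumina", "ont", "pgm"),
}
PATH_KEYS = ("input_dir", "output_dir")


def check_config(config):
    """Return a list of human-readable problems found in a config mapping."""
    problems = []
    for key in PATH_KEYS:
        value = config.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"{key}: missing or not a path string")
        elif not value.endswith("/"):
            problems.append(f"{key}: path should end with '/'")
    for key, allowed in ALLOWED.items():
        # Assumption: option keys may be absent (trim_org is influenza only),
        # so only values that are present are checked.
        if key in config and config[key] not in allowed:
            options = ", ".join(str(v) for v in allowed)
            problems.append(f"{key}: {config[key]!r} is not one of {options}")
    return problems


example = {
    "input_dir": "/data/run1/fastq",      # made-up path, missing trailing '/'
    "output_dir": "/data/run1/results/",  # made-up path
    "second_assembly": False,
    "subset": False,
    "trim_prog": "bbduk",                 # invalid: the table lists standard or tile
    "trim_org": "h3",
    "technology": "illumina",
}
for problem in check_config(example):
    print(problem)
```

An empty list means none of the checked values conflict with the table. It does not guarantee that the pipeline will run.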
- Pipeline will output correctly formatted names located in: {output_dir}/assemblies/rename/
- Sorted by subtype, most likely the desired output: {output_dir}/assemblies/rename/type/FLU{A|B}
- IRMA assembly specific files (see https://wonder.cdc.gov/amd/flu/irma/output.html): {output_dir}/assemblies/{sampleID}/
- Files for depth and summary info are located in:
    {output_dir}/assemblies/{sampleID}/figures/
    {output_dir}/assemblies/{sampleID}/tables/
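
To make the layout above concrete, here is a small sketch that builds each location for one sample and reports whether it exists. `output_locations` is a hypothetical helper, `results/` and `sample01` are placeholder values, and it assumes `FLU{A|B}` expands to the two folders `FLUA` and `FLUB`.

```python
from pathlib import Path


def output_locations(output_dir, sample_id):
    """Map a short label to each output location listed above."""
    assemblies = Path(output_dir) / "assemblies"
    sample = assemblies / sample_id
    return {
        "renamed": assemblies / "rename",
        # Assumption: FLU{A|B} means one folder per influenza type.
        "flu_a": assemblies / "rename" / "type" / "FLUA",
        "flu_b": assemblies / "rename" / "type" / "FLUB",
        "irma": sample,
        "figures": sample / "figures",
        "tables": sample / "tables",
    }


# "results/" and "sample01" are placeholder values for illustration.
for label, path in output_locations("results/", "sample01").items():
    status = "found" if path.exists() else "missing"
    print(f"{label}: {path} ({status})")
```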
- BLAT for the match step
- LABEL, which also packages certain resources used by IRMA:
  - Sequence Alignment and Modeling System (SAM) for both the rough align and sort steps
  - Shogun Toolbox, which is an essential part of LABEL, is used in the sort step
- SSW for the final assembly step, download our minor modifications to SSW
- samtools for BAM-SAM conversion as well as BAM sorting and indexing
- GNU Parallel for single node parallelization
- R and these R packages: optparse, ggplot2, dplyr, tidyr, stringr, cowplot, gridExtra
1. Error regarding path directories: check that the input and output directories you've specified end with a '/'.
2. Error: Nothing to be done. Check the config file and ensure you've changed the input/output directories.
3. A job crashed. What do I do? There are two options. Either delete the output directory so snakemake can run everything again,
or find out where it crashed and delete the whole folder for that sample.
For example, IRMA sometimes produces errors. Find the sample which crashed,
go to assemblies and delete the corresponding {sampleID} folder, then rerun snakemake.
4. I'm very confused, I need more help, or I've screwed something up badly! Shoot me an email.
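
The second option in item 3 (deleting only the crashed sample's folder) could be scripted roughly as follows. This is a sketch under assumptions: `remove_sample_output` is a hypothetical helper, not something shipped with the pipeline, and it defaults to a dry run so nothing is deleted by accident.

```python
import shutil
from pathlib import Path


def remove_sample_output(output_dir, sample_id, dry_run=True):
    """Delete {output_dir}/assemblies/{sample_id} and return the targeted path."""
    # Refuse IDs that would point outside a single sample folder, and the
    # "rename" folder, which holds the renamed output rather than a sample.
    if not sample_id or "/" in sample_id or sample_id in (".", "..", "rename"):
        raise ValueError(f"refusing suspicious sample ID: {sample_id!r}")
    target = Path(output_dir) / "assemblies" / sample_id
    if not target.is_dir():
        raise FileNotFoundError(f"no sample folder at {target}")
    if dry_run:
        print(f"would delete {target}")
    else:
        shutil.rmtree(target)
        print(f"deleted {target}")
    return target
```

For example, `remove_sample_output("results/", "sample01")` only reports what would be removed (both values are placeholders). Passing `dry_run=False` deletes the folder, after which snakemake can be rerun.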
For any issues, please submit a GitHub issue.