nextflow pipeline for NICD wastewater surveillance

andersen-lab/Freyja-nf

 
 

Freyja-SRA

Automated SRA downloading, processing and Freyja analysis pipeline for SARS-CoV-2 wastewater sequencing data.

Installation

Local Install via Git

git clone https://github.com/dylanpilz/Freyja-SRA.git
cd Freyja-SRA

Usage

nextflow run main.nf -entry [sra|rerun_demix] -profile [docker|singularity] --accession_list [accession_list.csv] --output_dir [output_dir] --num_samples [num_samples]

Parameters

  • -entry - The pipeline entry point.

    • sra will download, process, and run Freyja on the provided SRA accessions.

      • --accession_list - A CSV file listing the SRA accessions to download and process. The file should have a header row, and the first column should be named accession.
    • rerun_demix will run the freyja demix step on previously generated variants output files in the provided variants directory. This is useful if you want to rerun Freyja on existing data with a different barcode set.

      • --variants_dir - The directory of existing variants output. It must contain a [base_name].variants.tsv and a [base_name].depths.tsv file for each sample.
  • --output_dir - The final output directory. Creates variants, demix, and covariants subdirectories containing the respective output files. (default: ./outputs)

  • --num_samples - The number of samples to process. (default: 200)
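
The input requirements above can be prepared and checked before launching the pipeline. The following is a minimal sketch, not part of Freyja-SRA: the helper names write_accession_list and incomplete_samples are hypothetical, and the accession IDs in the usage note are placeholders. It writes an accession list in the expected one-column layout and reports samples in a variants directory that lack one of the two required files.

```python
import csv
from pathlib import Path


def write_accession_list(accessions, path):
    """Write a one-column CSV whose header row is `accession`."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["accession"])
        for accession in accessions:
            writer.writerow([accession])


def incomplete_samples(variants_dir):
    """Return base names that have only one of the two required files."""
    directory = Path(variants_dir)
    variants = {
        p.name[: -len(".variants.tsv")] for p in directory.glob("*.variants.tsv")
    }
    depths = {
        p.name[: -len(".depths.tsv")] for p in directory.glob("*.depths.tsv")
    }
    # Symmetric difference: base names present in one set but not the other.
    return sorted(variants ^ depths)
```

For example, write_accession_list(["SRR0000001", "SRR0000002"], "accession_list.csv") produces a file that can be passed to --accession_list, and an empty list from incomplete_samples("outputs/variants") suggests the directory is ready for -entry rerun_demix.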

Configuration

Additional configuration options can be found in nextflow.config.

Data Availability

Freyja-SRA is currently in the process of downloading and processing all publicly available SARS-CoV-2 wastewater data, fetched with the following search terms:

(Wastewater[All Fields] OR wastewater metagenome[All Fields]) AND ("Severe acute respiratory syndrome coronavirus 2"[Organism] OR SARS-CoV-2[All Fields])

In addition to the above search terms, we exclude accessions with any of the following metadata problems:

  • Missing collection date
  • Missing catchment size (ww_population)
  • Missing location (geo_loc_name)
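
The exclusion rule above can be sketched as a simple row filter. This is an illustration under stated assumptions, not the pipeline's own code: ww_population and geo_loc_name are the field names given above, while collection_date is an assumed column name, and a field is treated as missing only when it is absent or blank.

```python
import csv
import io

# `ww_population` and `geo_loc_name` come from the list above;
# `collection_date` is an assumed column name.
REQUIRED_FIELDS = ("collection_date", "ww_population", "geo_loc_name")


def has_required_metadata(row, required=REQUIRED_FIELDS):
    """True when every required field is present and non-blank."""
    return all((row.get(field) or "").strip() for field in required)


def filter_metadata(rows):
    """Keep only the rows that pass the metadata requirements."""
    return [row for row in rows if has_required_metadata(row)]


# Illustrative metadata with placeholder values.
SAMPLE = """accession,collection_date,ww_population,geo_loc_name
SRR0000001,2023-01-05,120000,USA: California
SRR0000002,,50000,USA: Texas
SRR0000003,2023-02-11,,South Africa
"""

kept = filter_metadata(csv.DictReader(io.StringIO(SAMPLE)))
print([row["accession"] for row in kept])
```

Real SRA metadata may mark missing values with strings such as "missing" or "not collected"; handling those would need an extra check that this sketch does not include.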

To check the status of each accession, please refer to the sample_status column in data/all_metadata.csv. All currently processed freyja outputs are publicly available via Google Cloud Storage at gs://outbreak-ww-data
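
For a quick overview of processing progress, the sample_status column can be tallied with the standard library. This is a minimal sketch that assumes only the file path and column name mentioned above. The function name status_counts is hypothetical, and the status values depend on what the file contains.

```python
import csv
from collections import Counter


def status_counts(metadata_csv="data/all_metadata.csv"):
    """Count how many accessions fall under each sample_status value."""
    with open(metadata_csv, newline="") as handle:
        return Counter(row["sample_status"] for row in csv.DictReader(handle))
```

Calling status_counts() from the repository root returns a Counter mapping each status value to the number of accessions with that status.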
