bettercallsal is an automated workflow that assigns Salmonella serotypes based on the NCBI Pathogen Detection Project for Salmonella. It uses MASH to reduce the search space, followed by additional genome filtering with sourmash. It then performs genome-based alignment with kma, followed by count generation using salmon. This workflow can be used to analyze shotgun metagenomics datasets, quasi-metagenomic datasets (enriched for Salmonella), and target-enriched datasets (enriched with molecular baits specific for Salmonella). It is especially useful when a sample contains a mixture of serovars.
bettercallsal works on both Illumina short reads and Oxford Nanopore long reads.
It is written in Nextflow and is part of the modular data analysis pipelines (CFSAN PIPELINES or CPIPES for short) at CFSAN.
This work is published in Frontiers in Microbiology.
bettercallsal: better calling of Salmonella serotypes from enrichment cultures using shotgun metagenomic profiling and its application in an outbreak setting
Kranti Konganti, Elizabeth Reed, Mark Mammel, Tunc Kayikcioglu, Rachel Binet, Karen Jarvis, Christina M. Ferreira, Rebecca Bell, Jie Zheng, Amanda M. Windsor, Andrea Ottesen, Christopher Grim, and Padmini Ramachandran. Frontiers in Microbiology. https://doi.org/10.3389/fmicb.2023.1200983.
- The main workflow has been used for research purposes only.
- Analysis results should be interpreted with caution and treated as suspect, because the pipeline depends on the precision of metadata from the NCBI Pathogen Detection project for the per_snp_cluster and per_computed_serotype databases.
- Internal research with simulated datasets suggests that the bettercallsal workflow is more accurate with increased read depth.
  - For Illumina MiSeq, at least 5 million read pairs (2x300 PE) or 10 million reads (1x300 SE) per sample works well.
  - For Illumina NextSeq and NovaSeq, around 10 million read pairs (2x150 PE) or 20 million reads (1x150 SE) per sample works well.
  - That being said, this is not a hard cutoff, and you can still try the workflow on low read-depth samples.
- A "No genome hit" assignment should be interpreted with caution.
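The read-depth guidance above can be turned into a quick pre-flight check. The following is a minimal sketch and is not part of bettercallsal: the names `RECOMMENDED_MIN`, `count_fastq_records`, and `meets_recommended_depth` are hypothetical, and the record count assumes standard four-line FASTQ records. The thresholds are copied from the recommendations above and are advisory only.

```python
# Hypothetical helper, not part of bettercallsal.
# Advisory minimums from the guidance above:
# read pairs for paired-end (PE), reads for single-end (SE).
RECOMMENDED_MIN = {
    ("miseq", "PE"): 5_000_000,
    ("miseq", "SE"): 10_000_000,
    ("nextseq", "PE"): 10_000_000,
    ("nextseq", "SE"): 20_000_000,
    ("novaseq", "PE"): 10_000_000,
    ("novaseq", "SE"): 20_000_000,
}


def count_fastq_records(lines):
    """Count records in an iterable of FASTQ lines.

    Assumes each record spans exactly four lines (no wrapped sequences).
    """
    total_lines = sum(1 for _ in lines)
    return total_lines // 4


def meets_recommended_depth(platform, layout, n_reads):
    """Return True if n_reads reaches the advisory minimum.

    For PE layouts, n_reads is the number of read pairs.
    Raises KeyError for a platform/layout not listed above.
    """
    minimum = RECOMMENDED_MIN[(platform.lower(), layout.upper())]
    return n_reads >= minimum
```

For a gzip-compressed file, the lines can be streamed with `gzip.open(path, "rt")` and passed to `count_fastq_records`. A `False` result only flags a sample as below the suggested depth; it does not mean the workflow cannot be run on it.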
NCBI Pathogen Detection:
We gratefully acknowledge all data contributors, i.e., the Authors and their Originating laboratories responsible for obtaining the specimens, and their Submitting laboratories for generating the sequence and metadata and sharing it via the NCBI Pathogen Detection site, some of which this research utilizes.
CFSAN, FDA assumes no responsibility whatsoever for use by other parties of the Software, its source code, documentation or compiled or uncompiled executables, and makes no guarantees, expressed or implied, about its quality, reliability, or any other characteristic. Further, CFSAN, FDA makes no representations that the use of the Software will not infringe any patent or proprietary rights of third parties. The use of this code in no way implies endorsement by the CFSAN, FDA or confers any advantage in regulatory decisions.