RECSALMO (Rapid Typing and Characterization Tool for Whole Genome Sequencing Data of Salmonella)

Introduction:

Salmonella is a prevalent foodborne pathogen that can cause serious illness, mainly gastroenteritis, with symptoms including diarrhea, vomiting, nausea, muscle aches, fever, and stomach cramps. The bacteria can also cause large-scale outbreaks, and their sources may be plants, animals, or humans; examples include onions, tomatoes, pigs, and chickens. Epidemiological surveillance and outbreak control are therefore essential for containing Salmonella outbreaks. This project provides a Python package for rapid analysis of Salmonella genome assemblies. The analysis covers sequence-based typing (serotyping, MLST, cgMLST), detection of antimicrobial resistance (AMR) genes and their antibiotic classes/subclasses, phylogenetic tree construction, identification of Salmonella Pathogenicity Islands (SPIs), and CRISPR characterization. The tool is intended to run only on Linux-like operating systems, e.g. Ubuntu.

Instructions:

All genome assembly files must be placed in a single folder, and that folder must not contain any sub-folders.
The assembly files must be in FASTA format.
For each genome assembly file, the program uses the part of the file name before the first period as the main output name. For instance, if an assembly file is named Assembly1.scaffolds.fasta, the main output name (and also the output sub-folder name) becomes "Assembly1". File names should therefore be chosen carefully so that no two files share the same main output name, because duplicates lead to errors.
The user must specify a single folder (on the command line) as the output folder. All analysis results are stored in this folder.
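
The naming rule above can be checked before a run. The following is a minimal sketch, not part of RECSALMO itself: the helper names `output_name`, `find_name_clashes`, and `check_input_folder` are hypothetical, and the sketch assumes that the main output name is exactly the text before the first period of the file name.

```python
from collections import defaultdict
from pathlib import Path


def output_name(filename):
    """Return the text before the first period of a file name."""
    return Path(filename).name.split(".", 1)[0]


def find_name_clashes(filenames):
    """Map each output name shared by two or more files to those files."""
    groups = defaultdict(list)
    for filename in filenames:
        groups[output_name(filename)].append(filename)
    return {name: files for name, files in groups.items() if len(files) > 1}


def check_input_folder(folder):
    """Return a list of problems found in an input folder (empty if none)."""
    entries = sorted(Path(folder).iterdir())
    # Sub-folders are not allowed inside the input folder.
    problems = [f"sub-folder found: {p.name}" for p in entries if p.is_dir()]
    files = [p.name for p in entries if p.is_file()]
    # Two files with the same output name would collide in the output folder.
    for name, clashing in find_name_clashes(files).items():
        problems.append(f"duplicate output name '{name}': {', '.join(clashing)}")
    return problems
```

For example, `output_name("Assembly1.scaffolds.fasta")` gives `"Assembly1"`, and a folder holding both `A.fasta` and `A.scaffolds.fasta` would be reported as a clash on the name `A`.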

Database:

The following embedded databases are required for the analysis of Salmonella genomes.

Reference genome: Salmonella enterica subsp. enterica serovar Typhimurium str. LT2
Reference genome: Salmonella enterica subsp. enterica serovar Typhi str. CT18
Reference genome: Salmonella enterica subsp. enterica serovar Gallinarum str. 287/91
SPIs (Salmonella Pathogenicity Islands): SPI-1 to SPI-17
CRISPR spacers with assigned nametags

Outputs:

The outputs comprise raw analytical reports (from dependent and custom packages and external software) and one summary file in both "xlsx" and "csv" formats. The summary file contains the following columns:

Genome Assembly Name
Salmonella sub-species
Salmonella serovar
MLST (Multi-Locus Sequence Typing)
cgMLST (core-genome MLST)
AMR:Gene/Antibiotic
SPI (Salmonella Pathogenicity Islands)
Spacer-C1 (Spacer list in CRISPR locus 1)
Spacer-C2 (Spacer list in CRISPR locus 2)
NumSP-C1 (Number of spacers in CRISPR locus 1)
NumSP-C2 (Number of spacers in CRISPR locus 2)
DR-C1 (Direct repeat of CRISPR locus 1)
DR-C2 (Direct repeat of CRISPR locus 2)
Pos-C1 (Start-Stop Positions of CRISPR locus 1)
Pos-C2 (Start-Stop Positions of CRISPR locus 2)
LenC1 (Length in base pairs of CRISPR locus 1)
LenC2 (Length in base pairs of CRISPR locus 2)
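
As an illustration of how the "csv" summary might be consumed downstream, the sketch below counts the values in one column using only the Python standard library. It is an assumption that the header text in the real file matches the column names listed above; the helper `count_column` is hypothetical, and the rows in `SAMPLE` are made up for illustration and are not real RECSALMO output.

```python
import csv
import io
from collections import Counter


def count_column(csv_text, column):
    """Count how often each value occurs in one column of CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# Made-up rows for illustration only; header names are assumed.
SAMPLE = (
    "Genome Assembly Name,Salmonella serovar,MLST\n"
    "Assembly1,Typhimurium,19\n"
    "Assembly2,Enteritidis,11\n"
    "Assembly3,Typhimurium,19\n"
)

serovar_counts = count_column(SAMPLE, "Salmonella serovar")
for serovar, count in serovar_counts.most_common():
    print(serovar, count)
```

With a real summary file, the text could be read with `Path(...).read_text()` and passed to the same function; the actual file name of the summary is not stated here and would need to be looked up in the output folder.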

In addition to the summary file, the following phylogenetic trees and pie charts covering all genomes are produced:

SNP-based phylogenetic tree: this dendrogram is created by the ParSNP tool with Salmonella enterica subsp. enterica serovar Typhimurium str. LT2 as the reference genome (NCBI Genome ID = SO4698-09 and accession = LN999997.1). The tree file is in the *.ggr format, which can be opened and visualized with the Gingr software.
CRISPR-based phylogenetic tree: this dendrogram is created from the alignment of the CRISPR spacers of both locus 1 and locus 2.
Pie chart of serovar
Pie chart of ST

For the raw analytical reports, the analysis of each genome produces the following essential files:

"mlst_result.txt"
"amrfinder_result.txt"
"SPI_finderResult.txt"
"crispr_result.txt"
"spProfile.txt"

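After a run, the presence of these report files can be verified per genome. The sketch below assumes that each genome's reports sit directly inside a sub-folder of the output folder named after its main output name; that layout is an assumption, and `missing_reports` is a hypothetical helper, not a RECSALMO function.

```python
from pathlib import Path

# Report file names as listed above.
EXPECTED_REPORTS = (
    "mlst_result.txt",
    "amrfinder_result.txt",
    "SPI_finderResult.txt",
    "crispr_result.txt",
    "spProfile.txt",
)


def missing_reports(output_folder, genome_name):
    """List expected report files absent from one genome's sub-folder."""
    genome_dir = Path(output_folder) / genome_name  # assumed layout
    return [name for name in EXPECTED_REPORTS if not (genome_dir / name).is_file()]
```

An empty list means that all five reports were found for that genome.
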
Installation:

This project is written mainly in Python and depends on several packages and external software. The recommended Python version is 3.8 or above. For a smooth installation, follow the steps below:

Download and install miniconda (Linux version): https://docs.anaconda.com/free/miniconda/index.html
Install Java

sudo apt update

sudo apt install default-jdk

sudo apt install default-jre
Create a custom conda environment (replacing "myenv" with the name of your choice)

conda create --name myenv

conda activate myenv
Install dependent conda packages

conda install -c bioconda fastmlst sistr_cmd ncbi-amrfinderplus parsnp

conda install -c conda-forge openpyxl seaborn
Update databases for fastmlst and ncbi-amrfinderplus packages

fastmlst --update-mlst -t 1

amrfinder -u
Create a folder (any name is fine) and place all the files of the RECSALMO project inside it; recsalmo.py should sit at the top level of that folder.
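
Once the steps above are done, a quick check that the external tools are reachable from the active conda environment can save a failed run. This is a minimal sketch: `missing_tools` is a hypothetical helper, and the executable names in `REQUIRED_TOOLS` are assumptions (`fastmlst` and `amrfinder` appear in the commands above, while `sistr`, `parsnp`, and `java` are assumed to be the executables installed by the corresponding packages).

```python
import shutil

# Assumed executable names for the tools installed in the steps above.
REQUIRED_TOOLS = ("java", "fastmlst", "sistr", "amrfinder", "parsnp")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

Run it with the conda environment activated, since the tools are only on the PATH inside that environment.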

Usage:

Users must supply the input folder containing the genome assembly files. The main output folder is optional; if it is not given, a folder named "out_file" is created in the current working directory and used as the main output folder. Users need to supply:

Absolute path to recsalmo.py
Absolute path to input folder
Absolute path to output folder (optional)

The format of the program call is:

python /path/recsalmo.py --input /path/input_folder --output /path/output_folder

Example program call, supposing that all paths are under /home:

python /home/recsalmo/recsalmo.py --input /home/input_folder --output /home/output_folder
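
For batch use, the same call can be assembled from Python. The sketch below assumes the options are spelled `--input` and `--output`; `build_command` and `run_recsalmo` are hypothetical wrappers, not part of the project. Paths are converted to absolute paths because the program expects them in that form.

```python
import os
import subprocess
import sys


def build_command(script, input_folder, output_folder=None):
    """Build the argument list for a RECSALMO call with absolute paths."""
    cmd = [
        sys.executable,
        os.path.abspath(script),
        "--input",  # assumed option spelling
        os.path.abspath(input_folder),
    ]
    # The output folder is optional; omit the option when it is not given.
    if output_folder is not None:
        cmd += ["--output", os.path.abspath(output_folder)]
    return cmd


def run_recsalmo(script, input_folder, output_folder=None):
    """Run RECSALMO and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_command(script, input_folder, output_folder), check=True)
```

Calling `run_recsalmo("/home/recsalmo/recsalmo.py", "/home/input_folder")` without an output folder leaves the choice of output location to the program, as described above.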
