
Kleborate

Kleborate is a tool to screen Klebsiella genome assemblies for:

  • MLST sequence type
  • species (e.g. K. pneumoniae, K. quasipneumoniae, K. variicola, etc.)
  • ICEKp associated virulence loci: yersiniabactin (ybt), colibactin (clb)
  • virulence plasmid associated loci: salmochelin (iro), aerobactin (iuc), hypermucoidy (rmpA, rmpA2)
  • antimicrobial resistance genes, including quinolone resistance SNPs and colistin resistance truncations
  • K (capsule) and O antigen (LPS) serotype prediction, via wzi alleles and Kaptive

A manuscript describing the Kleborate software in full is currently in preparation.

In the meantime, if you use Kleborate, please cite the component schemes that you report:

Yersiniabactin and colibactin (ICEKp): Lam, MMC. et al. Genetic diversity, mobilisation and spread of the yersiniabactin-encoding mobile element ICEKp in Klebsiella pneumoniae populations. Microbial Genomics (2018).

Aerobactin and salmochelin: Lam, MMC. et al. Tracking key virulence loci encoding aerobactin and salmochelin siderophore synthesis in Klebsiella pneumoniae. bioRxiv (2018).

Kaptive for capsule (K) serotyping: Wyres, KL. et al. Identification of Klebsiella capsule synthesis loci from whole genome data. Microbial Genomics (2016).

Kaptive for O antigen (LPS) serotyping: Wick, RR. et al. Kaptive Web: user-friendly capsule and lipopolysaccharide serotype prediction for Klebsiella genomes. Journal of Clinical Microbiology (2018).

Background

Klebsiella pneumoniae (Kp) is a commensal bacterium that causes opportunistic infections, with a handful of hypervirulent lineages recognised as true human pathogens. Evidence is now mounting that other Kp strains carrying acquired siderophores (yersiniabactin, salmochelin and aerobactin) and/or the genotoxin colibactin are also highly pathogenic and can cause invasive disease.

Our goal is to help identify emerging pathogenic Kp lineages, and to make it easy for people who are using genomic surveillance to monitor for antibiotic resistance to also look out for the convergence of antibiotic resistance and virulence. To help facilitate that, in this repo we share code for genotyping virulence and resistance genes in K. pneumoniae. A table of pre-computed results for 2500 public Klebs genomes is also provided in the data directory.

Requirements

Software requirements:

  • Python (either 2.7 or 3)
  • setuptools (required to install Kleborate)
    • To install: pip install setuptools
  • BLAST+ command line tools (makeblastdb, blastn, etc.)
    • Version 2.2.30 or later is needed, as earlier versions have a bug with the culling_limit parameter.
  • Mash is required to use the --species option

As input, Kleborate takes Klebsiella genome assemblies (either completed or draft). If you have unassembled reads, try assembling them with our Unicycler assembler, which works well on Illumina-only or hybrid (Illumina + Nanopore/PacBio) read sets.

Installation

Kleborate can be installed to your system for easy usage:

git clone --recursive https://github.com/katholt/Kleborate.git
cd Kleborate
python setup.py install
kleborate -h

Alternatively, you can clone and run Kleborate without installation directly from its source directory:

git clone --recursive https://github.com/katholt/Kleborate.git
Kleborate/kleborate-runner.py -h

See examples below to test out your installation on some public genome data.

Basic usage

Screen some genomes for MLST and virulence loci:
kleborate -o results.txt -a *.fasta

Also screen for resistance genes:
kleborate --resistance -o results.txt -a *.fasta

Turn on all of Kleborate's optional screens (resistance genes, species check and both K and O loci):
kleborate --all -o results.txt -a *.fasta

Screen everything in a set of gzipped assemblies:
kleborate --all -o results.txt -a *.fasta.gz
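
If you drive Kleborate from a script, the commands above can be assembled programmatically. The following is a minimal sketch, assuming kleborate is on your PATH. The helper names build_kleborate_command and run_kleborate are hypothetical, and only flags documented in this README (-a, -o, --resistance, --all) are used.

```python
import subprocess


def build_kleborate_command(assemblies, outfile="results.txt",
                            resistance=False, run_all=False):
    """Return the argument list for a Kleborate run (hypothetical helper)."""
    assemblies = list(assemblies)
    if not assemblies:
        raise ValueError("at least one assembly file is required")
    cmd = ["kleborate"]
    if run_all:
        # --all already implies --resistance, so only one of the two is added
        cmd.append("--all")
    elif resistance:
        cmd.append("--resistance")
    cmd += ["-o", outfile, "-a"] + assemblies
    return cmd


def run_kleborate(assemblies, **options):
    """Run Kleborate and raise CalledProcessError if it exits non-zero."""
    return subprocess.run(build_kleborate_command(assemblies, **options), check=True)


# Only builds and prints the command; nothing is executed here.
print(" ".join(build_kleborate_command(["a.fasta", "b.fasta"], run_all=True)))
```

Passing an explicit file list avoids relying on shell glob expansion, which subprocess does not perform when given an argument list.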

Full usage

usage: kleborate -a ASSEMBLIES [ASSEMBLIES ...] [-r] [-s] [--kaptive_k]
                 [--kaptive_o] [-k] [--all] [-o OUTFILE]
                 [--kaptive_k_outfile KAPTIVE_K_OUTFILE]
                 [--kaptive_o_outfile KAPTIVE_O_OUTFILE] [-h] [--version]

Kleborate: a tool for characterising virulence and resistance in Klebsiella

Required arguments:
  -a ASSEMBLIES [ASSEMBLIES ...], --assemblies ASSEMBLIES [ASSEMBLIES ...]
                        FASTA file(s) for assemblies, can be gzipped (.gz)

Screening options:
  -r, --resistance      Turn on resistance genes screening (default: no
                        resistance gene screening)
  -s, --species         Turn on Klebsiella species identification (requires
                        Mash, default: no species identification)
  --kaptive_k           Turn on Kaptive screening of K loci (default: do not
                        run Kaptive for K loci)
  --kaptive_o           Turn on Kaptive screening of O loci (default: do not
                        run Kaptive for O loci)
  -k, --kaptive         Equivalent to --kaptive_k --kaptive_o
  --all                 Equivalent to --resistance --species --kaptive

Output options:
  -o OUTFILE, --outfile OUTFILE
                        File for detailed output (default:
                        Kleborate_results.txt)
  --kaptive_k_outfile KAPTIVE_K_OUTFILE
                        File for full Kaptive K locus output (default: do not
                        save Kaptive K locus results to separate file)
  --kaptive_o_outfile KAPTIVE_O_OUTFILE
                        File for full Kaptive O locus output (default: do not
                        save Kaptive O locus results to separate file)

Help:
  -h, --help            Show this help message and exit
  --version             Show program's version number and exit

Screening details

MLST

Multilocus sequence typing (MLST) of Klebsiella follows the schemes described at the Klebsiella pneumoniae BIGSdb hosted at the Pasteur Institute. The alleles and schemes are stored in the data directory of this repository.

Some notes on Kleborate's MLST calls:

  • Kleborate makes an effort to report the closest matching ST / clonal group if a precise match is not found.
  • Imprecise allele matches are indicated with a *.
  • Imprecise ST calls are indicated with -nLV, where n indicates the number of loci that disagree with the ST reported. So 258-1LV indicates a single-locus variant (SLV) of ST258, i.e. 6/7 loci match ST258.
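
The notation described in the notes above is easy to unpack in downstream scripts. This is a minimal sketch, not Kleborate's own code. The function names are hypothetical, and it assumes ST calls look like 258-1LV or ST23 and that imprecise alleles carry a trailing *.

```python
import re

# Optional "ST" prefix, the ST number, then an optional "-nLV" suffix.
_ST_PATTERN = re.compile(r"(?:ST)?(\d+)(?:-(\d+)LV)?")


def parse_st_call(call):
    """Split an ST call into (st_number, number_of_mismatching_loci)."""
    match = _ST_PATTERN.fullmatch(call.strip())
    if match is None:
        raise ValueError("unrecognised ST call: %r" % call)
    st_number = int(match.group(1))
    mismatches = int(match.group(2)) if match.group(2) else 0
    return st_number, mismatches


def parse_allele_call(call):
    """Split an allele call into (allele, is_imprecise) based on a trailing '*'."""
    call = call.strip()
    return call.rstrip("*"), call.endswith("*")


st, mismatches = parse_st_call("258-1LV")
# For the 7-locus chromosomal scheme, matching loci = 7 - mismatches.
print(st, mismatches, 7 - mismatches)
```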

Virulence loci

Kleborate examines four key virulence loci in Klebsiella: the siderophores yersiniabactin (ybt), aerobactin (iuc) and salmochelin (iro), and the genotoxin colibactin (clb).

  • For each of these loci, Kleborate will call a sequence type using the same logic as the MLST described above.
  • If the locus is not detected, Kleborate reports the ST as 0 and the lineage as -.
  • Kleborate will also report the lineage associated with the virulence sequence types, as outlined below and detailed in the corresponding papers (for yersiniabactin, we also report the predicted ICEKp structure based on the ybt lineage assignment).
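
As an illustration of how these calls appear in the output columns (for example ybt 9; ICEKp3 with YbST 15, or - with ST 0 when the locus is absent), here is a small parsing sketch. The function name and the returned dictionary layout are hypothetical; the field formats are taken from the example output in this README.

```python
def parse_locus_call(lineage_field, st_field):
    """Interpret a virulence locus lineage column and its sequence type column.

    lineage_field: e.g. "ybt 9; ICEKp3", "iuc 1" or "-"
    st_field:      e.g. "15", "18-1LV" or "0"
    """
    lineage_field = lineage_field.strip()
    st_field = st_field.strip()
    lineage = None
    structure = None
    if lineage_field != "-":
        parts = [part.strip() for part in lineage_field.split(";")]
        lineage = parts[0]
        # Only yersiniabactin calls carry a predicted ICEKp structure.
        structure = parts[1] if len(parts) > 1 else None
    return {
        "detected": st_field != "0",  # an ST of 0 means the locus was not found
        "lineage": lineage,
        "structure": structure,
        "st": st_field,
    }


print(parse_locus_call("ybt 9; ICEKp3", "15"))
print(parse_locus_call("-", "0"))
```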

Yersiniabactin and colibactin (primarily mobilised by ICEKp)

We recently explored the diversity of the Kp integrative conjugative element (ICEKp), which mobilises the yersiniabactin locus ybt, using genomic analysis of a diverse set of 2498 Klebsiella genomes (see this paper). Overall, we found ybt in about a third of all Kp genomes and clb in about 14%. We identified 17 distinct lineages of ybt (see figure) embedded within 14 structural variants of ICEKp that can integrate at any of four tRNA-Asn sites in the chromosome. Three of the 17 ybt lineages were associated with three lineages of colibactin, with which they are co-located in the same ICE structure, designated ICEKp10. One ICE structure (ICEKp1) carries the salmochelin synthesis locus iro and the rmpA hypermucoidy gene in addition to ybt (lineage 2). Additionally, we identified a lineage of ybt that is plasmid-encoded, representing a new mechanism for ybt dispersal in Kp populations. Based on this analysis, we developed an MLST-style approach for assigning yersiniabactin sequence types (YbST) and colibactin sequence types (CbST), which is implemented in Kleborate. Annotated reference sequences for each ICEKp variant are included in the data directory of this repository.

ybt tree

Aerobactin and salmochelin (primarily mobilised by virulence plasmids)

We further explored the genetic diversity of the aerobactin (iuc) and salmochelin (iro) loci among a dataset of 2733 Klebsiella genomes (see this preprint). We identified five iro and six iuc lineages (see figure), each of which was associated with a specific location within Kp genomes. The most common lineages were iuc1 and iro1, which are found together on the virulence plasmid KpVP-1 (typified by pK2044 or pLVPK, common to the hypervirulent clones ST23, ST86, etc). The iuc2 and iro2 lineages were associated with the alternative virulence plasmid KpVP-2 (typified by Kp52.145 plasmid II from the K2 ST66 lab strain known as Kp52.145 or B5055). iuc5 and iro5 originate from E. coli and are carried (often together) on E. coli plasmids that can transfer to Kp. The lineages iuc2A, iuc3 and iro4 were associated with other novel plasmids that have not previously been described in Kp. In addition, we found that the salmochelin locus present in ICEKp1 constitutes its own lineage, iro3, and that the aerobactin locus present in the chromosome of ST67 Kp subsp. rhinoscleromatis strains constitutes its own lineage, iuc4. Based on this analysis, we developed an MLST-style approach for assigning aerobactin sequence types (AbST) and salmochelin sequence types (SmST), which is implemented in Kleborate.

iuc and iro trees

Please note that the aerobactin iuc and salmochelin iro lineage names have been updated between Kleborate version 0.2.0 and 0.3.0 to match the nomenclature used in the preprint. The AbST and SmST allele numbers are unchanged. Lineage name re-assignments are:

v0.2.0    v0.3.0    location (see preprint for details)
iuc 2     iuc 1     KpVP-1 (e.g. pLVPK)
iuc 3B    iuc 2     KpVP-2
iuc 3A    iuc 2A    other plasmids
iuc 4     iuc 3     other plasmids
iuc 5     iuc 4     rhinoscleromatis chromosome
iuc 1     iuc 5     E. coli variant
iro 3     iro 1     KpVP-1 (e.g. pLVPK)
iro 4     iro 2     KpVP-2
iro 5     iro 3     ICEKp1
iro 2     iro 4     Enterobacter variant
iro 1     iro 5     E. coli variant
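
If you need to reconcile results produced with v0.2.0 against the current nomenclature, the re-assignments listed above can be applied with a simple lookup. This is a sketch only; the dictionary and function names are hypothetical and the mapping is copied from the list above.

```python
# Old (v0.2.0) lineage name -> new (v0.3.0) lineage name.
LINEAGE_RENAMES = {
    "iuc 2": "iuc 1",
    "iuc 3B": "iuc 2",
    "iuc 3A": "iuc 2A",
    "iuc 4": "iuc 3",
    "iuc 5": "iuc 4",
    "iuc 1": "iuc 5",
    "iro 3": "iro 1",
    "iro 4": "iro 2",
    "iro 5": "iro 3",
    "iro 2": "iro 4",
    "iro 1": "iro 5",
}


def to_v030_lineage(old_name):
    """Translate a v0.2.0 lineage name; unknown values (e.g. '-') pass through."""
    return LINEAGE_RENAMES.get(old_name.strip(), old_name)


print(to_v030_lineage("iuc 3B"))
```

The lookup must be applied exactly once per value: several old names are also valid new names (iuc 2, for instance), so converting an already-converted table would silently scramble the lineages.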

Hypermucoidy genes

Kleborate screens for alleles of the rmpA and rmpA2 genes which result in a hypermucoid phenotype by upregulating capsule production.

  • The two genes share ~83% nucleotide identity so are easily distinguished, and are reported in separate columns.
  • Alleles for each gene are sourced from the BIGSdb. For rmpA, we have also mapped these alleles to the various known locations for rmpA in Klebsiella (i.e. major virulence plasmids KpVP-1 and KpVP-2; other virulence plasmids simply designated as VP; ICEKp1 and the chromosome in rhinoscleromatis).
  • Unique (non-overlapping) nucleotide BLAST hits with >95% identity and >50% coverage are reported. Note multiple hits to the same gene are reported if found (e.g. the NTUH-K2044 genome carries rmpA in the virulence plasmid and also in ICEKp1, which is reported in the rmpA column as rmpA_11(ICEKp1),rmpA_2(KpVP-1)).
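
To make the reporting rules above concrete, here is a hedged sketch of the threshold test and of splitting a multi-hit rmpA column into its parts. It is not Kleborate's implementation; the function names are hypothetical, and the column format is taken from the example output in this README.

```python
import re

# Allele name, optionally followed by a location in parentheses.
_HIT_PATTERN = re.compile(r"^(\S+?)\s*(?:\((.*)\))?$")


def passes_rmp_thresholds(identity_pct, coverage_pct):
    """Apply the >95% identity and >50% coverage cut-offs stated above."""
    return identity_pct > 95.0 and coverage_pct > 50.0


def split_rmp_column(value):
    """Turn 'rmpA_11 (ICEKp1),rmpA_2 (KpVP-1)' into [(allele, location), ...]."""
    value = value.strip()
    if value in ("", "-"):
        return []
    hits = []
    for item in value.split(","):
        match = _HIT_PATTERN.match(item.strip())
        hits.append((match.group(1), match.group(2)))
    return hits


print(split_rmp_column("rmpA_11 (ICEKp1),rmpA_2 (KpVP-1)"))
```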

Resistance gene detection

By using the --resistance option, Kleborate will screen for resistance genes against the ARG-Annot database of acquired resistance genes (SRST2 version), which includes allelic variants. It attempts to report the best matching variant for each locus in the genome:

  • Imprecise allele matches are indicated with *.
  • If the length of match is less than the length of the reported allele (i.e. a partial match), this is indicated with ?.
  • Note that the narrow spectrum beta-lactamases AmpH and SHV (or its LEN/OKP equivalents in related species) are core genes in K. pneumoniae and so should be detected in most genomes.
    • These genes include: SHV (K. pneumoniae), LEN (K. variicola), OKP (K. quasipneumoniae) and AmpH (all of the above species)
    • See this paper for more information.
  • Note that oqxAB are also core genes in K. pneumoniae, but have been removed from this version of the ARG-Annot DB as they don't actually confer resistance to fluoroquinolones
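
The * and ? suffixes described above can be separated from the gene name when post-processing results. The following is an illustrative sketch with hypothetical function names; it assumes that hits within one column are separated by semicolons and that an empty column is written as -, as in the example output in this README.

```python
def parse_resistance_hit(hit):
    """Split a hit such as 'Aac3-IId*?' into its name and match flags."""
    hit = hit.strip()
    gene = hit.rstrip("*?")
    suffix = hit[len(gene):]
    return {
        "gene": gene,
        "imprecise": "*" in suffix,  # allele match is not exact
        "partial": "?" in suffix,    # match is shorter than the reported allele
    }


def parse_resistance_column(value):
    """Parse one drug-class column into a list of hit dictionaries."""
    value = value.strip()
    if value in ("", "-"):
        return []
    return [parse_resistance_hit(hit) for hit in value.split(";")]


for hit in parse_resistance_column("StrB;StrA*;AadA2*;RmtB;Aac3-IId*?"):
    print(hit)
```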

Using the --resistance option also turns on screening for resistance-conferring mutations:

  • Fluoroquinolone resistance SNPs: GyrA 83 & 87 and ParC 80 & 84.
  • Colistin resistance due to truncation or loss of MgrB or PmrB (less than 90% gene coverage counts as a truncation/loss).
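
A short sketch of the two mutation rules above: the 90% coverage cut-off for calling a truncation, and reading SNP calls such as GyrA-83I from the Flq column. The function names are hypothetical, and the lengths in the usage line are made-up illustration values rather than real gene lengths.

```python
import re


def is_truncated(aligned_length, gene_length, min_coverage_pct=90.0):
    """Return True when gene coverage falls below the 90% cut-off stated above."""
    if gene_length <= 0:
        raise ValueError("gene_length must be positive")
    coverage_pct = 100.0 * aligned_length / gene_length
    return coverage_pct < min_coverage_pct


def parse_snp_call(call):
    """Split a call such as 'GyrA-83I' into (gene, position, residue)."""
    match = re.fullmatch(r"([A-Za-z]+)-(\d+)([A-Z])", call.strip())
    if match is None:
        raise ValueError("unrecognised SNP call: %r" % call)
    return match.group(1), int(match.group(2)), match.group(3)


print(is_truncated(aligned_length=80, gene_length=100))
print(parse_snp_call("GyrA-83I"))
```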

All resistance results (both for the gene screen and mutation screen) are grouped by drug class (according to the ARG-Annot DB), with beta-lactamases broken down into Lahey classes, as follows:

  • AGly (aminoglycosides)
  • Bla (beta-lactamases)
  • Bla_broad (broad spectrum beta-lactamases)
  • Bla_broad_inhR (broad spectrum beta-lactamases with resistance to beta-lactamase inhibitors)
  • Bla_Carb (carbapenemase)
  • Bla_ESBL (extended spectrum beta-lactamases)
  • Bla_ESBL_inhR (extended spectrum beta-lactamases with resistance to beta-lactamase inhibitors)
  • Fcyn (fosfomycin)
  • Flq (fluoroquinolones)
  • Gly (glycopeptides)
  • MLS (macrolides)
  • Phe (phenicols)
  • Rif (rifampin)
  • Sul (sulfonamides)
  • Tet (tetracyclines)
  • Tmt (trimethoprim)

Scores and counts

Kleborate outputs a simple categorical virulence score, and if resistance screening is enabled, an antimicrobial resistance score as well. These scores provide a rough categorisation of the strains to facilitate monitoring resistance-virulence convergence:

  • The virulence score ranges from 0 to 5:
    • 0 = no virulence loci
    • 1 = yersiniabactin only
    • 2 = yersiniabactin and colibactin, or colibactin only
    • 3 = aerobactin and/or salmochelin only (without yersiniabactin or colibactin)
    • 4 = aerobactin and/or salmochelin with yersiniabactin (without colibactin)
    • 5 = yersiniabactin, colibactin and aerobactin and/or salmochelin
  • The resistance score ranges from 0 to 3:
    • 0 = no ESBL, no carbapenemase (regardless of colistin resistance)
    • 1 = ESBL, no carbapenemase (regardless of colistin resistance)
    • 2 = Carbapenemase without colistin resistance (regardless of ESBL)
    • 3 = Carbapenemase with colistin resistance (regardless of ESBL)
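
The two scoring schemes above can be written as small decision functions. This is a sketch derived from the list above, not a copy of Kleborate's code, and the function names are hypothetical. One case is not covered by the list: colibactin together with aerobactin and/or salmochelin but without yersiniabactin. The sketch scores it as 5, which is an assumption.

```python
def virulence_score(ybt, clb, iuc, iro):
    """Virulence score (0-5) from presence/absence of the four loci."""
    if iuc or iro:
        if clb:
            # Assumption: scored 5 even if ybt is absent (not stated in the README).
            return 5
        if ybt:
            return 4
        return 3
    if clb:
        return 2
    if ybt:
        return 1
    return 0


def resistance_score(esbl, carbapenemase, colistin_resistance):
    """Resistance score (0-3); colistin only matters alongside a carbapenemase."""
    if carbapenemase:
        return 3 if colistin_resistance else 2
    return 1 if esbl else 0


# NTUH-K2044 in the example output: ybt, iuc and iro present, clb absent.
print(virulence_score(ybt=True, clb=False, iuc=True, iro=True))
```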

When resistance screening is enabled, Kleborate also quantifies how many resistance genes are present and how many resistance classes have at least one gene. Since a resistance class can have multiple genes (as is often the case for the intrinsic genes in the Bla class), the gene count is typically higher than the class count.

Klebsiella species

By using the --species option, Kleborate will attempt to identify the species of Klebsiella. It does this by comparing the assembly using Mash to a curated set of Klebsiella assemblies from NCBI and reporting the species of the closest match. Kleborate considers a Mash distance of ≤ 0.01 to be a strong species match. A distance of > 0.01 and ≤ 0.03 is a weak match and might indicate that your sample is a novel lineage or a hybrid between multiple Klebsiella species.
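
The distance thresholds above translate into a simple classification. This is a sketch with a hypothetical function name. The README does not say what is reported for distances above 0.03, so the "none" label used here is an assumption.

```python
def species_match_strength(mash_distance):
    """Classify a Mash distance to the closest reference assembly."""
    if mash_distance < 0:
        raise ValueError("Mash distance cannot be negative")
    if mash_distance <= 0.01:
        return "strong"
    if mash_distance <= 0.03:
        return "weak"
    return "none"  # assumption: label for matches beyond the weak threshold


for distance in (0.002, 0.02, 0.08):
    print(distance, species_match_strength(distance))
```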

Here is an annotated tree of the reference assemblies, made by mashtree:

Klebsiella species tree

Kleborate is designed for the well-studied group of species at the top right of the tree which includes the 'big three': pneumoniae, quasipneumoniae (two subspecies) and variicola. K. quasivariicola is more recently characterised and described here: Long 2017. The Kp5 group does not yet have a species name and was described in this paper: Blin 2017. More distant Klebsiella species (oxytoca, michiganensis, grimontii and aerogenes) are also included, but the virulence profiles of these are less well characterised and deserve further attention.

Kleborate will also call other species in Enterobacteriaceae, as different species sometimes end up in Klebsiella collections. These names are again assigned based on the clades in a mashtree, but were not as carefully curated as the Klebsiella species (so take them with a grain of salt).

Serotype prediction

Basic capsule prediction with wzi allele typing

By default, Kleborate will report the closest match amongst the wzi alleles in the BIGSdb. This is a marker of capsule locus (KL) type, which is highly predictive of capsule (K) serotype. Although there is not a 1-1 relationship between wzi allele and KL/K type, there is a strong correlation (see Wyres et al, MGen 2016). The wzi allele can provide a handy way of spotting the virulence-associated types (wzi1=K1, wzi2=K2, wzi5=K5), or of spotting capsule switching within clones. For example, you can tell which ST258 lineage you have from the wzi type (wzi154: the main lineage II; wzi29: recombinant lineage I; others: probably other recombinant lineages).
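
As a quick illustration of using the wzi column as a flag, the following sketch looks up the associations mentioned above. The names are hypothetical and the tables contain only the examples given in this section (wzi1 with K1 matches NTUH-K2044 in the example output); they are not a complete wzi-to-K mapping.

```python
# Virulence-associated capsule types named in this README.
VIRULENCE_ASSOCIATED_WZI = {"wzi1": "K1", "wzi2": "K2", "wzi5": "K5"}

# ST258 lineages that can be told apart by wzi allele, per this README.
ST258_LINEAGE_BY_WZI = {"wzi154": "lineage II (main)", "wzi29": "lineage I (recombinant)"}


def describe_wzi(wzi, st=None):
    """Return a short note for a wzi call, or None if nothing is known here."""
    wzi = wzi.strip().rstrip("*")  # drop the imprecise-match marker if present
    if st == "ST258":
        return ST258_LINEAGE_BY_WZI.get(wzi, "probably another recombinant lineage")
    if wzi in VIRULENCE_ASSOCIATED_WZI:
        return "virulence-associated type " + VIRULENCE_ASSOCIATED_WZI[wzi]
    return None


print(describe_wzi("wzi1", st="ST23"))
print(describe_wzi("wzi154", st="ST258"))
```

Because the relationship between wzi allele and K type is not 1-1, a lookup like this is only a screening hint; Kaptive gives the more reliable call.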

Capsule (K) and O antigen (LPS) serotype prediction using Kaptive

You can optionally turn on capsule typing using the dedicated capsule typing tool Kaptive:

  • --kaptive_k turns on Kaptive screening of the K locus
  • --kaptive_o turns on Kaptive screening of the O locus
  • --kaptive turns on both (is equivalent to --kaptive_k --kaptive_o)

This will significantly increase the runtime of Kleborate, but provide much more detailed information about the K and/or O loci and their genes.

Example output

Test data

Run these commands to download some well-known Klebsiella genomes and run Kleborate with all optional screens enabled:

wget -O NTUH-K2044.fasta.gz ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/009/885/GCA_000009885.1_ASM988v1/GCA_000009885.1_ASM988v1_genomic.fna.gz
wget -O SGH10.fasta.gz ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/813/595/GCA_002813595.1_ASM281359v1/GCA_002813595.1_ASM281359v1_genomic.fna.gz
wget -O Klebs_HS11286.fasta.gz ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/240/185/GCA_000240185.2_ASM24018v2/GCA_000240185.2_ASM24018v2_genomic.fna.gz
wget -O MGH78578.fasta.gz ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/016/305/GCA_000016305.1_ASM1630v1/GCA_000016305.1_ASM1630v1_genomic.fna.gz

kleborate --all -o results.txt -a *.fasta.gz

Concise results (stdout)

These are the concise Kleborate results that it prints to the terminal:

strain species ST virulence_score resistance_score Yersiniabactin YbST Colibactin CbST Aerobactin AbST Salmochelin SmST rmpA rmpA2 wzi K_locus K_locus_confidence O_locus O_locus_confidence AGly Col Fcyn Flq Gly MLS Ntmdz Phe Rif Sul Tet Tmt Bla Bla_Carb Bla_ESBL Bla_ESBL_inhR Bla_broad Bla_broad_inhR
Klebs_HS11286 Klebsiella pneumoniae ST11 1 2 ybt 9; ICEKp3 15 - 0 - 0 - 0 - - wzi74 KL103 Very high O2v1 Very high StrB;StrA*;AadA2*;RmtB;Aac3-IId*? - - GyrA-83I;ParC-80I - - - - - SulII TetG DfrA12? AmpH* KPC-2 CTX-M-14;CTX-M-14 - SHV-11 TEM-30*;TEM-30*;TEM-30*
MGH78578 Klebsiella pneumoniae ST38 0 1 - 0 - 0 - 0 - 0 - - wzi50 KL52 Perfect OL101 High AadA1-pm*?;Aac6-Ib;StrB;Aph3''Ia;StrA;AadB - - GyrA-83Y - - - CmlA5;CatA1* - SulI;SulII TetD - AmpH*;SHV-187*;OXA-9* - SHV-12 - - TEM-54*;TEM-30*
NTUH-K2044 Klebsiella pneumoniae ST23 4 0 ybt 2; ICEKp1 326 - 0 iuc 1 1 iro 3 18-1LV rmpA_11 (ICEKp1),rmpA_2 (KpVP-1) rmpA2_3 wzi1 KL1 Perfect O1v2 Very high - - - - - - - - - - - - AmpH;SHV-190* - - - - -
SGH10 Klebsiella pneumoniae ST23 5 0 ybt 1; ICEKp10 53 clb 2 29 iuc 1 1 iro 1 2 rmpA_2 (KpVP-1) rmpA2_6* wzi1 KL1 Very high O1v2 Very high - - - - - - - - - - - - AmpH;SHV-190* - - - - -

Full results (file)

Here are the full Kleborate results, written to results.txt:

strain species species_match contig_count N50 largest_contig ST virulence_score resistance_score num_resistance_classes num_resistance_genes Yersiniabactin YbST Colibactin CbST Aerobactin AbST Salmochelin SmST rmpA rmpA2 wzi K_locus K_locus_problems K_locus_confidence K_locus_identity K_locus_missing_genes O_locus O_locus_problems O_locus_confidence O_locus_identity O_locus_missing_genes Chr_ST gapA infB mdh pgi phoE rpoB tonB ybtS ybtX ybtQ ybtP ybtA irp2 irp1 ybtU ybtT ybtE fyuA clbA clbB clbC clbD clbE clbF clbG clbH clbI clbL clbM clbN clbO clbP clbQ AGly Col Fcyn Flq Gly MLS Ntmdz Phe Rif Sul Tet Tmt Bla Bla_Carb Bla_ESBL Bla_ESBL_inhR Bla_broad Bla_broad_inhR
Klebs_HS11286 Klebsiella pneumoniae strong 7 5333942 5333942 ST11 1 2 9 17 ybt 9; ICEKp3 15 - 0 - 0 - 0 - - wzi74 KL103 * Very high 96.69% O2v1 none Very high 97.72% ST11 3 3 1 1 1 1 4 14 11 14 5 9 22 19 10 5 11 11 - - - - - - - - - - - - - - - StrB;StrA*;AadA2*;RmtB;Aac3-IId*? - - GyrA-83I;ParC-80I - - - - - SulII TetG DfrA12? AmpH* KPC-2 CTX-M-14;CTX-M-14 - SHV-11 TEM-30*;TEM-30*;TEM-30*
MGH78578 Klebsiella pneumoniae strong 6 5315120 5315120 ST38 0 1 7 15 - 0 - 0 - 0 - 0 - - wzi50 KL52 none Perfect 100.00% OL101 * High 94.91% ST38 2 1 2 1 2 2 2 - - - - - - - - - - - - - - - - - - - - - - - - - - AadA1-pm*?;Aac6-Ib;StrB;Aph3''Ia;StrA;AadB - - GyrA-83Y - - - CmlA5;CatA1* - SulI;SulII TetD - AmpH*;SHV-187*;OXA-9* - SHV-12 - - TEM-54*;TEM-30*
NTUH-K2044 Klebsiella pneumoniae strong 2 5248520 5248520 ST23 4 0 0 0 ybt 2; ICEKp1 326 - 0 iuc 1 1 iro 3 18-1LV rmpA_11 (ICEKp1),rmpA_2 (KpVP-1) rmpA2_3 wzi1 KL1 none Perfect 100.00% O1v2 none Very high 99.13% ST23 2 1 1 1 9 4 12 9 7 9 6 5 1 1 6 7 7 6 - - - - - - - - - - - - - - - - - - - - - - - - - - - AmpH;SHV-190* - - - - -
SGH10 Klebsiella pneumoniae strong 2 5485114 5485114 ST23 5 0 0 0 ybt 1; ICEKp10 53 clb 2 29 iuc 1 1 iro 1 2 rmpA_2 (KpVP-1) rmpA2_6* wzi1 KL1 none Very high 100.00% O1v2 none Very high 99.11% ST23 2 1 1 1 9 4 12 2 2 2 2 2 6 124 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 - - - - - - - - - - - - AmpH;SHV-190* - - - - -
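
Since the purpose of the scores is to spot resistance-virulence convergence, a results file like the one above can be filtered with a few lines of Python. This is a sketch under stated assumptions: the file is assumed to be tab-delimited, the function name is hypothetical, and the default thresholds (virulence score of at least 3, resistance score of at least 1) are illustrative choices rather than a recommendation from this README. The column names are taken from the header shown above.

```python
import csv
import io


def find_convergent_strains(handle, min_virulence=3, min_resistance=1):
    """Return strain names whose scores meet both thresholds."""
    reader = csv.DictReader(handle, delimiter="\t")
    strains = []
    for row in reader:
        if (int(row["virulence_score"]) >= min_virulence
                and int(row["resistance_score"]) >= min_resistance):
            strains.append(row["strain"])
    return strains


# Abbreviated stand-in for results.txt, using the scores from the example above.
SAMPLE = (
    "strain\tvirulence_score\tresistance_score\n"
    "Klebs_HS11286\t1\t2\n"
    "MGH78578\t0\t1\n"
    "NTUH-K2044\t4\t0\n"
    "SGH10\t5\t0\n"
)
print(find_convergent_strains(io.StringIO(SAMPLE)))
```

With a real file, pass an open handle instead, e.g. open("results.txt", newline="").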

Typing from Illumina reads

MLST assignment can also be achieved directly from reads using SRST2:

  • Download the YbST, CbST, AbST, SmST allele sequences and profile tables from the data directory in this repository.
  • Install SRST2 if you don't already have it (git clone https://github.com/katholt/srst2).
  • Run SRST2, setting --mlst_db and --mlst_definitions to point to the allele sequences and profile table of the scheme you want to type (e.g. YbST or CbST).

Note that currently you can only run SRST2 with one MLST scheme at a time, so in order to type MLST, YbST and CbST you will need to run three separate commands:

srst2 --input_pe reads_1.fastq.gz reads_2.fastq.gz --output YbST --log --mlst_db ybt_alleles.fasta --mlst_definitions YbST_profiles.txt
srst2 --input_pe reads_1.fastq.gz reads_2.fastq.gz --output CbST --log --mlst_db clb_alleles.fasta --mlst_definitions CbST_profiles.txt
srst2 --input_pe reads_1.fastq.gz reads_2.fastq.gz --output Klebs --log --mlst_db Klebsiella_pneumoniae.fasta --mlst_definitions kpneumoniae.txt
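
Because each scheme needs its own SRST2 run, it can be convenient to generate the commands in a loop. The sketch below only builds the argument lists, using the same flags as the commands above. The function name and the SCHEMES table are hypothetical, and the file names are those used in the commands above for YbST and CbST.

```python
def build_srst2_command(reads_1, reads_2, output_prefix, alleles_fasta, profiles_txt):
    """Return the argument list for one SRST2 MLST run (hypothetical helper)."""
    return [
        "srst2",
        "--input_pe", reads_1, reads_2,
        "--output", output_prefix,
        "--log",
        "--mlst_db", alleles_fasta,
        "--mlst_definitions", profiles_txt,
    ]


# Output prefix -> (allele sequences, profile table); extend with other schemes as needed.
SCHEMES = {
    "YbST": ("ybt_alleles.fasta", "YbST_profiles.txt"),
    "CbST": ("clb_alleles.fasta", "CbST_profiles.txt"),
}

for prefix, (alleles, profiles) in SCHEMES.items():
    command = build_srst2_command("reads_1.fastq.gz", "reads_2.fastq.gz",
                                  prefix, alleles, profiles)
    print(" ".join(command))
```

Each list can be handed to subprocess.run(command, check=True) once SRST2 is installed.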

Contact us

Kleborate is under active development with many other Klebs genomic analysis tools and projects in progress.

Please get in touch via the GitHub issues tracker if you have any issues, questions or ideas.

For more on our lab, including other software, see http://holtlab.net

License

GNU General Public License, version 3



Stop! Kleborate and listen
ICEKp is back with my brand-new invention
If there was a problem, Klebs'll solve it
Check out the hook while Klebs evolves it
