Home
NRPS-motif-Finder is a tool for the standardization of non-ribosomal peptide synthetases (NRPSs). It partitions an input NRPS protein sequence by locating the conserved motifs listed below, and outputs a motif-and-intermotif architecture that feeds into subsequent analyses such as C domain classification and NRPS re-engineering.
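The idea of a motif-and-intermotif architecture can be illustrated with a minimal sketch. It assumes the motif positions are already known. The function name `partition_by_motifs`, the `(name, start, end)` tuple layout, and the segment labels are hypothetical and are not taken from the NRPS-motif-Finder code.

```python
from typing import List, Tuple

# Hypothetical layout: (motif name, start, end), 0-based, end exclusive.
Motif = Tuple[str, int, int]


def partition_by_motifs(sequence: str, motifs: List[Motif]) -> List[Tuple[str, str]]:
    """Split a sequence into alternating intermotif and motif segments."""
    segments = []
    cursor = 0
    previous = "start"
    for name, start, end in sorted(motifs, key=lambda m: m[1]):
        # Reject overlapping, empty, or out-of-range motifs.
        if start < cursor or start >= end or end > len(sequence):
            raise ValueError(f"invalid coordinates for motif {name}")
        # Intermotif region between the previous motif and this one.
        segments.append((f"{previous}-{name}", sequence[cursor:start]))
        # The motif itself.
        segments.append((name, sequence[start:end]))
        cursor = end
        previous = name
    # Trailing region after the last motif.
    segments.append((f"{previous}-end", sequence[cursor:]))
    return segments
```

Concatenating the returned segments in order reproduces the input sequence, so no residue is lost in the partition.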
Note: NRPS-motif-Finder is so far the only tool that can classify all fungal C domain subtypes!
The adenylation (A) domain has 12 motifs: Aalpha, A1-A5, G-motif, A6-A10. Among them, Aalpha and G-motif are two new motifs proposed in our paper.
The condensation (C) domain has 10 motifs: C1-C10.
The thiolation (T) domain has 2 motifs: Talpha, T1. Talpha is a new motif proposed in our paper.
The thioesterase (TE) domain has 1 motif: TE1.
The epimerization (E) domain has 7 motifs: E1-E7.
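For scripting purposes, the motif inventory above can be written down as a small lookup table. This is only a sketch: the name `DOMAIN_MOTIFS` is hypothetical, and the motifs are stored in the order they are listed above.

```python
# Hypothetical lookup table of the motifs per domain, in the order listed above.
DOMAIN_MOTIFS = {
    "A": ["Aalpha"]
    + [f"A{i}" for i in range(1, 6)]      # A1-A5
    + ["G-motif"]
    + [f"A{i}" for i in range(6, 11)],    # A6-A10
    "C": [f"C{i}" for i in range(1, 11)],  # C1-C10
    "T": ["Talpha", "T1"],
    "TE": ["TE1"],
    "E": [f"E{i}" for i in range(1, 8)],   # E1-E7
}

# Number of motifs per domain, e.g. to sanity-check a parsed result.
MOTIF_COUNTS = {domain: len(names) for domain, names in DOMAIN_MOTIFS.items()}
```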
One of the most important features of NRPS-motif-Finder is its support for full subtype classification of the C domain.
Maximum-likelihood phylogenetic tree of the condensation domain superfamily.
Subtype classification and sequences are described in the main text and the Methods. Subtypes are indicated by colors; subtypes exclusive to fungi are underlined, and subtypes found predominantly in bacteria are marked with asterisks. The tree is rooted, with papA and WES as outgroups (black shading). The L-clade and D-clade are indicated by blue and red shading, respectively.
C domain subtypes | Species distribution | Function | Comment | Reference sequence source |
---|---|---|---|---|
LCL | Both bacteria and fungi | LCL-type C domains catalyze peptide bond formation between two L-amino acids. | LCL and SgcC5 are hard to distinguish because of their high sequence similarity. | Conserved Protein Domain Family |
DCL | Both bacteria and fungi | The DCL-type C domain catalyzes the condensation between a D-aminoacyl/peptidyl-PCP donor and an L-aminoacyl-PCP acceptor. | | Conserved Protein Domain Family |
Starter | Predominantly bacteria | While standard C domains catalyze peptide bond formation between two amino acids, the Starter C domain may instead acylate an amino acid with a fatty acid in the first module of an NRPS. | | Conserved Protein Domain Family |
Dual | Both bacteria and fungi | Dual-function E/C domains have both epimerization and DCL condensation activity. They first epimerize the substrate amino acid to the D-configuration, then catalyze the condensation between the D-aminoacyl/peptidyl-PCP donor and an L-aminoacyl-PCP acceptor. | | Conserved Protein Domain Family |
CT | Fungi only | Unlike bacterial NRPSs, which typically have specialized terminal thioesterase (TE) domains to cyclize peptide products, many fungal NRPSs employ a terminal condensation-like (CT) domain to produce macrocyclic peptidyl products. | | Conserved Protein Domain Family |
CT-DCL | Fungi only | The CT-DCL domain catalyzes the same reaction as the DCL domain but has high sequence similarity to the CT domain. | This subtype is proposed in our paper. | Conserved Protein Domain Family |
CT-A | Fungi only | The CT-Atypical (CT-A) domain catalyzes the same reaction as the DCL domain but has high sequence similarity to the CT domain. It always follows an ACP (acyl carrier protein) domain rather than a T domain. | This subtype is proposed in our paper. | Conserved Protein Domain Family |
PS | Predominantly bacteria | The PS domain catalyzes the Pictet-Spengler reaction. | | Literature |
bL | Predominantly bacteria | The beta-lactam (bL) C domain mediates an unusual cyclization to form beta-lactam rings. | The bL domain is actually a subtype of the DCL domain. | Conserved Protein Domain Family |
X | Predominantly bacteria | The X domain is a catalytically inactive condensation-like domain shown to recruit oxygenases to the NRPS. | | Conserved Protein Domain Family |
Cyc | Both bacteria and fungi | Cyc (heterocyclization) domains catalyze two separate reactions in the creation of heterocyclized peptide products in NRPSs: amide bond formation, followed by intramolecular cyclodehydration between a Cys, Ser, or Thr side chain and a carbonyl carbon on the peptide backbone to form a thiazoline, oxazoline, or methyloxazoline ring. | | Conserved Protein Domain Family |
I | Predominantly bacteria | The interface (I) domain plays a role in positioning the β-hydroxylase and the NRPS-bound amino acid substrate prior to hydroxylation. | | Literature |
modAA | Predominantly bacteria | The core function of the modAA C domain is to catalyze the dehydration of a beta-hydroxy amino acid (such as Ser or Thr) to form a dehydroamino acid. Derived functions include pyrrolizidine formation, conjugate addition instead of amide formation, pyrimidine formation, L-2-amino-4-methoxy-trans-3-butenoic acid formation, and side-chain conjugate addition. | | Literature |
Cglyc | Predominantly bacteria | The glycopeptide condensation domain functions in peptide bond formation during glycopeptide antibiotic biosynthesis. | | MIBiG v2 |
Hybrid | Both bacteria and fungi | C domains of hybrid polyketide synthase/non-ribosomal peptide synthetases (PKS/NRPSs) catalyze peptide bond formation within (usually) large multi-modular enzymatic complexes. Hybrid PKS/NRPSs create polymers containing both polyketide and amide linkages. | | Conserved Protein Domain Family |
FUM14 | Fungi only | C domain of NRPSs similar to the ester-bond-forming Fusarium verticillioides FUM14 protein. | The module containing a FUM14 domain is always used iteratively, and ester-bond formation is an uncommon function. | Conserved Protein Domain Family |
SgcC5 | Both bacteria and fungi | SgcC5 is an NRPS C domain with ester- and amide-bond-forming activity. | LCL and SgcC5 are hard to distinguish because of their high sequence similarity. | Conserved Protein Domain Family |
LCL-A | Fungi only | C domain with an atypical active-site motif. Members of this subfamily typically have a non-canonical conserved SHXXXDX(14)Y motif, which replaces the HHXXXD motif typically found in the C domain. | This subtype is named in our paper. | Conserved Protein Domain Family |
E | Both bacteria and fungi | Epimerization (E) domains of NRPSs flip the chirality of the terminal amino acid of the peptide being assembled by the NRPS. | | Conserved Protein Domain Family |
Note: In NRPS-motif-Finder results, the E domain is not treated as a C domain subtype. The E domain has 7 motifs, while the C domain has 10 motifs.
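The two active-site patterns mentioned in the LCL-A row, HHXXXD and SHXXXDX(14)Y, can be searched for with regular expressions. The sketch below assumes that X stands for any residue and that X(14) means exactly 14 arbitrary residues. It is only a quick pattern check with a hypothetical function name, not the subtype classification performed by NRPS-motif-Finder.

```python
import re

# Assumption: X = any residue, X(14) = exactly 14 arbitrary residues.
ACTIVE_SITE_PATTERNS = {
    "HHXXXD": re.compile(r"HH.{3}D"),
    "SHXXXDX(14)Y": re.compile(r"SH.{3}D.{14}Y"),
}


def find_active_site_motifs(sequence: str) -> dict:
    """Return the 0-based start positions of each pattern in the sequence."""
    seq = sequence.upper()
    return {
        label: [match.start() for match in pattern.finditer(seq)]
        for label, pattern in ACTIVE_SITE_PATTERNS.items()
    }
```

Note that `finditer` reports non-overlapping matches only, which is usually sufficient for locating a single active site.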
HMM files for C domain subtype classification can be found here.
The raw sequences used for HMM construction can be found here.
There are two versions of NRPS-motif-Finder, implemented in MATLAB and Python.
We recommend the MATLAB version because it is updated frequently to fix bugs. The Python version is the stable version used in our online platform.
MATLAB code of NRPS-motif-Finder.
Python code of NRPS-motif-Finder.