Ruolin He edited this page Mar 10, 2024 · 5 revisions

What is NRPS-motif-Finder

NRPS-motif-Finder is a tool for standardizing the architecture of non-ribosomal peptide synthetases (NRPSs). It partitions an input NRPS protein sequence by locating conserved motifs and outputs a motif-and-intermotif architecture that feeds into subsequent analyses such as C domain classification and NRPS re-engineering.
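The partitioning idea described above can be sketched in a few lines of Python. This is only an illustration, not the code of NRPS-motif-Finder: the function name `partition_by_motifs`, the `(name, start, end)` hit format with 0-based, end-exclusive coordinates, and the toy sequence are all assumptions made for the example. How motif positions are actually located is outside the scope of this sketch.

```python
from typing import List, Tuple

# Hypothetical hit format: (motif_name, start, end), 0-based, end-exclusive.
MotifHit = Tuple[str, int, int]


def partition_by_motifs(sequence: str, motif_hits: List[MotifHit]) -> List[Tuple[str, str]]:
    """Split a sequence into alternating inter-motif and motif segments."""
    segments = []
    cursor = 0            # first residue not yet assigned to a segment
    previous = "N-term"   # label of the boundary to the left of the cursor
    for name, start, end in sorted(motif_hits, key=lambda hit: hit[1]):
        if start < cursor:
            raise ValueError(f"motif {name} overlaps the previous motif")
        if start > cursor:
            # Residues between two motifs form an inter-motif segment.
            segments.append((f"{previous}|{name}", sequence[cursor:start]))
        segments.append((name, sequence[start:end]))
        cursor = end
        previous = name
    if cursor < len(sequence):
        segments.append((f"{previous}|C-term", sequence[cursor:]))
    return segments


# Toy input, invented for illustration only.
toy_sequence = "MKTAAAGGGCCCDDDEE"
toy_hits = [("C1", 3, 6), ("C2", 9, 12)]
toy_segments = partition_by_motifs(toy_sequence, toy_hits)
for label, residues in toy_segments:
    print(label, residues)
```

Concatenating the segments in order reproduces the input sequence, which is a convenient sanity check for any partitioning of this kind.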


Note: So far, NRPS-motif-Finder is the only tool that can classify all fungal C domain subtypes!


Supported domains and motifs

The adenylation (A) domain has 12 motifs: Aalpha, A1-A5, G-motif, A6-A10. Among them, Aalpha and G-motif are two new motifs proposed in our paper.

The condensation (C) domain has 10 motifs: C1-C10.

The thiolation (T) domain has 2 motifs: Talpha, T1. Talpha is a new motif proposed in our paper.

The thioesterase (TE) domain has 1 motif: TE1.

The epimerization (E) domain has 7 motifs: E1-E7.
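For scripts that consume the motif names, the list above can be written down as a small lookup table. The sketch below is an assumption about how one might encode it. The names `SUPPORTED_MOTIFS` and `expand` are hypothetical, the tool's own output labels may be spelled differently, and the order simply follows the order in which the motifs are listed on this page.

```python
def expand(prefix, first, last):
    """Return numbered motif names, e.g. expand("C", 1, 3) gives C1, C2, C3."""
    return [f"{prefix}{i}" for i in range(first, last + 1)]


# Motifs per domain, in the order listed on this page (hypothetical encoding).
SUPPORTED_MOTIFS = {
    "A": ["Aalpha"] + expand("A", 1, 5) + ["G-motif"] + expand("A", 6, 10),
    "C": expand("C", 1, 10),
    "T": ["Talpha", "T1"],
    "TE": ["TE1"],
    "E": expand("E", 1, 7),
}

# Motif counts per domain, which should match the numbers stated above.
motif_counts = {domain: len(motifs) for domain, motifs in SUPPORTED_MOTIFS.items()}
print(motif_counts)
```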

C domain subtype

One of the most important features of NRPS-motif-Finder is that it supports full subtype classification of the C domain.

Figure (C_all_tree7): Maximum-likelihood phylogenetic tree of the condensation domain superfamily.

Subtype classification and sequences are described in the main text and the Method. Different subtypes are indicated by colors, with subtypes exclusive to fungi marked by underlines and subtypes found predominantly in bacteria marked by asterisks. The tree is rooted, with papA and WES as outgroups (black shading). The L-clade and D-clade are indicated by blue and red shading, respectively.

The details of C domain subtypes

| C domain subtype | Species distribution | Function | Comment | Reference sequence source |
| --- | --- | --- | --- | --- |
| LCL | Bacteria and fungi | LCL-type C domains catalyze peptide bond formation between two L-amino acids. | Hard to distinguish from SgcC5 because of the high sequence similarity. | Conserved Protein Domain Family |
| DCL | Bacteria and fungi | The DCL-type C domain catalyzes the condensation between a D-aminoacyl/peptidyl-PCP donor and an L-aminoacyl-PCP acceptor. | | Conserved Protein Domain Family |
| Starter | Predominantly bacteria | While standard C domains catalyze peptide bond formation between two amino acids, the Starter C domain may instead acylate an amino acid with a fatty acid in the first module of an NRPS. | | Conserved Protein Domain Family |
| Dual | Bacteria and fungi | Dual-function E/C domains have both epimerization and DCL condensation activity. They first epimerize the substrate amino acid to the D-configuration, then catalyze the condensation between the D-aminoacyl/peptidyl-PCP donor and an L-aminoacyl-PCP acceptor. | | Conserved Protein Domain Family |
| CT | Fungi only | Unlike bacterial NRPSs, which typically have specialized terminal thioesterase (TE) domains to cyclize peptide products, many fungal NRPSs employ a terminal condensation-like (CT) domain to produce macrocyclic peptidyl products. | | Conserved Protein Domain Family |
| CT-DCL | Fungi only | The CT-DCL domain catalyzes the same reaction as the DCL domain but has high sequence similarity to the CT domain. | This subtype is proposed in our paper. | Conserved Protein Domain Family |
| CT-A | Fungi only | The CT-Atypical (CT-A) domain catalyzes the same reaction as the DCL domain but has high sequence similarity to the CT domain. It always follows an ACP (acyl carrier protein) domain rather than a T domain. | This subtype is proposed in our paper. | Conserved Protein Domain Family |
| PS | Predominantly bacteria | The PS domain catalyzes the Pictet-Spengler reaction. | | Literature |
| bL | Predominantly bacteria | The beta-lactam (bL) C domain mediates an unusual cyclization to form beta-lactam rings. | The bL domain is actually a subtype of the DCL domain. | Conserved Protein Domain Family |
| X | Predominantly bacteria | The X domain is a catalytically inactive condensation-like domain shown to recruit oxygenases to the NRPS. | | Conserved Protein Domain Family |
| Cyc | Bacteria and fungi | Cyc (heterocyclization) domains catalyze two separate reactions in the creation of heterocyclized peptide products in NRPSs: amide bond formation, followed by intramolecular cyclodehydration between a Cys, Ser, or Thr side chain and a carbonyl carbon on the peptide backbone to form a thiazoline, oxazoline, or methyloxazoline ring. | | Conserved Protein Domain Family |
| I | Predominantly bacteria | The Interface (I) domain plays a role in positioning the β-hydroxylase and the NRPS-bound amino acid substrate prior to hydroxylation. | | Literature |
| modAA | Predominantly bacteria | The core function of the modAA C domain is to catalyze the dehydration of a beta-hydroxy amino acid (such as Ser or Thr) to form a dehydroamino acid. Derived functions include pyrrolizidine formation, conjugate addition instead of amide formation, pyrimidine formation, l-2-amino-4-methoxy-trans-3-butenoic acid formation, and side-chain conjugate addition. | | Literature |
| Cglyc | Predominantly bacteria | The glycopeptide condensation domain functions in peptide bond formation during glycopeptide antibiotic biosynthesis. | | MIBiG v2 |
| Hybrid | Bacteria and fungi | C domains of hybrid polyketide synthase/nonribosomal peptide synthetases (PKS/NRPSs) catalyze peptide bond formation within (usually) large multi-modular enzymatic complexes. Hybrid PKS/NRPSs create polymers containing both polyketide and amide linkages. | | Conserved Protein Domain Family |
| FUM14 | Fungi only | C domain of NRPSs similar to the ester-bond-forming Fusarium verticillioides FUM14 protein. | The module containing a FUM14 domain is always used iteratively. Ester-bond formation is an uncommon function. | Conserved Protein Domain Family |
| SgcC5 | Bacteria and fungi | SgcC5 is an NRPS C domain with ester- and amide-bond-forming activity. | Hard to distinguish from LCL because of the high sequence similarity. | Conserved Protein Domain Family |
| LCL-A | Fungi only | C domain with an atypical active-site motif. Members of this subfamily typically have a non-canonical conserved SHXXXDX(14)Y motif, which replaces the HHXXXD motif typically found in the C domain. | This subtype is named in our paper. | Conserved Protein Domain Family |
| E | Bacteria and fungi | Epimerization (E) domains of NRPSs flip the chirality of the terminal amino acid of the peptide being assembled by the NRPS. | | Conserved Protein Domain Family |

Note: In NRPS-motif-Finder results, the E domain is not treated as a C domain subtype. The E domain has 7 motifs, whereas the C domain has 10 motifs.
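The two active-site patterns mentioned for the LCL-A subtype, HHXXXD and SHXXXDX(14)Y, can be illustrated with regular expressions. This is a rough sketch under stated assumptions and not how NRPS-motif-Finder classifies subtypes, which relies on HMM files: it assumes that X stands for any residue and that X(14) means exactly 14 arbitrary residues, and the function name `active_site_motif` and both toy sequences are invented for the example.

```python
import re

# Assumed reading of the motif notation: X = any residue, X(14) = 14 residues.
CANONICAL = re.compile(r"HH.{3}D")        # HHXXXD
ATYPICAL = re.compile(r"SH.{3}D.{14}Y")   # SHXXXDX(14)Y


def active_site_motif(sequence):
    """Return (pattern_name, start_index) of the first match, or None."""
    match = ATYPICAL.search(sequence)
    if match:
        return ("SHXXXDX(14)Y", match.start())
    match = CANONICAL.search(sequence)
    if match:
        return ("HHXXXD", match.start())
    return None


# Toy sequences, invented for illustration only.
canonical_toy = "AAHHAAADGG"
atypical_toy = "GGSHAAAD" + "A" * 14 + "YGG"
print(active_site_motif(canonical_toy))
print(active_site_motif(atypical_toy))
```

A plain pattern scan like this ignores sequence context, so a match only suggests where to look; it does not establish the subtype.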

HMM files for C domain subtype classification can be found here.

The raw sequences used for HMM construction can be found here.

Source code of NRPS-motif-Finder

There are two versions of NRPS-motif-Finder, implemented in MATLAB and Python.

We recommend the MATLAB version because it is updated frequently to fix bugs. The Python version is the stable version used in our online platform.

NRPS-motif-Finder-matlab-version

MATLAB code of NRPS-motif-Finder.

NRPS-motif-Finder-Python-version

Python code of NRPS-motif-Finder.
