Skip to content

HMM models of the 2A peptide and DB search results

Notifications You must be signed in to change notification settings

rnabioco/2a-peptide-search

Repository files navigation

2a-peptide-search

Introduction

The 2A peptide is a short (~15 residue) cis-acting oligopeptide that causes the ribosome to skip a peptide bond between a Gly-Pro dipeptide. 2A peptides are thought to interact with specific residues in the ribosome exit channel. Previous studies described 2A and 2A-like peptides that are broadly distributed in other RNA viruses, and typically occur between proteins with specific functions, including viral coat proteins and enzymes involved in viral replication.

2A peptides have conserved sequence features making them amenable to construction of profile hidden Markov models, which capture sequence elements in a statistical model suitable for searching large protein sequence databases. In particular, the Pro-Gly-Pro residues at its C-terminus are critical for 2A skipping activity.

Data availability

Alignments and profile HMMs of class 1 and 2 2A peptides are in curated-models/.

hmmsearch results with the models against major protein databases are in db-searches.

N.B. searches with the 2A models will identify e.g. partial matches between the class 2 model and class 1 sequences and vice versa, due to similarities between the models (i.e., the C-terminal PGP

Pfam onboarding

I have tried to no avail to incorporate these models into Pfam, now hosted by Interpro. They claim to not be able to build HMM models that are specific (due to the sequences being short), despite them starting with the collection of seqeuences that are identified using the models provided here 🤦‍♂️.

Results

We performed a systematic search for 2A peptides using the hmmer software. Starting from a small set of previously described 2A peptides in picornaviruses, we iteratively built and searched three protein databases (UniProt, UniParc, and MGnify), identifying thousands of 2A peptides.

Class 1 2A peptides

Visual inspection of multiple sequence alignments revealed a novel 2A peptide with features resembling the original 2A peptide. The original, class 1 sequences contain some key features:

  • a stretch of leucine residues at the N-terminus
  • a central region with a conserved GDVE motif
  • C-terminal NPGP redidues, with skipping occuring between the Gly-Pro residues.

Logo of 2A peptide Class 1 sequences

Class 2 2A peptides

Class 2 2A peptides are similar to class 1 with some key distinctions.

  • An invariant tryptophan residue at the N-terminus.
  • No runs of leucine codons, but specific conserved residues in the central region similar to class 1 (e.g., GDVE in class 1, EEGIE in class 2)
  • Examples with PNPGP and PHPGP residues at the C-terminus.

Logo of 2A peptide Class 2 sequences

Methods

Protein databases

  • Uniprot - highly curated reference proteins, pan-organism
  • Uniparc - non-redundant, uncurated, pan-organism
  • MGnify - sequences from enrivonmental samples
  • Reference Proteomes — a subset of Uniprot used to build models for Pfam.
  • IMGVR - large database of cultivated an uncultivated viruses

Multiple sequence alignments

We created multiple sequence alignments using hmmalign and visualized them with Jalview. Neighbor-joing trees were calulcated for sequences in the MSA and the MSA was tree-sorted to identify sequence similarities. Seed alignments were also calculated by eliminating redundant sequences from the aligment. Sequence logos in img/ were created with Skylign.

Mutliple sequence alignments of searches with the 2A models against proteome databases are in db-searches/.

References

  • https://en.wikipedia.org/wiki/2A_self-cleaving_peptides

  • Wang Y, Wang F, Wang R, Zhao P, Xia Q. 2A self-cleaving peptide-based multi-gene expression system in the silkworm Bombyx mori. Sci Rep. 2015 Nov 5;5:16273. doi: 10.1038/srep16273. PMID: 26537835; PMCID: PMC4633692.

  • Liu Z, Chen O, Wall JBJ, Zheng M, Zhou Y, Wang L, Vaseghi HR, Qian L, Liu J. Systematic comparison of 2A peptides for cloning multi-genes in a polycistronic vector. Sci Rep. 2017 May 19;7(1):2193. doi: 10.1038/s41598-017-02460-2. PMID: 28526819; PMCID: PMC5438344.

  • Sharma P, Yan F, Doronina VA, Escuin-Ordinas H, Ryan MD, Brown JD. 2A peptides provide distinct solutions to driving stop-carry on translational recoding. Nucleic Acids Res. 2012 Apr;40(7):3143-51. doi: 10.1093/nar/gkr1176. Epub 2011 Dec 2. PMID: 22140113; PMCID: PMC3326317.

  • Luke GA, de Felipe P, Lukashev A, Kallioinen SE, Bruno EA, Ryan MD. Occurrence, function and evolutionary origins of '2A-like' sequences in virus genomes. J Gen Virol. 2008 Apr;89(Pt 4):1036-1042. doi: 10.1099/vir.0.83428-0. PMID: 18343847; PMCID: PMC2885027.

  • de Lima JGS, Lanza DCF. 2A and 2A-like Sequences: Distribution in Different Virus Species and Applications in Biotechnology. Viruses. 2021 Oct 26;13(11):2160. doi: 10.3390/v13112160. PMID: 34834965; PMCID: PMC8623073.

  • Nibert, Max L. “'2A-like' and 'shifty heptamer' motifs in penaeid shrimp infectious myonecrosis virus, a monosegmented double-stranded RNA virus.” The Journal of general virology vol. 88,Pt 4 (2007): 1315-1318. doi:10.1099/vir.0.82681-0