samuelmiver/RanSEPs

RanSEPs provides a framework for bacterial genome re-annotation and the detection of novel small proteins (SEPs), adjusting the search to the genomic features that govern protein-coding capability.

How does RanSEPs work?

The original publication, with a full description of the methods, can be found here.

Preparation

RanSEPs requires:

  • Python: version 2.7. RanSEPs has not been tested with Python 3.
  • Propy: tool to compute protein features. Download instructions here.
  • Blast: database generation requires running this program locally. Information on downloading and installing it can be found here.

Installation

RanSEPs depends on several Python libraries for its predictions. We provide a requirements file to install them all at once. To use it, you first need pip installed; then run:

sudo apt-get install python-pip    # if you need to install pip
pip install -r requirements.txt

Then move to the RanSEPs directory and install the program by typing:

sudo python setup.py install

After this, you can run the program by typing ranseps on your command line.

Usage

To run a prediction you only need a pair of files:

  • The reference genome as a fasta file (a genbank file is also accepted).
  • The nucleotide sequences of the CDSs annotated in that genome, as a fasta file.

Then just run:

# Custom CDS
ranseps -g <path/to/your/fasta_or_genbank> -c <path/to/your/cds/file>
# Using the CDS from a genbank
ranseps -g <path/to/your/genbank>

This will run a simple search for proteins longer than 10 amino acids. However, RanSEPs exposes several parameters that you can explore to find the best set for your organism of interest. To list them, run:

ranseps -h
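For scripted pipelines, the two invocations shown in this section can be assembled from Python. The following is a minimal sketch, not part of RanSEPs itself: the helper names build_ranseps_command and run_ranseps_cli are hypothetical, and only the -g and -c flags documented in this section are used.

```python
import subprocess


def build_ranseps_command(genome, cds=None):
    """Return the argument list for a ranseps run.

    Hypothetical helper: it only uses the -g and -c flags shown in the
    usage section. When cds is None, the CDSs are expected to come from
    the genbank file passed as the genome.
    """
    command = ["ranseps", "-g", genome]
    if cds is not None:
        command.extend(["-c", cds])
    return command


def run_ranseps_cli(genome, cds=None):
    """Run ranseps and return its exit status (requires ranseps on the PATH)."""
    return subprocess.call(build_ranseps_command(genome, cds))
```

Calling run_ranseps_cli("genome.gb") corresponds to the genbank-only invocation, and run_ranseps_cli("genome.fa", "cds.fa") to the custom CDS one.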

Output

Once the program has run without problems (we really hope so!), you will find in your selected directory:

  • A tab-delimited file with all the sequences, their locations, RanSEPs scores and standard deviations, and nucleotide (nt) and amino acid (aa) sequences. We recommend a score threshold of >= 0.5 to trust a SEP and >= 0.85 for standard proteins.
  • Precision-recall and ROC curves to assess the accuracy of your prediction.
  • The weights and errors for each feature considered.
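The recommended thresholds can be applied to the tab-delimited results with a few lines of Python. This is only a sketch under stated assumptions: the column names "score" and "aa_sequence" are hypothetical (check the header of your own output file), and the 100 amino acid cutoff used to separate SEPs from standard proteins is an assumption, not a value taken from RanSEPs.

```python
# Hypothetical column names: adjust them to the header of your results file.
SCORE_COLUMN = "score"
AA_COLUMN = "aa_sequence"

# Assumed length cutoff (in amino acids) to call a protein a SEP.
SEP_MAX_LENGTH = 100


def trusted_predictions(lines, sep_threshold=0.5, protein_threshold=0.85):
    """Keep the rows whose score passes the threshold for their size class.

    lines is any iterable of tab-delimited text lines whose first line
    is the header, for example an open file handle.
    """
    rows = iter(lines)
    header = next(rows).rstrip("\n").split("\t")
    kept = []
    for line in rows:
        if not line.strip():
            continue  # skip empty lines
        record = dict(zip(header, line.rstrip("\n").split("\t")))
        is_sep = len(record[AA_COLUMN]) <= SEP_MAX_LENGTH
        threshold = sep_threshold if is_sep else protein_threshold
        if float(record[SCORE_COLUMN]) >= threshold:
            kept.append(record)
    return kept
```

With a real results file, the function can be fed an open handle, for instance trusted_predictions(open("path/to/results_file")).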

If this output is not enough, RanSEPs also generates an intermediary_folder that includes all databases, annotation files, amino acid and nucleotide sequences in fasta files, and the classifiers, features and statistics for each classification subprocess (more information in the online methods of the original publication). You can safely remove this folder if the default results are enough for you.

RanSEPs as a python package

You can also import RanSEPs and call it as a function from any of your Python scripts. To do so, follow the installation steps above and import the main function using:

from ranseps.run_ranseps import run_ranseps

Then you will be able to run the tool in any script using:

run_ranseps('<path/to/your/genome>', '<path/to/your/cds/file>')

All the additional arguments available in the command-line version are also available in the Python function. To check the documentation interactively in the Python interpreter, you can use:

help(run_ranseps)
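When several organisms need to be processed, the function can be called in a loop. The sketch below assumes only the two positional arguments shown in this section (genome path, then CDS path); the helper name run_batch is hypothetical, and the function to call is passed in as runner so that the helper does not depend on any other detail of the RanSEPs API.

```python
import os


def run_batch(pairs, runner):
    """Call runner(genome, cds) for each (genome, cds) pair of paths.

    Pairs with a missing file are skipped instead of being passed on.
    Returns a dictionary mapping each genome path to a short status.
    """
    results = {}
    for genome, cds in pairs:
        missing = [path for path in (genome, cds) if not os.path.isfile(path)]
        if missing:
            results[genome] = "skipped, missing: " + ", ".join(missing)
            continue
        runner(genome, cds)
        results[genome] = "done"
    return results


# Intended use, with RanSEPs installed:
#   from ranseps.run_ranseps import run_ranseps
#   run_batch([("genome_a.fa", "cds_a.fa"), ("genome_b.fa", "cds_b.fa")], run_ranseps)
```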

Versions

  • v_1: added predictor of pseudogenes based on homology
  • v_2:
    • pseudorandomized mode
    • fixed set sizes options
    • average probability per prediction optimized
    • autonegative set

Contact

This project has been fully developed at the Centre for Genomic Regulation, in the Design of Biological Systems group.

If you experience any problem at any step involving the program, you can use the 'Issues' page of this repository or contact:

Miravet-Verde, Samuel
Lluch-Senar, Maria
Serrano, Luis

License

RanSEPs is released under the GNU General Public License. Please check LICENSE for further information.

[2018] - Centre de Regulació Genòmica (CRG) - All Rights Reserved
