Bioinformatic tool for the accurate identification, characterization, quantitation and annotation of MASP molecules in T. cruzi.

BuscagliaLab/Disruptomics-MASP

Annotation and classification of Mucin Associated Surface Proteins (MASP) in Trypanosoma cruzi

📍What is Disruptomics?

Disruptomics is a bioinformatic protocol for the accurate identification, characterization, quantitation and annotation of MASP molecules.

📝NOTE

From GENOME

To start from a genome, first run your genome sequence through the EMBOSS getorf program. We recommend these parameters:

getorf -minsize 120 -maxsize 20100 -find 1

The output produced by getorf is the input of the algorithm.
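As a rough sketch of how the getorf step could be scripted, the snippet below assembles the full command with the recommended parameters. The file names `genome.fasta` and `genome_orfs.fasta` are placeholders, not names used by this repository, and the helper functions are hypothetical. `-sequence` and `-outseq` are the standard getorf input and output qualifiers.

```python
import shutil
import subprocess
from pathlib import Path


def build_getorf_command(genome_fasta, orf_fasta):
    """Return the getorf argument list with the parameters recommended above."""
    return [
        "getorf",
        "-sequence", str(genome_fasta),  # input genome (placeholder file name)
        "-outseq", str(orf_fasta),       # translated ORFs, later given to the algorithm
        "-minsize", "120",
        "-maxsize", "20100",
        "-find", "1",
    ]


def run_getorf(genome_fasta, orf_fasta):
    """Run getorf if EMBOSS is installed; raise a clear error otherwise."""
    if shutil.which("getorf") is None:
        raise RuntimeError("getorf not found on PATH; install EMBOSS first")
    subprocess.run(build_getorf_command(genome_fasta, orf_fasta), check=True)
    return Path(orf_fasta)


if __name__ == "__main__":
    # Only print the command here; call run_getorf() to actually execute it.
    print(" ".join(build_getorf_command("genome.fasta", "genome_orfs.fasta")))
```

Keeping the command construction separate from its execution makes it easy to inspect the exact call before running it on a large genome.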

From PROTEOME

⚠️If you want to run a protein FASTA file instead, make sure the sequence headers follow the same format as the EMBOSS output.⚠️ Example: >XX_2517_1_1_1 [224 - 382]
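Before running a custom proteome, it may help to check the headers against that layout. The following is a minimal sketch: the regular expression is inferred from the single example header above and is an assumption, not the parser the algorithm itself uses. The function names are hypothetical. Any text after the closing bracket is tolerated.

```python
import re

# Pattern inferred from the example header ">XX_2517_1_1_1 [224 - 382]".
# This is an assumption about the expected layout, not the algorithm's own parser.
HEADER_RE = re.compile(r"^>(\S+)_(\d+) \[(\d+) - (\d+)\]")


def parse_header(line):
    """Return (sequence_id, orf_number, start, end), or None if the header does not match."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    seq_id, orf_number, start, end = match.groups()
    return seq_id, int(orf_number), int(start), int(end)


def find_bad_headers(fasta_lines):
    """Collect header lines that do not follow the EMBOSS-style layout."""
    return [
        line.rstrip("\n")
        for line in fasta_lines
        if line.startswith(">") and parse_header(line) is None
    ]


if __name__ == "__main__":
    sample = [">XX_2517_1_1_1 [224 - 382]\n", "MKTLLV\n", ">my_protein\n"]
    for header in find_bad_headers(sample):
        print("header needs reformatting:", header)
```

Passing an open file handle to `find_bad_headers` works as well, since it only iterates over lines.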

💻How to run Disruptomics?

To execute the algorithm, download the MASP-algorithm.zip file and save it in your Downloads folder. Unzip the file in your desired location, creating a folder named MASP-Algorithm that contains the Python script and two HMM matrices. Open a terminal and navigate to this folder, either manually or by right-clicking the folder and selecting “Open in Terminal” for convenience.

Make sure you download the ZIP file that contains the algorithm together with its dependent files. After unzipping, do NOT delete anything inside the folder.

Then run:

python3 MASP-AnnotationAlgorithm.py

First, the user selects the directory where all outputs will be stored (Fig. D). Next, the user selects the multiFASTA file containing the protein sequences, which can also be the output file generated by the EMBOSS getorf program (Fig. D). Once this is completed, a prompt will appear in the terminal, where the user can enter the name of the strain to be analyzed; this name is used in the names of the outputs (Fig. E).

User interface and workflow for executing the MASP annotation algorithm.

(A) Inside the extracted ZIP folder, three essential files are displayed: the main Python script and two HMM profiles. (B) Users can open the working directory in the terminal by right-clicking the folder and selecting "Open in Terminal", which automatically sets the terminal to the correct directory. (C) In the terminal, users execute the algorithm by running the Python script, specifying required inputs. (D) The user selects the output directory where all result files will be stored, with the chosen folder path displayed in the interface. The protein multiFASTA file containing the sequences to be analyzed is selected, with the file path shown at the bottom. (E) Terminal prompt for entering the name of the strain to be analyzed, enabling the customization of output files.

The algorithm will then begin running, displaying updates on its progress like this:

All N-terminal of the entered sequences were analyzed.
All C-terminal of the entered sequences were analyzed.
Analyzing and annotating MASP sequences
Searching for chimeric sequences
All sequences were classified
Selecting sequences according to hierarchical ranking
Finished selecting sequences according to hierarchical ranking

Once the classification and annotation of MASPs is finished, the following messages will be displayed on the terminal:

FASTA files generated by prediction of the algorithm.
GFF file generated: /home/user/Selected_Folder/MASP_Strain_XX_sequences.gff
The information has been stored in /home/user/Selected_Folder/README_Strain_XX.txt
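
For a quick look at the generated GFF file, a small reader along these lines may be enough. It assumes the file uses the standard nine-column, tab-separated GFF layout, which this README does not confirm. The function names and the sample lines in the demo are purely illustrative and are not real output of the algorithm.

```python
from collections import Counter

# Column names of the standard GFF layout (assumed, not confirmed by the README).
GFF_COLUMNS = [
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
]


def read_gff(lines):
    """Yield one dict per feature line, skipping comments and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(GFF_COLUMNS):
            raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
        record = dict(zip(GFF_COLUMNS, fields))
        record["start"] = int(record["start"])
        record["end"] = int(record["end"])
        yield record


def count_feature_types(lines):
    """Count how many features of each type the file contains."""
    return Counter(record["type"] for record in read_gff(lines))


if __name__ == "__main__":
    # Made-up lines for illustration only.
    demo = [
        "##gff-version 3\n",
        "contig_1\texample\tCDS\t224\t382\t.\t+\t0\tID=demo_1\n",
        "contig_1\texample\tCDS\t900\t1400\t.\t-\t0\tID=demo_2\n",
    ]
    print(count_feature_types(demo))
```

To use it on a real result, open the `.gff` path printed by the algorithm and pass the file handle to `count_feature_types`.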

✔️Dependencies

To operate correctly, this tool requires the following libraries to be installed beforehand:

--> Pandas version 2.8.0 and upwards

--> PyHMMER from https://pyhmmer.readthedocs.io/en/stable/

--> tkinter (the default Python interface to the Tk GUI toolkit)

--> In some cases, depending on the computer, installation of Jinja2 may be required. https://jinja.palletsprojects.com/en/stable/
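
The list above can be checked before launching the algorithm with a short script such as the sketch below. The module names are the usual import names of these libraries, and the helper function is hypothetical. Note that this only tests whether each module can be located, not its version, and a located `tkinter` can still fail to import if the underlying Tk libraries are missing.

```python
import importlib.util

# Import names of the libraries listed above; jinja2 is only needed on some systems.
REQUIRED = ["pandas", "pyhmmer", "tkinter"]
OPTIONAL = ["jinja2"]


def missing_modules(names):
    """Return the subset of names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_modules(REQUIRED):
        print(f"missing required library: {name}")
    for name in missing_modules(OPTIONAL):
        print(f"missing optional library: {name}")
```

If nothing is printed, all listed modules were found in the current Python environment.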
