Annotation and classification of Mucin Associated Surface Proteins (MASP) in Trypanosoma cruzi

📍What is Disruptomics?

Disruptomics is a bioinformatic protocol that we developed for the accurate identification, characterization, quantitation and annotation of MASP molecules.

📝NOTE

From GENOME

To start, run your genome sequence through EMBOSS getorf. We recommend these parameters:

getorf -minsize 120 -maxsize 20100 -find 1

The output file produced by getorf is the input for the algorithm.
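The getorf step can also be scripted. The following is a minimal sketch, assuming EMBOSS is installed and `getorf` is on the PATH; the helper names and the file names in the usage note are hypothetical, and `-sequence`/`-outseq` are the standard getorf input and output options, added here because the recommended parameters above do not name the files.

```python
import shutil
import subprocess


def build_getorf_command(genome_fasta, orf_fasta):
    """Return the getorf argument list using the recommended parameters."""
    return [
        "getorf",
        "-sequence", genome_fasta,  # input genome FASTA (caller-supplied path)
        "-outseq", orf_fasta,       # output ORF FASTA, later given to the algorithm
        "-minsize", "120",
        "-maxsize", "20100",
        "-find", "1",
    ]


def run_getorf(genome_fasta, orf_fasta):
    """Run getorf and raise an error if it is missing or fails."""
    if shutil.which("getorf") is None:
        raise RuntimeError("getorf not found on PATH; install EMBOSS first")
    subprocess.run(build_getorf_command(genome_fasta, orf_fasta), check=True)
```

For example, `run_getorf("genome.fasta", "genome_orfs.fasta")` would write the ORFs to `genome_orfs.fasta`, which can then be selected as the input file.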

From PROTEOME

⚠️ If you want to run a protein FASTA file instead, make sure the sequence headers have the same format as the EMBOSS getorf output. ⚠️ Example: >XX_2517_1_1_1 [224 - 382]
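Before running a protein FASTA file, the headers can be checked with a short script. This is a sketch only: the regular expression is an assumption derived from the single example above (an identifier without spaces, followed by `[start - end]`), and the function name is hypothetical. The exact header check performed by the algorithm may differ.

```python
import re

# Assumed pattern, inferred from the example ">XX_2517_1_1_1 [224 - 382]".
GETORF_HEADER = re.compile(r"^>(\S+) \[(\d+) - (\d+)\]")


def find_bad_headers(fasta_path):
    """Return (line_number, header) pairs for headers that do not match."""
    bad = []
    with open(fasta_path) as handle:
        for line_number, line in enumerate(handle, start=1):
            if line.startswith(">") and not GETORF_HEADER.match(line):
                bad.append((line_number, line.rstrip("\n")))
    return bad
```

An empty list means every header in the file matches the assumed pattern.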

💻How to run Disruptomics?

To execute the algorithm, download the MASP-Algorithm.zip file and save it in your Downloads folder. Unzip the file in your desired location, creating a folder named MASP-Algorithm that contains the Python script and two HMM profiles. Open a terminal and navigate to this folder, either manually or by right-clicking the folder and selecting “Open in Terminal”.

The ZIP file contains the algorithm together with the files it depends on. After unzipping, do NOT delete anything inside the folder.

Then run:

python3 MASP-AnnotationAlgorithm.py

First, the user selects the directory where all outputs will be stored (Fig. D). Next, the user selects the multiFASTA file containing the protein sequences, which can also be the output file generated by EMBOSS getorf (Fig. D). Once this is done, a prompt appears in the terminal, where the user enters the name of the strain to be analyzed; this name is used in the names of the output files (Fig. E).

(A) Inside the extracted ZIP folder, three essential files are displayed: the main Python script and two HMM profiles. (B) Users can open the working directory in the terminal by right-clicking the folder and selecting "Open in Terminal", which automatically sets the terminal to the correct directory. (C) In the terminal, users execute the algorithm by running the Python script, specifying required inputs. (D) The user selects the output directory where all result files will be stored, with the chosen folder path displayed in the interface. The protein multiFASTA file containing the sequences to be analyzed is selected, with the file path shown at the bottom. (E) Terminal prompt for entering the name of the strain to be analyzed, enabling the customization of output files.

The algorithm will then begin running, displaying updates on its progress like this:

All N-terminal of the entered sequences were analyzed. 
All C-terminal of the entered sequences were analyzed. 
Analyzing and annotating MASP sequences
Searching for chimeric sequences
All sequences were classified
Selecting sequences according to hierarchical ranking
Finished selecting sequences according to hierarchical ranking

Once the classification and annotation of MASPs is finished, the following messages will be displayed on the terminal:

FASTA files generated by prediction of the algorithm.
GFF file generated: /home/user/Selected_Folder/MASP_Strain_XX_sequences.gff
The information has been stored in /home/user/Selected_Folder/README_Strain_XX.txt
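Once the run finishes, the GFF file can be summarized with a few lines of Python. This is a sketch that assumes the output follows the usual tab-separated GFF layout, with the feature type in the third column; the feature types actually written by the algorithm are not documented here, and the function name is hypothetical.

```python
from collections import Counter


def count_gff_features(gff_path):
    """Count records per feature type (third column) in a GFF file."""
    counts = Counter()
    with open(gff_path) as handle:
        for line in handle:
            # Skip blank lines and comment/directive lines.
            if not line.strip() or line.startswith("#"):
                continue
            columns = line.rstrip("\n").split("\t")
            if len(columns) >= 3:
                counts[columns[2]] += 1
    return counts
```

Calling it with the path printed in the terminal, for example `count_gff_features("/home/user/Selected_Folder/MASP_Strain_XX_sequences.gff")`, returns a mapping from feature type to number of records.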

✔️Dependencies

To operate correctly, this tool requires the following libraries to be installed beforehand:

--> Pandas version 2.8.0 and upwards

--> PyHMMER from https://pyhmmer.readthedocs.io/en/stable/

--> tkinter (the standard Python interface to the Tk GUI toolkit)

--> In some cases, depending on the computer, installation of Jinja2 may be required. https://jinja.palletsprojects.com/en/stable/
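A quick way to confirm that the libraries listed above are available is to check whether Python can locate them. The sketch below checks only that each module can be found, not its version; the function name and the required/optional split are illustrative assumptions based on the list above.

```python
import importlib.util

REQUIRED = ("pandas", "pyhmmer", "tkinter")
OPTIONAL = ("jinja2",)  # only needed on some computers


def missing_modules(names):
    """Return the module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing required modules:", missing_modules(REQUIRED) or "none")
    print("Missing optional modules:", missing_modules(OPTIONAL) or "none")
```

Any module reported as missing should be installed before running the algorithm.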

BuscagliaLab/Disruptomics-MASP
