Microbiome Phylogenetic Tree Pipeline

A bioinformatics pipeline for microbiome data analysis using phylogenetic trees.

To get started, take a look at our design document under the 'Wiki' tab to learn more about the project.

Introduction

Understanding the phylogenetic relationships between species is crucial for evolutionary studies. Reconstructing a phylogenetic species tree, a branching diagram, is particularly useful for inferring those relationships. For example, the tree of life provides a remarkable view of the organizing principles of the biological world. An accurate species tree is therefore necessary, but the process of reconstructing a species or gene tree is very tedious.

Here, we developed an easy-to-use pipeline that conveniently and efficiently reconstructs species trees.

Pipeline Workflow

[Figure: proposed solution workflow diagram]

Features

  • The only input is a list of species names.
  • One Python script builds the Newick file.
  • One Python script visualizes the tree.
  • View trees using the Phylo package.

Files Included in the Repo

  • 16SFastaData.txt

    • FASTA format output file of the test data (as a txt file)
    • Found in SampleOutputs
  • 16Sout.fasta

    • FASTA format output file of the test data (as a FASTA file)
    • Found in SampleOutputs
  • ETE_code.txt

    • Code to follow to download the ETE toolkit
  • PhyloPipeline

    • Main Python script for the pipeline
  • TreeVisualization

    • Python script to visualize the tree
  • seqs.afa

    • Multiple sequence alignment output file of the test data (generated by MUSCLE)
    • Found in SampleOutputs
  • #_taxa.txt

    • Multiple txt test files of taxonomic names (10, 20, 100)
    • Found in SampleTest
  • tree_file

    • Newick file of the generated tree from the test data
    • Found in SampleOutputs

Software Tools Required

Install

To run this code from your working directory, clone this repository to your workspace with git:

git clone https://github.com/rmormando/PhyloTree_Project.git

Then change into the cloned directory to access all of the repo's files:

cd PhyloTree_Project

Directions

To use the pipeline, Python, MUSCLE, and FastTree must be installed on the local machine or server of your choosing.
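Before running anything, it can help to confirm that the required programs are reachable from the command line. The following is a minimal sketch, not part of the repo: the helper name `missing_tools` is hypothetical, and the executable names are assumptions (some installations name the FastTree binary `fasttree`).

```python
import shutil

# Assumed executable names; adjust them to match your installation
# (for example, some package managers install FastTree as "fasttree").
REQUIRED_TOOLS = ("python3", "muscle", "FastTree")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

If a tool is reported as missing even though it is installed, its directory is probably not on your `PATH`.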

To use the ETE toolkit for visualization, follow the steps outlined in:

ETE_code.txt

To use plottree for visualization, run this line of code on the command line:

pip install plottree

Use the link found above to learn more.

1. Download the sample data set or use your own

The input must be a text file of taxonomic names, one name per line.

#_taxa.txt

Example input files are provided in the repo (SampleTests folder).
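As an illustration of the expected input format, the sketch below reads a taxa file into a list of names. It is not taken from the pipeline script: the function name `read_taxa` is hypothetical, and skipping blank lines and repeated names is an assumption about what counts as sensible input handling.

```python
from pathlib import Path


def read_taxa(path):
    """Return the taxonomic names in `path`, one per line, in file order.

    Blank lines are skipped and repeated names are kept only once
    (an assumption; the pipeline script may treat them differently).
    """
    names = []
    seen = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names
```

Calling `read_taxa` on one of the sample files returns a plain Python list that can be checked by eye before starting a longer run.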

2. Run through the pipeline

Run the Python script with your txt file to execute every stage of the pipeline:

python3 PhyloPipeline.py

This single Python script retrieves the 16S sequences from NCBI's public database in FASTA format, aligns the resulting FASTA file with MUSCLE, and then builds a tree in Newick format. You can view the Newick file in an online tree viewer (we recommend iTOL). For convenience, the repo also includes a visualization script:

python3 TreeVisualization.py

It creates a JPEG of the tree with branch lengths using the Phylo package from Biopython.
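The alignment, tree-building, and rendering stages described in this step could be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of the repo's scripts: all function names are hypothetical, the MUSCLE flags depend on the installed version (`-in`/`-out` for v3, `-align`/`-output` for v5), FastTree is assumed to be the tree builder because it is listed as a requirement, and rendering assumes Biopython and matplotlib are installed.

```python
import subprocess


def muscle_cmd(fasta_in, aln_out, muscle5=False):
    """Build the MUSCLE argument list that aligns `fasta_in` into `aln_out`."""
    if muscle5:
        # MUSCLE v5 command-line syntax
        return ["muscle", "-align", fasta_in, "-output", aln_out]
    # MUSCLE v3 command-line syntax
    return ["muscle", "-in", fasta_in, "-out", aln_out]


def fasttree_cmd(aln_in):
    """Build the FastTree argument list; -nt selects nucleotide mode."""
    return ["FastTree", "-nt", aln_in]


def align_and_build(fasta_in, aln_out, tree_out, muscle5=False):
    """Run MUSCLE, then FastTree, writing the Newick tree to `tree_out`."""
    subprocess.run(muscle_cmd(fasta_in, aln_out, muscle5), check=True)
    # FastTree prints the Newick tree on standard output, so redirect it.
    with open(tree_out, "w") as handle:
        subprocess.run(fasttree_cmd(aln_out), stdout=handle, check=True)


def render_tree(tree_path, image_out):
    """Draw a Newick tree with Bio.Phylo and save it as an image file."""
    # Imported here so the helpers above work without these packages.
    import matplotlib
    matplotlib.use("Agg")  # render off-screen; no display is needed
    import matplotlib.pyplot as plt
    from Bio import Phylo

    tree = Phylo.read(tree_path, "newick")
    fig, ax = plt.subplots(figsize=(8, 6))
    Phylo.draw(tree, axes=ax, do_show=False)  # x-axis shows branch length
    fig.savefig(image_out)
    plt.close(fig)
```

With the sample outputs listed earlier, a run might look like `align_and_build("16Sout.fasta", "seqs.afa", "tree_file")` followed by `render_tree("tree_file", "tree.png")`, where `tree.png` is a hypothetical output name.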

Usage

This pipeline has many applications. The Dong and Gao labs at Loyola University Chicago challenged us to create a pipeline that takes a list of taxa names, generated by a previous metagenomic analysis of organisms, and builds a tree from them. Many existing software tools address this problem; our approach emphasizes efficiency while building on those existing tools.