Microbiome Phylogenetic Tree Pipeline

A bioinformatics pipeline for microbiome data analysis using phylogenetic trees.

To get started, take a look at our design document under the 'Wiki' tab to learn more about the project.

Introduction

Understanding phylogenetic relationships between species is crucial for evolutionary studies. Reconstructing the phylogenetic species tree, a branching diagram of those relationships, is particularly useful for inferring evolutionary history. For example, the tree of life provides a remarkable view of the organizing principles of the biological world. An accurate species tree is therefore valuable, but reconstructing a species or gene tree by hand is tedious.

Here, we developed an easy-to-use pipeline that conveniently and efficiently reconstructs species trees from a list of species names.

Pipeline Workflow
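The stages described under Directions run in a fixed order: fetch 16S sequences from NCBI in FASTA format, align them with MUSCLE, build a Newick tree with FastTree, and visualize the tree. Below is a minimal sketch of the two external-tool stages, assuming `muscle` and `FastTree` are on your PATH and that MUSCLE accepts v3-style `-in`/`-out` arguments. The function names are hypothetical and are not taken from PhyloPipeline; the file names mirror the sample outputs.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def build_stage_commands(fasta: Path, alignment: Path) -> list[list[str]]:
    """Return the MUSCLE and FastTree command lines for one pipeline run.

    Assumes MUSCLE v3 syntax (-in/-out); MUSCLE v5 uses -align/-output instead.
    """
    muscle_cmd = ["muscle", "-in", str(fasta), "-out", str(alignment)]
    # -nt tells FastTree the 16S alignment is nucleotide rather than protein.
    fasttree_cmd = ["FastTree", "-nt", str(alignment)]
    return [muscle_cmd, fasttree_cmd]


def run_stages(fasta: Path, alignment: Path, tree: Path) -> None:
    """Align the FASTA file, then write FastTree's Newick output to `tree`."""
    muscle_cmd, fasttree_cmd = build_stage_commands(fasta, alignment)
    subprocess.run(muscle_cmd, check=True)
    # FastTree prints the Newick tree to stdout, so redirect it into the file.
    with open(tree, "w") as out:
        subprocess.run(fasttree_cmd, check=True, stdout=out)


# Show the commands without running them.
for cmd in build_stage_commands(Path("16Sout.fasta"), Path("seqs.afa")):
    print(shlex.join(cmd))
```

Passing argument lists (rather than a shell string) to `subprocess.run` avoids quoting problems with file names, and `check=True` stops the run as soon as a stage fails.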

Features

The only input is a list of species names.
One Python script to build the Newick tree file.
One Python script to visualize the tree.
View trees using Biopython's Phylo module.

Files Included in the Repo

16SFastaData.txt
- FASTA format output file of the test data (as a txt file)
- Found in SampleOutputs
16Sout.fasta
- FASTA format output file of the test data (as a FASTA file)
- Found in SampleOutputs
ETE_code.txt
- Commands to follow to install the ETE toolkit
PhyloPipeline
- Main python script for the pipeline
TreeVisualization
- Python script to visualize the tree
seqs.afa
- Multiple sequence alignment output file of the test data (generated by MUSCLE)
- Found in SampleOutputs
#_taxa.txt
- Test .txt files of taxonomic names containing 10, 20, and 100 taxa
- Found in SampleTest
tree_file
- Newick file of the generated tree from the test data
- Found in SampleOutputs
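Both 16Sout.fasta and seqs.afa are FASTA text: each record is a `>` header line followed by one or more sequence lines. In an alignment such as seqs.afa, gap characters (`-`) pad every sequence to the same length. A minimal sketch for loading either file and checking that property; the helper names are hypothetical, and duplicate headers would overwrite each other in the returned dict.

```python
from __future__ import annotations


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def is_valid_alignment(records: dict[str, str]) -> bool:
    """True if there is at least one sequence and all have the same length (gaps included)."""
    lengths = {len(seq) for seq in records.values()}
    return len(lengths) == 1
```

For example, reading SampleOutputs/seqs.afa with `read_fasta` and passing the result to `is_valid_alignment` should return True for a well-formed MUSCLE alignment, while the unaligned 16Sout.fasta will usually return False.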

Software Tools Required

Linux/Unix/Mac OS
Python
- os: https://docs.python.org/3/library/os.html
- Biopython (Phylo): https://biopython.org/wiki/Phylo
ETE: http://etetoolkit.org/
plottree: https://github.com/iBiology/plottree
MUSCLE: http://www.drive5.com/muscle/
FastTree: http://www.microbesonline.org/fasttree/

Install

To run the pipeline from your working directory, clone this repository into your workspace with this git command:

git clone https://github.com/rmormando/PhyloTree_Project.git

Then change into the cloned repository's directory to access all of its files:

cd PhyloTree_Project

Directions

To run the pipeline, install Python, MUSCLE, and FastTree on the local machine or server of your choosing.
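Before running anything, it can help to confirm that the external tools are actually on your PATH. A minimal sketch using the standard library's `shutil.which`; the executable names listed are assumptions, since packages differ in how they name the binaries.

```python
from __future__ import annotations

import shutil

# Executable names are assumptions: some Linux packages install FastTree as
# "fasttree", and the multithreaded build is usually called "FastTreeMP".
REQUIRED_TOOLS: dict[str, list[str]] = {
    "MUSCLE": ["muscle"],
    "FastTree": ["FastTree", "fasttree", "FastTreeMP"],
}


def missing_tools(required: dict[str, list[str]] = REQUIRED_TOOLS) -> list[str]:
    """Return the tools for which none of the candidate executables is on PATH."""
    return [
        name
        for name, candidates in required.items()
        if not any(shutil.which(exe) for exe in candidates)
    ]
```

Calling `missing_tools()` at the start of a run and stopping with an error when it returns a non-empty list gives a clearer message than a `FileNotFoundError` halfway through the pipeline.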

To use the ETE toolkit for visualization, follow the steps outlined in:

ETE_code.txt

To use plottree for visualization, run this command on the command line:

pip install plottree

See the plottree link under Software Tools Required to learn more.

1. Download the sample data set or use your own

The input must be a text file with one taxon name per line.

#_taxa.txt

Example input files are provided in the repo (SampleTests folder).
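A minimal sketch of reading a taxa file in the format above. The helper names are hypothetical, and dropping blank lines and repeated names is a design choice of this sketch rather than documented pipeline behavior.

```python
from __future__ import annotations

from pathlib import Path


def parse_taxa(text: str) -> list[str]:
    """Return taxon names, one per non-blank line, trimmed and de-duplicated in order."""
    names = (line.strip() for line in text.splitlines())
    # dict.fromkeys keeps the first occurrence of each name and preserves order.
    return list(dict.fromkeys(name for name in names if name))


def read_taxa(path: str | Path) -> list[str]:
    """Read a #_taxa.txt-style file: one taxon name per line."""
    return parse_taxa(Path(path).read_text())
```

Removing duplicates up front avoids fetching the same 16S record twice, which would otherwise produce identical leaves in the final tree.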

2. Run through the pipeline

Run the main Python script with your taxa .txt file to execute every stage of the pipeline:

python3 PhyloPipeline.py

This single Python script retrieves the 16S sequences for your taxa from NCBI's public database in FASTA format, aligns the resulting FASTA file with MUSCLE, and then builds a tree in Newick format with FastTree. You can view the Newick file in an online tree viewer (we recommend iTOL) or, for convenience, run:

python3 TreeVisualization.py

This script uses the Phylo module from Biopython to save a JPEG image of the tree, including branch lengths.
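The retrieval stage of step 2 queries NCBI's public database. One way to do this directly is NCBI's E-utilities: ESearch finds matching record IDs, and EFetch downloads those records as FASTA text. The sketch below only builds the request URLs; the search term (including the `[Title]` filter for 16S records) is a hypothetical example and not necessarily the query PhyloPipeline uses.

```python
from __future__ import annotations

from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(taxon: str, retmax: int = 1) -> str:
    """Build an ESearch URL for 16S records of one taxon in the nucleotide database.

    The search term is a hypothetical example, not the pipeline's actual query.
    """
    term = f'"{taxon}"[Organism] AND 16S ribosomal RNA[Title]'
    params = {"db": "nucleotide", "term": term, "retmax": retmax}
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def efetch_url(ids: list[str]) -> str:
    """Build an EFetch URL that returns the given nucleotide IDs as FASTA text."""
    params = {"db": "nucleotide", "id": ",".join(ids), "rettype": "fasta", "retmode": "text"}
    return EUTILS + "efetch.fcgi?" + urlencode(params)
```

ESearch returns XML with the matching IDs in its `IdList` element; those IDs go to `efetch_url`, and the response can be downloaded with `urllib.request.urlopen` or, alternatively, through Biopython's `Bio.Entrez.esearch` and `Bio.Entrez.efetch`. Without an API key, NCBI asks clients to stay under three requests per second.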

Usage

This pipeline has many applications. The Dong and Gao labs at Loyola University Chicago challenged us to create a pipeline that takes in a list of taxa names, generated by a previous metagenomic analysis, and builds a tree from them. Many existing software tools already address this problem; our approach emphasizes efficiency while building on established tools.
