brandon-jernigan/Research-Evolutionary-Rate-Covariation-UA-2015-2017: Research in evolutionary rate covariation under Dr. Gutenkunst. Wrote senior thesis "Evolutionary Rate Covariation of Domain Families" on results.

Repository contents:
ERC
Hessian_Data
__pycache__
domain_data
domain_type_analysis
heat_map_script
interactions
pipeline_test
.gitattributes
Research_Notes.onepkg
Research_Notes.pdf
Senior Thesis Final.docx
aaml_template.ctl
domain_locations.p
ensembl_to_refseq.tsv
intermediate.p
paml_1.sh
paml_2.sh
paml_3.sh
paml_4.sh
pfam_domain_names.p
pfam_match.p
pipeline.py
pipeline_test_script.py
readme.txt
ref_seq_to_gene_name.txt
refseq_to_uniprot.tsv
swiss_to_name.txt
vert_genome_key.txt
vert_tree_topology.txt
yeast_genome_key.txt
yeast_tree_topology.txt


Author: Brandon Jernigan
Email: BrandonJernigan@email.arizona.edu
Date: 6/26/2017

Thesis Abstract for "Evolutionary Rate Covariation of Domain Families":

Evolutionary rate covariation (ERC) is a phylogenetic measure of the evolutionary relationship between pairs of proteins. As proteins evolve over time, their rate of evolution (dN/dS) may vary. ERC measures how closely the evolutionary rates of two proteins match over a phylogeny. Proteins known to interact directly or indirectly tend to have higher ERC, because they typically experience similar evolutionary pressures within each lineage. Much is known about ERC at the whole protein level, but little is known at the domain level. Because individual functions of a protein are often performed by distinct domains, a focus on the domain level is expected to provide a clearer relationship between specific functions and ERC. Here we investigate ERC within and between domain families. In particular, we identify domain families with high ERC and investigate potential biochemical explanations.
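
To make "how closely the evolutionary rates of two proteins match over a phylogeny" concrete, here is a minimal sketch that scores two proteins by the Pearson correlation of their per-branch rates. This is an assumption about the form of the calculation, not a copy of ERC_concise.R, which is not reproduced here. The rate values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant rates")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical rates for two proteins, one value per branch of a shared
# species tree (same branch order in both lists).
rates_a = [0.10, 0.25, 0.05, 0.40, 0.15]
rates_b = [0.12, 0.30, 0.04, 0.35, 0.20]

erc = pearson(rates_a, rates_b)
print(f"ERC-like score: {erc:.3f}")
```

A score near 1 means the two proteins sped up and slowed down on the same branches; a score near 0 means their rates varied independently.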


Files:
Senior Thesis Final.docx is my biochemistry senior thesis on this project, "Evolutionary Rate Covariation of Domain Families".

Research_Notes.pdf and Research_Notes.onepkg are two formats of my notes from throughout the project. The .onepkg is easier to navigate, but only works with OneNote.

pipeline.py is the pipeline run after InParanoid. It takes the InParanoid results and produces a list of phylogenetic trees with branch lengths, which the script ERC_concise.R can use to calculate ERC values. The trees can be built per domain or for the whole protein. Each function in the pipeline has a description within the script.
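
As an illustration of what "trees with branch lengths" look like to a downstream script, the sketch below pulls the terminal (leaf) branch lengths out of a Newick string. It assumes the trees are written in Newick format, which this readme does not state; the function name and the example tree are hypothetical and are not taken from pipeline.py.

```python
import re

# Matches "name:length" for leaves only; internal branches follow a ")"
# rather than a name, so they are skipped.
LEAF_BRANCH = re.compile(r"([A-Za-z_][\w.]*):(\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)")


def leaf_branch_lengths(newick):
    """Return {leaf_name: branch_length} for the leaves of a Newick tree."""
    return {name: float(length) for name, length in LEAF_BRANCH.findall(newick)}


# Made-up three-species tree for illustration.
tree = "((human:0.1,mouse:0.2):0.05,yeast:0.9);"
print(leaf_branch_lengths(tree))
```

A full pipeline would also need the internal branch lengths, for which a dedicated tree parser is a better fit than a regular expression.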

pipeline_test_script.py is a Python unit test script that checks that the major functions in pipeline.py still behave as intended after they are modified.

paml_1.sh (and paml_2.sh, paml_3.sh, paml_4.sh) are used to run PAML after the alignments, tree files, and control files have been created for each protein.

ref_seq_to_gene_name.txt, refseq_to_uniprot.tsv, swiss_to_name.txt, and ensembl_to_refseq.tsv are used to translate between gene naming conventions in the scripts.
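
A translation table like these can be loaded into a dictionary for lookups. The sketch below assumes a two-column, tab-separated layout with no header row; the actual column layout of the files in this repository is not documented here, and the identifiers in the example are invented.

```python
import csv
import io


def load_id_map(handle, key_col=0, value_col=1):
    """Read a tab-separated file into a {key: value} dictionary.

    Rows that are too short or have an empty field are skipped; if a key
    appears more than once, the first value is kept.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) <= max(key_col, value_col):
            continue
        key, value = row[key_col].strip(), row[value_col].strip()
        if key and value:
            mapping.setdefault(key, value)
    return mapping


# In-memory stand-in for a file such as ensembl_to_refseq.tsv (made-up IDs).
example = io.StringIO("ENSP0001\tNP_0001\nENSP0002\tNP_0002\n\nENSP0003\t\n")
ensembl_to_refseq = load_id_map(example)
print(ensembl_to_refseq.get("ENSP0001"))
```

With a real file, replace the `io.StringIO` object with `open(path, newline="")`.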

aaml_template.ctl is a template PAML control file used in pipeline.py
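
One way a template control file can be filled in per protein is sketched below. The contents of aaml_template.ctl are not shown in this readme, so the template text, the file naming scheme, and the function name are all assumptions; only the option names (seqfile, treefile, outfile, seqtype) are standard PAML control-file variables.

```python
from string import Template

# Hypothetical stand-in for aaml_template.ctl; the real template likely
# sets more options than these.
CTL_TEMPLATE = Template(
    "seqfile = $seqfile\n"
    "treefile = $treefile\n"
    "outfile = $outfile\n"
    "seqtype = 2\n"  # assumed: amino acid sequences
)


def render_control_file(protein_id):
    """Return control-file text for one protein (file names are assumed)."""
    return CTL_TEMPLATE.substitute(
        seqfile=f"{protein_id}.phy",
        treefile=f"{protein_id}.tree",
        outfile=f"{protein_id}.out",
    )


print(render_control_file("example_protein"))
```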

vert_genome_key.txt and yeast_genome_key.txt both give additional information that was removed from the genome titles to make processing easier.

vert_tree_topology.txt and yeast_tree_topology.txt are the species trees used in PAML.

.p files are intermediate files from pipeline.py to save run time
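
The usual pattern behind intermediate files like these is "load the pickle if it exists, otherwise compute and save". The sketch below shows that pattern with Python's pickle module; the helper name, the cache file name, and the example data are hypothetical, and pipeline.py may organize its caching differently.

```python
import os
import pickle
import tempfile


def load_or_compute(path, compute):
    """Return the pickled result at `path`, computing and saving it if absent."""
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    result = compute()
    with open(path, "wb") as handle:
        pickle.dump(result, handle)
    return result


calls = []


def slow_step():
    """Stand-in for an expensive pipeline step (made-up data)."""
    calls.append(1)
    return {"PF00001": ["geneA", "geneB"]}


with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "example_cache.p")
    first = load_or_compute(cache_path, slow_step)   # computes and saves
    second = load_or_compute(cache_path, slow_step)  # loads from disk

print(first == second, len(calls))
```

Deleting a .p file forces the corresponding step to be recomputed on the next run. Pickle files should only be loaded from sources you trust.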


Folders:

pipeline_test: data files used by pipeline_test_script.py, which tests pipeline.py

domain_data (see 7-17-2016 in Research_Notes): used to apply Brian's domain annotations from his previous project.

domain_type_analysis (see 2-13-2017, 4-3-2017 in Research_Notes): Analysis of ERC values between domain types

ERC: Location of ERC calculating script and results for proteins and domains.

heat_map_script (see 8-8-2016 in Research_Notes): contains the script used to calculate ERC values for 25-amino-acid-long windows, which can then be turned into a heat map of the ERC between two amino acid sequences
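
The windowing step can be sketched as follows. The window length of 25 comes from the description above; the step size (whether windows overlap) is not stated, so it is left as a parameter here, and the function name and the example sequence are hypothetical.

```python
def sliding_windows(sequence, size=25, step=1):
    """Return (start_index, window) pairs of fixed-length windows.

    Windows that would run past the end of the sequence are dropped, so a
    sequence shorter than `size` yields no windows.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [
        (start, sequence[start:start + size])
        for start in range(0, len(sequence) - size + 1, step)
    ]


# Made-up 30-residue sequence for illustration.
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLI"
windows = sliding_windows(protein)
print(len(windows), windows[0])
```

Computing an ERC value for every pair of windows, one from each of two proteins, gives the matrix that a heat map would display.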

Hessian_Data (see 7-17-2016 in Research_Notes)

interactions (see 2-13-2017 in Research_Notes): script used to identify those proteins known to interact and look at their ERC values