4-8-2019 assignment #3

Open: wants to merge 20 commits into base: master
46 changes: 39 additions & 7 deletions README.md
@@ -1,4 +1,6 @@
# Final Project
# DJM Final Project

Link to the final project presentation: https://docs.google.com/presentation/d/1Cj2cMghksd8KFY0QX1QT4nm68gL-UBkqfmLc4fXgGAA/edit?usp=sharing

## Instructions

@@ -13,21 +15,30 @@ are from public sources, or track them with [git-lfs](https://git-lfs.github.com

## Introduction

This is a final project for the [Comparative Genomics](https://github.com/Yale-EEB723/syllabus) seminar in the spring of 2019. This project (a very brief, ie 1-2 sentence, overview of the project)...
This is a final project for the [Comparative Genomics](https://github.com/Yale-EEB723/syllabus) seminar in the spring of 2019.
My project is to sequence and assemble the first darter genome using an Oxford Nanopore platform to collect long-read sequence data.

## The goal

The specific problem I/we sought to explore was ... Our goal was...
The long-term goals are to provide genomic resources for many future darter projects, including phylogenomic inference, functional genomics studies, and reference mapping. I will sequence a genome for *Etheostoma perlongum*, a species restricted to a single lake in North Carolina. The "sister species" of *E. perlongum* is *Etheostoma maculaticeps*, which is found in the Waccamaw River that drains the lake. My specific goal is to use the genome to help assemble a large ddRAD data set that I have collected along a transect spanning the lake and the river. Our preliminary de novo assemblies and analyses of the ddRAD data indicate a steep cline between the lake and river, with some loci exhibiting particularly sharp breaks. My secondary goal is to map these outlier loci to the genome to identify candidate regions that maintain the species boundary between the lake and river.

## The data

Description of data...

- Data source (simulated/ published/ unpublished?)
- Data structure
I have not collected the data yet, but plan to use an Oxford Nanopore PromethION platform to sequence one darter species to ~30x coverage.

## Background

Speciation commonly involves geographic isolation of lineages that limits or prevents gene flow<sup>1</sup>. The prevalence of allopatric speciation is exemplified by darters, a clade of ~250 North American freshwater fishes. Nearly all darter sister species pairs are isolated in different river drainages<sup>2</sup>. The only exception is *Etheostoma perlongum*. Endemic to the 36 km<sup>2</sup> Lake Waccamaw in North Carolina, *E. perlongum* is phylogenetically nested within the widespread *Etheostoma olmstedi* (Fig. 1A). The closest *E. olmstedi* relatives of *E. perlongum* are found immediately outside of Lake Waccamaw in the Waccamaw River (Fig. 1B). A small spillway dam built in 1926 separates the lake from the river, but the dam is frequently inundated<sup>3</sup>. Despite the lack of dispersal barriers, my dissertation research indicates that there is a sharp geographic cline in allele frequencies between the lake and the river (Fig. 1C&D). Additionally, *E. perlongum* differs from *E. olmstedi* in the number of vertebrae, lateral line scales, body shape, and breeding habits, including an annual life cycle rarely observed in other darters<sup>4,5</sup>.

![](figures/Fig1.png)
*Figure 1.* Analyses of ddRAD-seq data for *E. olmstedi* and *E. perlongum*. A) Maximum likelihood phylogeny identified by IQTree, nodes <95% bootstrap support collapsed, red = *E. perlongum*, blue = *E. olmstedi* in the Waccamaw River. B) Sampling map, localities 1-4 in Lake Waccamaw, 5-11 in the Waccamaw River. C) Ancestry coefficients estimated using the “snmf” function in the R package LEA, locality numbers are indicated above the plot. D) PCA, with points color coded by species designation.

However, there is still debate over whether *E. perlongum* is a distinct species or a lake ecomorph of *E. olmstedi*<sup>6</sup>. Lake Waccamaw is only 15,000-32,000 years old<sup>3</sup>, which would require very rapid ecological speciation between *E. perlongum* and *E. olmstedi*. Alternatively, the divergence between *E. olmstedi* and *E. perlongum* could be explained by local adaptation and/or phenotypic plasticity. Intraspecific lake-stream divergence is well-documented in many other fish species including sticklebacks<sup>7</sup>, minnows<sup>8</sup>, and cichlids<sup>9</sup>. If the divergence between *E. perlongum* and *E. olmstedi* represents intraspecific variation, we should observe similar genetic, phenotypic, and ecological divergence between other lake-stream populations of *E. olmstedi*. While there are many museum records of *E. olmstedi* in other lake-stream systems, there has been no detailed study of these populations.

**Research Questions:** Is *E. perlongum* a unique case of recent ecological speciation in a clade dominated by allopatric speciation? Or is differentiation between *E. perlongum* and *E. olmstedi* typical of intraspecific variation between other *E. olmstedi* lake-stream populations?



Motivation for the project....

How it fits in with other work...
@@ -37,6 +48,9 @@ What the reader needs to know to understand the project

## Methods

I plan to perform the DNA extraction and sequencing in mid-March in the lab of Trevor Krabbenhoft, a collaborator at the University at Buffalo (http://arts-sciences.buffalo.edu/biological-sciences/faculty/faculty-directory/trevor-krabbenhoft.html). I will use a variety of base-calling and assembly methods, including Scrappie and Canu (outlined in Jain et al. 2018). I hope to annotate the genome using several additional sources of data, including exon-capture sequences and transcriptomic data from a closely related darter species.


## Results


@@ -51,3 +65,21 @@ What would you have done differently?
What are future directions this could go in?

## References

1. Coyne J.A. & Orr H.A. Speciation. Sunderland, MA: Sinauer Associates, Inc. 2004.

2. Near T.J., Bossu C.M., Bradburd G.S., Carlson R.L., Harrington R.C., Hollingsworth, P.R., Keck, B.P., & Etnier, D.A. Phylogeny and temporal diversification of Darters (Percidae: Etheostomatinae). Syst. Biol. 60:565–595. 2011.

3. Stager J.C. & Cahoon L.B. The Age and Trophic History of Lake Waccamaw, North Carolina. J. Elisha Mitchell Scientific Society. 103:1-13. 1987.

4. Shute, P.W., Shute J.R., Lindquist D.G. Age, growth, and early life history of the Waccamaw Darter, Etheostoma perlongum. Copeia. 1982:561–567. 1982.

5. Shute, J. R. A Systematic Evaluation of the Waccamaw Darter, Etheostoma perlongum Hubbs and Raney, with Comments on Relationships within the Subgenus Boleosoma Percidae: Etheostomatinae. University of Tennessee, Knoxville. 1984.

6. Berner D., Adams D.C., Grandchamp A.C., & Hendry A.P. Natural selection drives patterns of lake-stream divergence in stickleback foraging morphology. J. Evol. Biol. 21:1653–1665. 2008.

7. Collin H. & Fumagalli L. Evidence for morphological and adaptive genetic divergence between lake and stream habitats in European minnows (Phoxinus phoxinus, Cyprinidae). Mol. Ecol. 20:4490-4502. 2011.

8. Theis A., Ronco F., Indermaur A., Salzburger W., Egger B. Adaptive divergence between lake and stream populations of an East African cichlid fish. Mol. Ecol. 23:5304-5322. 2014.

9. Wainwright P.C. & Richard B.A. Predicting patterns of prey use from morphology of fishes. Environ. Biol. Fish. 44:97-113. 1994.
40 changes: 40 additions & 0 deletions assemblies/v0.1/assemblyPipeline.md
@@ -0,0 +1,40 @@
# Etheostoma perlongum genome assembly
*v0.1*

Running genome assembly with data from two Nanopore flow cells

Excluding reads < 1 kb

## pipeline

combine fastq files

`cat ~/scratch60/Eper_genome/nanoporeData/Eper_*/*/*/fastq_pass/*.fastq > Eper_nanopore.fastq`

convert fastq to fasta (we don't actually need to do this)

`paste - - - - < Eper_nanopore.fastq | cut -f 1,2 | sed 's/^@/>/' | tr "\t" "\n" > Eper_nanopore.fasta`

remove reads < 1kb

`bioawk -c fastx 'length($seq) >= 1000{print "@"$name"\n"$seq"\n+\n"$qual}' Eper_nanopore.fastq > Eper_nanopore_1kb.fastq`
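
As a cross-check on the length filter, here is a minimal sketch of the same step in plain awk. It assumes strict four-line FASTQ records with no wrapped sequences, which is not confirmed for these files. The function name `filter_fastq_min_len` is hypothetical and not part of the pipeline. It keeps reads of at least the given length and preserves the full header line.

```shell
# Hypothetical helper: keep FASTQ records whose sequence has at least MIN_LEN bases.
# Assumes strict 4-line records (header, sequence, '+', qualities); reads stdin, writes stdout.
filter_fastq_min_len() {
    local min_len="$1"
    awk -v min="$min_len" '
        NR % 4 == 1 { header = $0 }   # @read_id line
        NR % 4 == 2 { seq = $0 }      # sequence line
        NR % 4 == 3 { plus = $0 }     # separator line
        NR % 4 == 0 {                 # quality line completes the record
            if (length(seq) >= min) print header "\n" seq "\n" plus "\n" $0
        }
    '
}
```

Usage would look like `filter_fastq_min_len 1000 < Eper_nanopore.fastq > Eper_nanopore_1kb.fastq`.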

run minimap2 to map each read against all other reads

`minimap2 -x ava-ont -t 20 Eper_nanopore_1kb.fastq Eper_nanopore_1kb.fastq > Eper_nanopore_1kb_ovlp.paf 2> minimap.log`

run miniasm to assemble the reads using the mapping from the previous step

`miniasm -f Eper_nanopore_1kb.fastq Eper_nanopore_1kb_ovlp.paf > Eper_nanopore_1kb.gfa 2> miniasm.log`

convert gfa file from miniasm to fasta format

`awk '/^S/{print ">"$2"\n"$3}' Eper_nanopore_1kb.gfa | fold > Eper_nanopore_1kb.gfa.fasta`

run minimap2 again to map reads to the assembled contigs

`minimap2 -x map-ont -t 10 Eper_nanopore_1kb.gfa.fasta Eper_nanopore_1kb.fastq > Eper_nanopore_1kb_ovlp_genome.paf 2> minimap_genome.log`

run racon to polish the assembly and create consensus sequences

`racon -t 10 Eper_nanopore_1kb.fastq Eper_nanopore_1kb_ovlp_genome.paf Eper_nanopore_1kb.gfa.fasta > Eper_nanopore_1kb.racon.fasta`
9 changes: 9 additions & 0 deletions assemblies/v0.1/assemblyStats.txt
@@ -0,0 +1,9 @@
# Assembly stats
v0.1

contig N50 = 872,654 bp
median contig length = 314,004 bp
shortest contig = 4,308 bp
longest contig = 3,968,385 bp
number of contigs = 1,416
total assembly length = 717,512,980 bp
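
The stats above do not say how they were computed. The following is one possible sketch, not the method actually used here, that derives the same six numbers from a FASTA file with awk and sort. The function name `assembly_stats` and the output format are assumptions made for illustration. N50 is taken as the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the total assembly length.

```shell
# Hypothetical helper: summarize contig lengths in a FASTA file (sequences may be line-wrapped).
assembly_stats() {
    # First pass: emit one length per contig, then sort ascending.
    awk '
        /^>/ { if (len > 0) print len; len = 0; next }
        { len += length($0) }
        END { if (len > 0) print len }
    ' "$1" | sort -n | awk '
        { lens[NR] = $1; total += $1 }
        END {
            if (NR == 0) exit 1
            # Median of the sorted lengths.
            if (NR % 2) median = lens[(NR + 1) / 2]
            else median = (lens[NR / 2] + lens[NR / 2 + 1]) / 2
            # N50: walk from the longest contig down until half the total is covered.
            for (i = NR; i >= 1; i--) {
                cum += lens[i]
                if (cum >= total / 2) { n50 = lens[i]; break }
            }
            printf "contigs=%d total=%d shortest=%d longest=%d median=%s n50=%d\n", NR, total, lens[1], lens[NR], median, n50
        }
    '
}
```

It could be run as `assembly_stats Eper_nanopore_1kb.racon.fasta`. Zero-length records are skipped, so the contig count reflects only contigs that contain sequence.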
Binary file added assemblies/v0.1/bandageGraph.png