
Materials for EN.601.449/649 Computational Genomics: Applied Comparative Genomics



schatzlab/appliedgenomics2024


JHU EN.601.449/EN.601.649: Computational Genomics: Applied Comparative Genomics

Prof: Michael Schatz (mschatz @ cs.jhu.edu)
TA: Matthew Nguyen (mnguye99 @ jh.edu)
Class Hours: Monday + Wednesday @ 3:00p - 4:15p Hodson 316
Schatz Office Hours: By appointment
Nguyen Office Hours: TBD and by appointment

The primary goal of the course is for students to become grounded in the fundamental theory and applications of genomics, so that they leave the course empowered to conduct independent genomic analyses. We will study the leading computational and quantitative approaches for comparing and analyzing genomes, starting from raw sequencing data. The course will focus on human genomics and human medical applications, but the techniques are broadly applicable across the tree of life. Topics will include genome assembly & comparative genomics, variant identification & analysis, gene expression & regulation, personal genome analysis, and cancer genomics. A major focus will be on applying machine learning and deep learning to these problems. Grading will be based on assignments, a midterm exam, class presentations, and a significant class project. There are no formal course prerequisites, although the assignments and course project require familiarity with UNIX scripting and/or programming.

Prerequisites

Course Resources:

Related Courses & Readings

Related Textbooks

Schedule

Class Date Day Topic Assignments Readings
1 26-Aug Mon Introduction Sign Up for Piazza * Molecular Structure of Nucleic Acids (Watson and Crick, 1953, Nature)
* Biological data sciences in genome research (Schatz, 2015, Genome Research)
* Big Data: Astronomical or Genomical? (Stephens et al, 2015, PLOS Biology)
2 28-Aug Wed Genomic Technologies, kmers Assignment 1 * Coming of age: ten years of next-generation sequencing technologies (Goodwin et al, 2016, Nature Reviews Genetics)
* Guide to k-mer approaches for genomics across the tree of life (Jenike et al., 2024, arXiv)
* 2-Sep Mon Labor Day (no class)
3 4-Sep Wed Assembly, WGA * Toward simplifying and accurately formulating fragment assembly. (Myers, 1995, J. Comp. Bio.)
* Velvet: Algorithms for de novo short read assembly using de Bruijn graphs (Zerbino and Birney, 2008, Genome Research)
* SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing (Bankevich, et al. 2012, J Comput Biol)
* MUMmer: Alignment of Whole Genomes (Delcher et al, 1999, NAR)
4 9-Sep Mon Human Genome, Long Reads Assignment 2 * Initial sequencing and analysis of the human genome (International Human Genome Sequencing Consortium, 2001, Nature)
* FALCON-unzip: Phased diploid genome assembly with single-molecule real-time sequencing (Chin et al, 2016, Nature Methods)
* MHAP: Assembling large genomes with single-molecule sequencing and locality-sensitive hashing (Berlin et al, 2015, Nature Biotech)
* Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation (Koren et al, 2017, Genome Research)
* Telomere-to-telomere assembly of diploid chromosomes with Verkko (Rautiainen et al, 2023, Nature Biotechnology)
* Piercing the dark matter: bioinformatics of long-range sequencing and mapping (Sedlazeck et al, 2018, Nature Reviews Genetics)
5 11-Sep Wed T2T, HPRC, pangenome * The complete sequence of a human genome (Nurk et al, 2022, Science)
* Approaching complete genomes, transcriptomes and epi-omes with accurate long-read sequencing (Kovaka et al, 2023, Nature Methods)
* A draft human pangenome reference (Liao et al, 2023, Nature)
* Beyond the Human Genome Project: The Age of Complete Human Genome Sequences and Pangenome References (Taylor et al., 2024, Annual Review of Genomics and Human Genetics)
6 16-Sep Mon Read Mapping * How to map billions of short reads onto genomes (Trapnell and Salzberg, 2009, Nature Biotech)
* Bowtie: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome (Langmead et al, 2009, Genome Biology)
* BWA-MEM: Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM (Li, 2013, arXiv)
* Sapling: Accelerating Suffix Array Queries with Learned Data Models (Kirsche et al, 2020, bioRxiv)
7 18-Sep Wed Variant Analysis * Haplotype-based variant detection from short-read sequencing (Garrison and Marth, 2012, arXiv)
* The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data (McKenna et al, 2010, Genome Research)
* A universal SNP and small-indel variant caller using deep neural networks (Poplin et al, 2018, Nature Biotechnology)
* SAM/BAM/Samtools: The Sequence Alignment/Map format and SAMtools (Li et al, 2009, Bioinformatics)
* IGV: Integrative genomics viewer (Robinson et al, 2011, Nature Biotech)
8 23-Sep Mon Human evolution Assignment 3 * An integrated map of genetic variation from 1,092 human genomes (1000 Genomes Consortium, 2012, Nature)
* Analysis of protein-coding genetic variation in 60,706 humans (Lek et al, 2016, Nature)
* A Draft Sequence of the Neandertal Genome (Green et al, 2010, Science)
* Excavating Neandertal and Denisovan DNA from the genomes of Melanesian individuals (Vernot et al, 2016, Science)
* Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) (Schatz et al, 2022, Cell Genomics)
9 25-Sep Wed Intro to ML: PCA, Clustering, t-SNE, UMAP, Decision Trees, NN * What are decision trees? (Kingsford and Salzberg, 2008, Nature Biotechnology)
* What is a hidden Markov model? (Eddy, 2004, Nature Biotechnology)
* Deep learning in biomedicine (Wainberg et al, 2018, Nature Biotechnology)
* Visualizing Data Using t-SNE (van der Maaten and Hinton, 2008, Journal of Machine Learning Research)
10 30-Sep Mon CNN + DeepVariant * ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky et al., 2012, NIPS)
11 2-Oct Wed Functional Analysis 1: Annotation * BLAST: Basic Local Alignment Search Tool (Altschul et al, 1990, J Mol Biol)
* Glimmer: Microbial gene identification using interpolated Markov models (Salzberg et al, 1998, NAR)
* MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects (Holt and Yandell, 2011, BMC Bioinformatics)
* BEDTools: a flexible suite of utilities for comparing genomic features (Quinlan & Hall, 2010, Bioinformatics)
12 7-Oct Mon Functional Analysis 2: RNA-seq Assignment 4 * RNA-Seq: a revolutionary tool for transcriptomics (Wang et al, 2009, Nature Reviews Genetics)
* Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks (Trapnell et al, 2012, Nature Protocols)
* Salmon provides fast and bias-aware quantification of transcript expression (Patro et al, 2017, Nature Methods)
* Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications (Krueger and Andrews, 2011, Bioinformatics)
13 9-Oct Wed Functional Analysis 3: Methyl-seq, ChIP-seq, and Hi-C * ChIP-seq and beyond: new and improved methodologies to detect and characterize protein-DNA interactions (Furey, 2012, Nature Reviews Genetics)
* PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls (Rozowsky et al, 2009, Nature Biotech)
* Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome (Lieberman-Aiden et al, 2009, Science)
14 14-Oct Mon Functional Analysis 4: Regulatory States, ENCODE, GTEx, Roadmap Project proposal * An integrated encyclopedia of DNA elements in the human genome (The ENCODE Project Consortium, 2012, Nature)
* Genetic effects on gene expression across human tissues (GTEx Consortium, 2017, Nature)
* Integrative analysis of 111 reference human epigenomes (Roadmap Epigenomics Consortium, 2015, Nature)
* ChromHMM: automating chromatin-state discovery and characterization (Ernst & Kellis, 2012, Nature Methods)
* Segway: Unsupervised pattern discovery in human chromatin structure through genomic segmentation (Hoffman et al, 2012, Nature Methods)
15 16-Oct Wed Functional Analysis 5: Single Cell Genomics * Ginkgo: Interactive analysis and assessment of single-cell copy-number variations (Garvin et al, 2015, Nature Methods)
* The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells (Trapnell et al, 2014, Nature Biotech)
* Eleven grand challenges in single-cell data science (Lähnemann et al, 2020, Genome Biology)
16 21-Oct Mon Transformers Assignment 5 * Attention is all you need (Vaswani et al, 2017, arXiv)
17 23-Oct Wed Transformers + Enformer * Effective gene expression prediction from sequence by integrating long-range interactions (Avsec et al., 2021, Nature Methods)
* Personal transcriptome variation is poorly explained by current genomic deep learning models (Huang et al., 2023, Nature Genetics)
* Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings (Sasse et al., 2023, Nature Genetics)
18 28-Oct Mon Other applications of DL in Genomics Prelim Report Assigned * Deep Learning Sequence Models for Transcriptional Regulation (Sokolova et al., 2024, Annual Review of Genomics and Human Genetics)
* AlphaFold: Highly accurate protein structure prediction with AlphaFold (Jumper et al, 2021, Nature)
19 30-Oct Wed Midterm review
20 4-Nov Mon Midterm [In class exam]
21 6-Nov Wed Human Genetic Diseases * Genome-Wide Association Studies (Bush & Moore, 2012, PLOS Comp Bio)
* The contribution of de novo coding mutations to autism spectrum disorder (Iossifov et al, 2014, Nature)
22 11-Nov Mon Metagenomics Prelim Report Due; Final Report Assigned * Kraken: ultrafast metagenomic sequence classification using exact alignments (Wood and Salzberg, 2014, Genome Biology)
* Chapter 12: Human Microbiome Analysis (Morgan and Huttenhower, 2012, PLOS Comp Bio)
23 13-Nov Wed No class (BIODATA24)
24 18-Nov Mon Cancer Genomics * The Hallmarks of Cancer (Hanahan & Weinberg, 2000, Cell)
* Evolution of Cancer Genomes (Yates & Campbell, 2012, Nature Reviews Genetics)
* Comprehensive molecular portraits of human breast tumours (TCGA, 2012, Nature)
25 20-Nov Wed In class project presentation
* 25-Nov Mon Thanksgiving Break (no class)
* 27-Nov Wed Thanksgiving Break (no class)
26 2-Dec Mon In class project presentation
27 4-Dec Wed In class project presentation
* 16-Dec Mon Final Report Due Final Report Due
