Training material and presentations on getting started with larger datasets. The focus is on whole-genome sequencing (WGS) data, but the key enabling technologies and ideas are generic.
- Conda for environment and dependency management
- GitHub repos (and gists) for project code, metadata, narrative and results (small data)
- Jupyter for authoring
- Modern idioms for tabular data manipulation, in memory and beyond (pandas, dplyr, data.table)
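As a minimal sketch of the pandas idioms listed above (assuming pandas is installed), the block below computes a per-chromosome mean quality twice: once with the whole table loaded in memory using method chaining, and once in fixed-size chunks via `read_csv(..., chunksize=...)`, which is one way to work beyond available memory with pandas alone. The toy table, its column names (`chrom`, `pos`, `qual`) and the helper function names are hypothetical and chosen only for illustration.

```python
import io

import pandas as pd

# Hypothetical toy table of variant calls; the columns are illustrative,
# not a real file format.
CSV = """chrom,pos,qual
chr1,100,50.0
chr1,250,30.0
chr2,75,60.0
chr2,300,20.0
chr2,410,70.0
chrX,12,10.0
"""


def mean_qual_in_memory(text: str) -> pd.Series:
    """Load the whole table, then filter and aggregate with method chaining."""
    return (
        pd.read_csv(io.StringIO(text))
        .query("qual >= 20")  # drop low-quality calls
        .groupby("chrom")["qual"]
        .mean()
    )


def mean_qual_chunked(text: str, chunksize: int = 2) -> pd.Series:
    """Same result, but only `chunksize` rows are read into memory at a time.

    Means cannot be averaged across chunks directly, so each chunk
    contributes a per-group sum and count, which are combined at the end.
    """
    partials = []
    for chunk in pd.read_csv(io.StringIO(text), chunksize=chunksize):
        kept = chunk.query("qual >= 20")
        partials.append(kept.groupby("chrom")["qual"].agg(["sum", "count"]))
    # Add up the partial sums and counts for each chromosome across chunks.
    totals = pd.concat(partials).groupby(level=0).sum()
    return totals["sum"] / totals["count"]


if __name__ == "__main__":
    print(mean_qual_in_memory(CSV))
    print(mean_qual_chunked(CSV))
```

With a real file you would pass a path to `read_csv` instead of a `StringIO` buffer. The sum-and-count decomposition is the design choice that matters here: it lets a non-additive statistic like the mean be assembled from chunks, the same idea that out-of-core tools apply at larger scale.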
Updated May 2017