This repository contains course materials for BIOS 611 (Introduction to Data Science) typically taught during the Fall Semester at UNC Chapel Hill in the Department of Biostatistics.
The intent of the course is to provide an intensive introduction to the technical material and skills that a data scientist needs in order to do repeatable, reliable research.
It covers basic Linux tools such as bash and make, Docker, and git (extensively), and serves as an introduction to R and Python, including how to organize a research project and an R or Python library.
Along the way, we will become informally familiar with some analytical techniques: classification, regression, and clustering. The emphasis here is practical: how to use these methods while avoiding common pitfalls.
Class meets Mondays and Wednesdays from 3:35 pm to 4:50 pm. There is a lab session on Tuesdays from 2:00 pm to 3:00 pm.
Class is held in McGavran-Greenberg PH-Rm 2308. Lab is held in McGavran-Greenberg PH-Rm 2306.
Date | Topic | Time | Reading Materials |
---|---|---|---|
2023-08-21 | Introduction | 3:35-4:50 pm | m1, m2 |
2023-08-22 | Lab | 2-3 pm | |
2023-08-23 | Compute Resources | 3:35-4:50 pm | m3 |
2023-08-28 | Unix | 3:35-4:50 pm | m6, m7, m8 |
2023-08-29 | Lab | 2-3 pm | |
2023-08-30 | Docker | 3:35-4:50 pm | m4, m5, m9, m10 |
2023-09-04 | Labor Day | None | |
2023-09-05 | Well Being Day | None | |
2023-09-06 | git basics & github basics | 3:35-4:50 pm | m13, m14, m15, m17 |
2023-09-11 | How to Think about Programming & More R | 3:35-4:50 pm | m18, m19, m20 |
2023-09-12 | Lab | 2-3 pm | |
2023-09-13 | Tidyverse for Tidying & GGPlot | 3:35-4:50 pm | m21, m23, m24, m25, m26, m27 |
2023-09-18 | Make and Makefiles | 3:35-4:50 pm | m29, m30 |
2023-09-19 | Lab | 2-3 pm | |
2023-09-20 | git concepts and practices | 3:35-4:50 pm | m33 |
2023-09-25 | Well Being Day | None | |
2023-09-26 | Lab | 2-3 pm | |
2023-09-27 | Markdown, RMarkdown, Notebooks, L | 3:35-4:50 pm | m35, m36 |
2023-10-02 | Project Organization | 3:35-4:50 pm | |
2023-10-03 | Lab | 2-3 pm | |
2023-10-04 | Dimensionality Reduction | 3:35-4:50 pm | m37, m38, m39, m40 |
2023-10-09 | Clustering | 3:35-4:50 pm | m41, m43, m44 |
2023-10-10 | Lab | 2-3 pm | |
2023-10-11 | Classification | 3:35-4:50 pm | m45, m46 |
2023-10-16 | Model Validation and Selection | 3:35-4:50 pm | m48, m49, m50, m51, m52 |
2023-10-17 | Lab | 2-3 pm | |
2023-10-18 | Shiny | 3:35-4:50 pm | m58 |
2023-10-23 | Introduction to Scientific Python | 3:35-4:50 pm | m60, m61 |
2023-10-24 | Lab | 2-3 pm | |
2023-10-25 | SQL (and pandas, dplyr) | 3:35-4:50 pm | m62, m63, m64 |
2023-10-30 | Pandas & SQL | 3:35-4:50 pm | m64 |
2023-10-31 | Lab | 2-3 pm | |
2023-11-01 | SKLearn Introduction | 3:35-4:50 pm | |
2023-11-06 | Training Neural Networks | 3:35-4:50 pm | |
2023-11-07 | Lab | 2-3 pm | |
2023-11-08 | Bokeh | 3:35-4:50 pm | |
2023-11-13 | Browser Based Visualization w/ d3 | 3:35-4:50 pm | m65, m66 |
2023-11-14 | Lab | 2-3 pm | |
2023-11-15 | Data Science Ethics | 3:35-4:50 pm | m67, m68 |
2023-11-20 | Panel Discussion | 3:35-4:50 pm | |
2023-11-21 | Lab | 2-3 pm | |
2023-11-22 | Thanksgiving | None | |
2023-11-27 | Web Scraping | 3:35-4:50 pm | m69 |
2023-11-28 | Lab | 2-3 pm | |
2023-11-29 | Feedback Day | 3:35-4:50 pm | |
2023-12-04 | Class Presentations I | 3:35-4:50 pm | |
2023-12-05 | Lab | 2-3 pm | |
2023-12-06 | Class Presentations II | 3:35-4:50 pm | |
Lab is generally unstructured time in which you can work on projects and ask me questions. Sometimes we will use this time to cover material.
I provide a Docker container that you can use to hack on these lectures and the associated materials. Some lectures may have their own Docker container, but to work with most of them, run:

```bash
./start-env.sh
```

This will start an RStudio instance.