Materials for Principles of Data Science BIOS 611

Vincent-Toups/datasci611

 
 

Welcome to UNC BIOS 611

Introduction to Data Science

This repository contains course materials for BIOS 611 (Introduction to Data Science), typically taught during the fall semester at UNC Chapel Hill in the Department of Biostatistics.

The intent of the course is to provide an intensive introduction to the technical material and skills that a data scientist needs in order to do repeatable, reliable research.

It covers basic Linux tools such as Bash and Make, Docker, and Git (extensively), and serves as an introduction to R and Python, including how to organize a research project and an R or Python library.

Along the way we will become informally familiar with some analytical techniques: classification, regression and clustering. The emphasis here is practical: how to use the methods while avoiding common pitfalls.

Course Syllabus and Schedule

Class meets on Mondays and Wednesdays from 3:35 pm to 4:50 pm. There is a lab session on Tuesdays from 2:00 pm to 3:00 pm.

Class is held in McGavran-Greenberg PH-Rm 2308. Lab is held in McGavran-Greenberg PH-Rm 2306.

| Date | Topic | Time | Reading Materials |
|------|-------|------|-------------------|
| 2023-08-21 | Introduction | 3:35-4:50 pm | m1, m2 |
| 2023-08-22 | Lab | 2-3 pm | |
| 2023-08-23 | Compute Resources | 3:35-4:50 pm | m3 |
| 2023-08-28 | Unix | 3:35-4:50 pm | m6, m7, m8 |
| 2023-08-29 | Lab | 2-3 pm | |
| 2023-08-30 | Docker | 3:35-4:50 pm | m4, m5, m9, m10 |
| 2023-09-04 | Labor Day | None | |
| 2023-09-05 | Well Being Day | None | |
| 2023-09-06 | git basics & github basics | 3:35-4:50 pm | m13, m14, m15, m17 |
| 2023-09-11 | How to Think about Programming & More R | 3:35-4:50 pm | m18, m19, m20 |
| 2023-09-12 | Lab | 2-3 pm | |
| 2023-09-13 | Tidyverse for Tidying & GGPlot | 3:35-4:50 pm | m21, m23, m24, m25, m26, m27 |
| 2023-09-18 | Make and Makefiles | 3:35-4:50 pm | m29, m30 |
| 2023-09-19 | Lab | 2-3 pm | |
| 2023-09-20 | git concepts and practices | 3:35-4:50 pm | m33 |
| 2023-09-25 | Well Being Day | None | |
| 2023-09-26 | Lab | 2-3 pm | |
| 2023-09-27 | Markdown, RMarkdown, Notebooks, L | 3:35-4:50 pm | m35, m36 |
| 2023-10-02 | Project Organization | 3:35-4:50 pm | |
| 2023-10-03 | Lab | 2-3 pm | |
| 2023-10-04 | Dimensionality Reduction | 3:35-4:50 pm | m37, m38, m39, m40 |
| 2023-10-09 | Clustering | 3:35-4:50 pm | m41, m43, m44 |
| 2023-10-10 | Lab | 2-3 pm | |
| 2023-10-11 | Classification | 3:35-4:50 pm | m45, m46 |
| 2023-10-16 | Model Validation and Selection | 3:35-4:50 pm | m48, m49, m50, m51, m52 |
| 2023-10-17 | Lab | 2-3 pm | |
| 2023-10-18 | Shiny | 3:35-4:50 pm | m58 |
| 2023-10-23 | Introduction to Scientific Python | 3:35-4:50 pm | m60, m61 |
| 2023-10-24 | Lab | 2-3 pm | |
| 2023-10-25 | SQL (and pandas, dplyr) | 3:35-4:50 pm | m62, m63, m64 |
| 2023-10-30 | Pandas & SQL | 3:35-4:50 pm | m64 |
| 2023-10-31 | Lab | 2-3 pm | |
| 2023-11-01 | SKLearn Introduction | 3:35-4:50 pm | |
| 2023-11-06 | Training Neural Networks | 3:35-4:50 pm | |
| 2023-11-07 | Lab | 2-3 pm | |
| 2023-11-08 | Bokeh | 3:35-4:50 pm | |
| 2023-11-13 | Browser Based Visualization w/ d3 | 3:35-4:50 pm | m65, m66 |
| 2023-11-14 | Lab | 2-3 pm | |
| 2023-11-15 | Data Science Ethics | 3:35-4:50 pm | m67, m68 |
| 2023-11-20 | Panel Discussion | 3:35-4:50 pm | |
| 2023-11-21 | Lab | 2-3 pm | |
| 2023-11-22 | Thanksgiving | None | |
| 2023-11-27 | Web Scraping | 3:35-4:50 pm | m69 |
| 2023-11-28 | Lab | 2-3 pm | |
| 2023-11-29 | Feedback Day | 3:35-4:50 pm | |
| 2023-12-04 | Class Presentations I | 3:35-4:50 pm | |
| 2023-12-05 | Lab | 2-3 pm | |
| 2023-12-06 | Class Presentations II | 3:35-4:50 pm | |

Lab is generally unstructured time in which you can work on projects and ask me questions. Sometimes we will use this time to cover material.

Working With This Stuff

I provide a Docker container which you can use to hack on these lectures and the associated materials. Some lectures may have their own Docker container, but to work on most of them, run:

./start-env.sh

This will start an RStudio instance.
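
For readers who want a rough idea of what a launcher like this usually does, here is a minimal sketch. It is not the repository's actual start-env.sh. It assumes a Dockerfile in the current directory built on a rocker-style RStudio image (which serves RStudio on port 8787 and reads the login password from the PASSWORD environment variable). The names IMAGE_NAME, HOST_PORT, RSTUDIO_PASSWORD and RUN are hypothetical and exist only for this illustration. By default the sketch only prints the Docker commands; set RUN=1 to execute them.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a start-env.sh-style launcher; NOT the repository's script.
# Assumption: the Dockerfile in the current directory is based on a rocker-style
# RStudio image, which listens on port 8787 and takes its password from $PASSWORD.
set -euo pipefail

IMAGE_NAME="${IMAGE_NAME:-bios611-env}"   # hypothetical image tag
HOST_PORT="${HOST_PORT:-8787}"            # host port to open in the browser

# Print a command; execute it only when RUN=1 (the default is a dry run).
run() {
  printf '%s\n' "$*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

start_env() {
  # Build the image from the Dockerfile in the current directory.
  run docker build -t "$IMAGE_NAME" .
  # Start the container, publish the RStudio port, and mount the current
  # directory into the rstudio user's home so files persist on the host.
  run docker run --rm \
    -e PASSWORD="${RSTUDIO_PASSWORD:-changeme}" \
    -p "${HOST_PORT}:8787" \
    -v "$(pwd):/home/rstudio/work" \
    "$IMAGE_NAME"
}

start_env
```

Under these assumptions, running the sketch with RUN=1 and then opening http://localhost:8787 in a browser would show the RStudio login page. The real script may use a different image, port, or mount point, so check start-env.sh itself before relying on any of these details.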

