Materials for Principles of Data Science BIOS 611

akantuncch/datasci611

 
 

Welcome to UNC BIOS 611

Introduction to Data Science

This repository contains course materials for BIOS 611 (Introduction to Data Science), typically taught during the fall semester at UNC Chapel Hill in the Department of Biostatistics.

The intent of the course is to provide an intensive introduction to the technical material and skills that a data scientist needs in order to do repeatable, reliable research.

It covers basic Linux tools like bash and make, Docker, and git (extensively), and serves as an introduction to R and Python, including how to organize a research project and an R or Python library.

Along the way we will become informally familiar with some analytical techniques: classification, regression and clustering. The emphasis here is practical: how to use the methods while avoiding common pitfalls.

Course Syllabus and Schedule

Class meets on Mondays and Wednesdays from 3:35 pm to 4:50 pm. There is a lab session from 2:00 pm to 3:00 pm on Tuesdays.

Class is held in: McGavran-Greenberg PH-Rm 2308

Lab is held in: McGavran-Greenberg PH-Rm 2308

| Date | Course Title | Material | Homework |
|---|---|---|---|
| Wed 08/18/2021 | Introduction | 1, 2 | hw1 due: Wed 08/25/2021 |
| Mon 08/23/2021 | Compute Resources | 1, 2, 3 | hw2 due: Mon 08/30/2021 |
| Wed 08/25/2021 | Unix | 1, 2, 3 | hw3 due: Wed 09/08/2021 |
| Mon 08/30/2021 | Docker | 1, 2, 3, 4 | hw4 due: Wed 09/15/2021 |
| Wed 09/01/2021 | git basics & github basics | 1, 2, 3, 4 | hw5 due: Mon 09/20/2021 |
| Mon 09/06/2021 | Labor Day 🍞🌹 | 1, 2 | |
| Wed 09/08/2021 | How to Think about Programming & R | 1, 2 | hw6 due: Wed 09/27/2021 |
| Mon 09/13/2021 | More R | 1, 2 | |
| Wed 09/15/2021 | Tidyverse for Tidying & GGPlot | 1, 2, 3, 4, 5, 6 | |
| Mon 09/20/2021 | Make and Makefiles | 1, 2 | |
| Wed 09/22/2021 | git concepts and practices | 1, 2, 3 | |
| Mon 09/27/2021 | Project Organization | 1, 2, 3 | |
| Wed 09/29/2021 | ~~~~ | | |
| Mon 10/04/2021 | Dimensionality Reduction | 1, 2, 3, 4 | hw7 due: Mon 10/11/2021 |
| Wed 10/06/2021 | Clustering | 1, 2, 3, 4 | hw8 due: Wed 10/13/2021 |
| Mon 10/11/2021 | Classification | 1, 2, 3, 4, 5, 6, 7 | hw9 due: Mon 10/18/2021 |
| Wed 10/13/2021 | Model Validation and Selection | 1, 2 | |
| Mon 10/18/2021 | Shiny | 1, 2, 3, 4, 5, 6 | hw10 due: Mon 10/25/2021 |
| Wed 10/20/2021 | Introduction to Scientific Python | 1, 2 | hw11 due: Wed 10/27/2021 |
| Mon 10/25/2021 | SQL (and pandas, dplyr) | 1, 2, 3 | |
| Wed 10/27/2021 | Pandas & SQL | 1, [2] | hw12 due: Wed 11/03/2021 |
| Fri 10/29/2021 | Mid Term Project Review | | |
| Mon 11/01/2021 | SKLearn Introduction | | |
| Wed 11/03/2021 | Training Neural Networks | | |
| Mon 11/08/2021 | Bokeh | | |
| Wed 11/10/2021 | Browser Based Visualization w/ d3 | | |
| Mon 11/15/2021 | Data Science Ethics | | |
| Wed 11/17/2021 | Panel Discussion | | |
| Mon 11/22/2021 | Thanksgiving 🦃 | | |
| Wed 11/24/2021 | Feedback Day | | |
| Mon 11/29/2021 | Class Presentations I | | |
| Wed 12/01/2021 | Class Presentations II | | |

There is also a lab held every Tuesday. This will generally be unstructured time in which you can work on projects and ask me questions. Sometimes we will use this time to cover material.

Working With This Stuff

I provide a Docker container which you can use to hack on these lectures and the associated materials. Some lectures may have their own Docker container, but to work on most of them, run:

./start-env.sh

This will start an RStudio instance.
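The contents of start-env.sh are not shown here, so the following is only a rough sketch of what a script like it typically does, not a description of the actual script. It assumes the public rocker/verse image, which serves RStudio on container port 8787 and reads the login password from the PASSWORD environment variable. The function name, the mount path, and the default password are hypothetical choices made for illustration.

```python
import os
import shlex


def rstudio_docker_command(project_dir, image="rocker/verse", port=8787,
                           password="change-me"):
    """Build a `docker run` command that serves RStudio from a container.

    Assumptions (not taken from this repository): the image is the public
    rocker/verse image, which runs RStudio Server on container port 8787
    and reads the login password from the PASSWORD environment variable.
    """
    project_dir = os.path.abspath(project_dir)
    return [
        "docker", "run", "--rm",
        "-p", f"{port}:8787",                        # host port -> RStudio port
        "-e", f"PASSWORD={password}",                # password for the rstudio user
        "-v", f"{project_dir}:/home/rstudio/work",   # mount the checkout (hypothetical path)
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(rstudio_docker_command(".")))
```

Under these assumptions, running the printed command and opening http://localhost:8787 in a browser would show the RStudio login page. The course's own script may use a different image, port, or mount point.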

