Instructor Names: Brian Wright (brianwright@virginia.edu) & Jonathan Kropko (jkropko@virginia.edu)
Term: Fall 2019
Location: Gilmer 141
Time: MW 3:30-4:45
Office hours: Jonathan M 10-12; Brian W 10-12, Dell 1 103B
Teaching Assistant: Shashwat Kumar (sk9epp@virginia.edu)
Collab Site: This repo and the Collab site will both have all the course content. Collab will also be used to submit assignments.
The purpose of this course is to provide students with an understanding of data science in practice, with a focus on empirical decision making using a contemporary data science lifecycle approach. To that end, the course is separated into three high-level sections covering two semesters. The first semester focuses on understanding the field of data science and the skills needed for a successful career as a data scientist. This is followed by content centered on creating and using pipelines for data acquisition, wrangling, and visualization. The second semester is designed to prepare students to communicate data-driven outcomes through presentations and research papers, and to work through case studies to better understand typical challenges faced in the field.
- Understanding of the data science lifecycle
- How to plan and execute data-driven projects as a team
- How to gather and clean data using Python
- The creation and utilization of data science pipelines
- Creation of simple web-based interactive visualizations
This is a tentative schedule and is subject to change. Please check here regularly for updates.
Week # | Date | Topics | Readings/References | Assignments | Due | Prof |
---|---|---|---|---|---|---|
1 | 08/28 | Syllabus Review/Capstone Q&A/Lifecycle | AoDS Chs. 1-3 | | | B/J |
2 | 09/02 | Teamwork/Charter | Analyzing the Analyzers Survey | Charter/Dream Job | C: 9.9, DJ: 9.4 | B/J |
2 | 09/04 | Problem Solving/Jenn Huck/Bill Schoelwer | | Team Coding | TC: 9.11 | B/J |
3 | 09/09 | Landscape/Tech Presentations/Proposal | | Proposal | 9.25 | B/J |
3 | 09/11 | Project Management/Trello | Kanban Trello Example | Project Plan | 9.16 | B/J |
4 | 09/16 | Client Management/Target Setting/GitHub | Software Carpentry | Software Carpentry | 9.18 | B/J |
4 | 09/18 | Lifelong Learning/Getting Help/Documentation | | | | J/B |
5 | 09/23 | Data Acquisition: CSV/ASCII Delimiters, Headers | | | | J/B |
5 | 09/25 | Data Acquisition: CSV/ASCII, JSON/APIs | | Lab | NC | J/B |
6 | 09/30 | Student Presentations | | | | B/J |
6 | 10/02 | Student Presentations | | | | B/J |
7 | 10/07 | Reading Days (Fall Break) | | | | |
7 | 10/09 | Data Acquisition: Web Scraping | | Lab | | J/B |
8 | 10/14 | Data Loading: DB/SQL | | | | R |
8 | 10/16 | Data Loading: DB/SQL | | | | R |
9 | 10/21 | Data Loading: DB/SQL | | | | R |
9 | 10/23 | Data Loading: DB/SQL | | | | R |
10 | 10/28 | API/Beautiful Soup | | | | J/B |
10 | 10/30 | Data Cleaning: Pandas | | | | J/B |
11 | 11/04 | Data Cleaning: Pandas | | | | J/B |
11 | 11/06 | Data Cleaning: Pandas | | | | J/B |
12 | 11/11 | Data Cleaning | | | | J/B |
12 | 11/13 | Data Cleaning | | | | J/B |
13 | 11/18 | Data Viz | | | | B/J |
13 | 11/20 | Data Viz | | | | B/J |
14 | 11/25 | Data Viz/Dash | | | | B/J |
14 | 11/27 | No Class (Thanksgiving Break) | | | | |
15 | 12/02 | Data Viz/Dash | | | | J/B |
15 | 12/04 | R Shiny | | | | B/J |
16 | 12/16 | Data Pipeline Presentations | | | | B/J |
- The Art of Data Science = AoDS
- Python Data Science Handbook = PDSH
- SQLite Python Tutorial = SPT (cost involved)
Course Slack Channel (invite link): Slack Channel Invite
We will leverage Slack in this course to:
- Familiarize students with a platform used by data science teams in industry
- Increase access to the professors and TA
- Quickly disseminate files
- Foster a collaborative environment for students to work together
Online Resources
- Low-cost platforms for additional learning as needed: Dataquest, DataCamp, Coursera
UVA Library Resources
- UVA Library Research Data Services (R, Python, Scientific Workflow Tools, etc.)
- UVA Health Science Library Research Data Workshops (R, image processing, etc.)
- Advanced Research Computing Services (ARCS) hosts workshops and sessions related to high-performance computing (i.e., Rivanna)
- Scholars’ Lab at UVA Library focuses on digital humanities, spatial technologies, and cultural heritage (e.g., GIS, Twitter, 3D Printing, Photogrammetry)
- StatLab is UVA Library’s statistical consulting service.
Social Media Follows: Adjusting Your Information Algorithm
- Medium: Programming, Data Science, Data Engineering
- Reddit: Data Science and Data Engineering
- Quora: Ask a Data Scientist, The Art of Data Science, Code, Become a Great Programmer
- Blogs/Message Board: http://blog.kaggle.com/, https://news.ycombinator.com/news, https://www.kdnuggets.com/
- Twitter: Kirk Borne, Andrew Ng, Gregory Piatetsky, Fei-Fei Li, Hadley Wickham, DJ Patil, Sam Harris
Assignment | % |
---|---|
Quizzes | 15% |
Labs/In-Class Exercises | 15% |
Charter/Proposal (Presentation) | 35% |
Data Pipeline Presentation | 35% |
There will be two presentations designed to demonstrate your understanding of the practices being discussed as applied to your capstone projects.
These assessments will be used to check learning and give feedback on areas for improvement. Reading prior to class, class attendance, and participation in activities are essential for success on this part of the course.
Details on requirements will be given during class periods. Most assignments will be due the next class (NC) period and can be submitted via Collab. We will work to provide feedback in the next class session.
There will be at least two quizzes, either take-home or in-class.
- 93-100 A
- 90-92 A-
- 87-89 B+
- 83-86 B
- 80-82 B-
- 77-79 C+
- 73-76 C
- 70-72 C-
- <70 F
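The grading weights and the letter scale above combine into a single course grade. The Python sketch below shows that arithmetic under stated assumptions: each component is scored as a percentage out of 100, and a fractional score (e.g., 92.5) earns the letter whose lower cutoff it reaches, since the syllabus does not say how non-integer scores are rounded. The helper names `weighted_score` and `letter_grade` and the component keys are hypothetical.

```python
from __future__ import annotations

# Weights from the grading table (hypothetical key names for each component).
WEIGHTS = {
    "quizzes": 0.15,
    "labs": 0.15,
    "charter_proposal": 0.35,
    "pipeline_presentation": 0.35,
}

# Lower bound of each letter band, checked from highest to lowest.
# Assumption: a score earns a band once it reaches that band's lower bound,
# so 92.5 maps to A- because it has not reached the A cutoff of 93.
CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
]


def weighted_score(scores: dict[str, float]) -> float:
    """Combine component percentages (0-100) into one course percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(score: float) -> str:
    """Map a course percentage onto the letter scale; below 70 is an F."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    example = {
        "quizzes": 88.0,
        "labs": 95.0,
        "charter_proposal": 90.0,
        "pipeline_presentation": 92.0,
    }
    total = weighted_score(example)
    print(f"{total:.2f} -> {letter_grade(total)}")
```

With the example scores, the script prints `91.15 -> A-`: the two presentations carry 70% of the grade, so they dominate the result even when quiz scores are lower.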
The course meets twice a week for roughly an hour and a half per session. Expect at least two hours of work outside the classroom for every hour of class time to successfully complete the course requirements. This brings the total course-related time to roughly 9 hours per week (about 3 hours in class plus 6 hours outside).
In accordance with University policy, if you need to be absent due to a religious holiday, just let us know and we will make arrangements for you.