layout
default

Machine Learning Systems (Fall 2019)

  • When: Mondays and Fridays from 2:00 to 3:30
  • Where: Soda 310
  • Instructor: Joseph E. Gonzalez
    • Office Hours: Wednesdays from 4:00 to 5:00 in 773 Soda Hall.
  • Announcements: Piazza
  • Sign-up to Present: Google Spreadsheet. Every student should sign up to present in at least three rows, taking a different role each time. Note that the Backup/Scribe presenter may be asked to fill in for one of the other roles with little notice.
  • If you have reading suggestions please send a pull request to this course website on Github by modifying the index.md file.

Course Description

The recent success of AI has been due in large part to advances in hardware and software systems. These systems have enabled training increasingly complex models on ever larger datasets. In the process, these systems have also simplified model development, enabling rapid growth in the machine learning community. These new hardware and software systems include a new generation of GPUs and hardware accelerators (e.g., TPU and Nervana), open source frameworks such as Theano, TensorFlow, PyTorch, MXNet, Apache Spark, Clipper, Horovod, and Ray, and a myriad of systems deployed internally at companies, to name just a few. At the same time, we are witnessing a flurry of ML/RL applications to improve hardware and system designs, job scheduling, program synthesis, and circuit layouts.

In this course, we will describe the latest trends in systems design to better support the next generation of AI applications, and applications of AI to optimize the architecture and the performance of systems. The format of this course will be a mix of lectures, seminar-style discussions, and student presentations. Students will be responsible for paper readings and for completing a hands-on project. For projects, we will strongly encourage teams that contain both AI and systems students.

New Course Format

A previous version of this course was offered in Spring 2019. The format of this second offering is slightly different. Each week will cover a different research area in AI-Systems. The Monday lecture will be presented by Professor Gonzalez and will cover the context of the topic as well as a high-level overview of the reading for the week. The Friday lecture will be organized around a mini program committee meeting for the week's readings. Students will be required to submit detailed reviews for a subset of the papers and lead the paper review discussions. The goal of this new format is to build mastery of the material, to develop a deeper understanding of how to evaluate and review research, and hopefully to provide insight into how to write better papers.

Course Syllabus

{% capture dates %} 8/30/19 9/2/19 9/6/19 9/9/19 9/13/19 9/16/19 9/20/19 9/23/19 9/27/19 9/30/19 10/4/19 10/7/19 10/11/19 10/14/19 10/18/19 10/21/19 10/25/19 10/28/19 11/1/19 11/4/19 11/8/19 11/11/19 11/15/19 11/18/19 11/22/19 11/25/19 11/29/19 12/2/19 12/6/19 12/9/19 12/13/19 12/16/19 12/20/19 {% endcapture %} {% assign dates = dates | split: " " %}

This is a tentative schedule. Specific readings are subject to change as new material is published.

Jump to Today

{% include syllabus_entry %}

Introduction and Course Overview

This lecture will be an overview of the class, requirements, and an introduction to the history of machine learning and systems research.

{% include syllabus_entry %}

Holiday (Labor Day)

There will be no class, but please sign up for the weekly discussion slots.

{% include syllabus_entry %}

Big Ideas and How to Evaluate ML Systems Research

Additional Machine Learning Reading

Additional Systems Reading

Open Debate about the Field

{% include syllabus_entry %}

Machine Learning Life-cycle

This lecture will discuss the machine learning life-cycle, spanning model development, training, and serving. It will outline some of the technical machine learning and systems challenges at each stage and how these challenges interact.

{% include syllabus_entry %}

Discussion of Papers on Machine Learning Life-cycle

* [Data Engineering Bulletin: Machine Learning Life-cycle Management](http://sites.computer.org/debull/A18dec/issue1.htm)
* [Context: The Missing Piece in the Machine Learning Lifecycle](https://rlnsanz.github.io/dat/Flor_CMI_18_CameraReady.pdf)
* [Software 2.0 Blog Post](https://medium.com/@karpathy/software-2-0-a64152b37c35)
* [Doing Machine Learning the Uber Way: Five Lessons From the First Three Years of Michelangelo](https://towardsdatascience.com/doing-machine-learning-the-uber-way-five-lessons-from-the-first-three-years-of-michelangelo-da584a857cc2)
* [Introducing FBLearner Flow: Facebook’s AI backbone](https://engineering.fb.com/core-data/introducing-fblearner-flow-facebook-s-ai-backbone/)
* [DeepBird: Twitters ML Deployment Framework](https://blog.twitter.com/engineering/en_us/topics/insights/2018/twittertensorflow.html)
* [Demonstration of Mlflow: A System to Accelerate the Machine Learning Lifecycle](https://www.sysml.cc/doc/2019/demo_33.pdf)

Software:

{% include syllabus_entry %}

Database Systems and Machine Learning

In the previous lecture we saw that data and feature engineering is often the dominant hurdle in model development. Database systems are often the source of data and the platform in which feature engineering takes place. This lecture will cover some of the big ideas in database systems and how they relate to work on machine learning in databases.

  • Lecture slides: [pdf, pptx]
  • Project Proposal Sign-up doc. You must be enrolled in the class or on the waitlist to access this document. Please add any projects you are thinking about starting and list yourself as interested in anyone else's projects.

{% include syllabus_entry %}

Discussion of Database Systems and Machine Learning

{% include syllabus_entry %}

Machine Learning Frameworks and Automatic Differentiation

This week we will discuss recent developments in model development and training frameworks. While there is a long history of machine learning frameworks, we will focus on frameworks for deep learning and automatic differentiation. In class we will review some of the big trends in machine learning framework design and basic ideas in forward and backward automatic differentiation.
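As a small illustration of the forward-mode idea mentioned above, here is a minimal sketch using dual numbers. It is not taken from the course materials: the names `Dual`, `sin`, `derivative`, and `f` are hypothetical, only addition, multiplication, and sine are supported, and backward (reverse-mode) differentiation is not shown.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to one input (hypothetical helper)."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sin(x: Dual) -> Dual:
    # Chain rule: sin(u)' = cos(u) * u'
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def derivative(f, x: float) -> float:
    """Evaluate f'(x) by seeding the input's derivative with 1."""
    return f(Dual(x, 1.0)).deriv


def f(x):
    return x * x * 3 + sin(x)  # f(x) = 3x^2 + sin(x)
```

Calling `derivative(f, 2.0)` propagates derivatives alongside values in a single forward pass; analytically the result should match `6 * 2.0 + math.cos(2.0)`.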

Project proposals are due next Monday

{% include syllabus_entry %}

Machine Learning Frameworks and Automatic Differentiation

Update: Two of the readings were changed to reflect a focus on deep learning frameworks. The previous readings on SystemML and KeystoneML have been moved to optional reading.

Pipeline Training Frameworks (Classical)

Automatic Differentiation and Differentiable Programming

Deep Learning Frameworks with Automatic Differentiation

Deep Learning Primitives

{% include syllabus_entry %}

Distributed Model Training

This week we will discuss developments in distributed training. We will quickly review the statistical query model pushed by early map-reduce machine learning frameworks and then discuss advances in parameter servers and distributed neural network training.
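One way to picture the statistical query pattern mentioned above is a gradient computed as a sum of per-partition statistics: each partition is processed independently (map) and the partial results are added together (reduce). The sketch below is an illustrative assumption, not code from the readings. The function names, the toy data, and the choice of a squared-error linear model are all hypothetical, and the "workers" are simulated by a plain loop in one process.

```python
from functools import reduce


def partial_gradient(partition, w):
    """Map step: sum of squared-error gradients over one data partition."""
    grad = [0.0] * len(w)
    for x, y in partition:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad


def add_vectors(a, b):
    """Reduce step: element-wise sum of two partial gradients."""
    return [ai + bi for ai, bi in zip(a, b)]


def distributed_gradient(partitions, w):
    """Combine the per-partition sums; each partition could live on a separate worker."""
    return reduce(add_vectors, (partial_gradient(p, w) for p in partitions))


# Toy data (made up for illustration): (features, label) pairs split into two partitions.
data = [([1.0, 2.0], 5.0), ([2.0, 1.0], 4.0), ([3.0, 3.0], 9.0), ([0.0, 1.0], 2.0)]
partitions = [data[:2], data[2:]]
w = [0.5, 0.5]
```

Because the gradient is a sum over examples, `distributed_gradient(partitions, w)` should agree with `partial_gradient(data, w)` up to floating-point rounding, however the data is split.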

Project Proposals Due!

  • One-page project description due at 11:59 PM. Check out the suggested projects. Submit a link to your one-page Google document containing your project description to this Google form. You only need one submission per team, but please list all team members' email addresses. You can also update your submission if needed.

{% include syllabus_entry %}

Discussion on Distributed Model Training

ImageNet in X Minutes

All-Reduce

{% include syllabus_entry %}

Prediction Serving

Until recently, much of the focus of systems research was on model training. However, there has been growing interest in addressing the challenges of prediction serving. This lecture will frame the challenges of prediction serving and cover some of the recent advances.

{% include syllabus_entry %}

Power Outage Related Holiday

Unfortunately, class was canceled and so the PC Meeting has been moved to Monday. Note that early project presentations are also due next Friday.

{% include syllabus_entry %}

Discussion on Prediction Serving

The ACM Queue article "Prediction-Serving Systems: What happens when we wish to actually deploy a machine learning model to production?" provides a nice overview.

Systems Reading:

More Efficient Models:

Performance Breakdown of various models

{% include syllabus_entry %}

Project Presentations

{% include syllabus_entry %}

Finish Project Presentations and Start Model Compilation

This week we will explore the process of compiling and optimizing deep neural network computation graphs. The reading will span both graph-level optimization and the compilation and optimization of individual tensor operations.

{% include syllabus_entry %}

Discussion of Model Compilation

{% include syllabus_entry %}

PG&E and Fire Related Cancellation

Unfortunately, due to the power outage, lecture is canceled today. To make up for lost lecture(s) and accommodate our guest speakers, we will skip the overview lecture this week and start with the PC meeting on Machine Learning Applied to Systems. However, this will put a little extra pressure on the neutral presenters to provide additional context. We will then cover the discussion on machine learning hardware the following Monday.

{% include syllabus_entry %}

Discussion of Machine Learning Applied to Systems Day 1

{% include syllabus_entry %}

Hardware Acceleration for Machine Learning

This lecture will be presented by Kurt Keutzer and Suresh Krishna, who are experts in processor design as well as network and architecture co-design.

  • Guest lecture slides: [pdf, pptx]

{% include syllabus_entry %}

Discussion of Hardware Acceleration for Machine Learning

{% include syllabus_entry %}

(11/11) Administrative Holiday

{% include syllabus_entry %}

Discussion of Machine Learning Applied to Systems Day 2

{% include syllabus_entry %}

Learning with Adversaries

This week we will discuss machine learning in adversarial settings. This includes secure federated learning, differential privacy, and adversarial examples.

{% include syllabus_entry %}

Discussion on Learning with Adversaries

{% include syllabus_entry %}

Autonomous Driving

Autonomous vehicles will likely transform society in the next decade and are fundamentally AI-enabled systems. In this lecture we will discuss the AI-Systems challenges around autonomous driving.

{% include syllabus_entry %}

(11/29) Holiday (Thanksgiving)

{% include syllabus_entry %}

Discussion on Autonomous Driving

Everyone must do one of the readings (you pick).

{% include syllabus_entry %}

Conclusion!

{% include syllabus_entry %}

(12/6) RRR Week

{% include syllabus_entry %}

(12/9) RRR Week

{% include syllabus_entry %}

(12/16) Poster Presentations

{% include syllabus_entry %}

(12/20) No Class

Don't forget to submit your final reports. As noted on Piazza, the final report should be 6 pages plus references (2-column, 10pt font, unlimited appendix). Please submit your report using this form:

You only need to submit the project once per team. The write-up should discuss the problem formulation, related work, your approach, and your results.

Week Date (Lec.) Topic

Projects

Detailed candidate project descriptions will be posted shortly. However, students are encouraged to find projects that relate to their ongoing research.

Grading

Grades will be largely based on class participation and projects. In addition, we will require weekly paper summaries submitted before class.

  • Projects: 60%
  • Weekly Summaries: 20%
  • Class Participation: 20%
<script type="text/javascript">
  // Highlight the first lecture whose date has not yet passed and add the "today" anchor.
  var current_date = new Date();
  var rows = document.getElementsByTagName("th");
  var finished = false;
  for (var i = 1; i < rows.length && !finished; i++) {
    var r = rows[i];
    if (r.id.startsWith("counter_")) {
      // Row ids have the form counter_<date>_<week>.
      var fields = r.id.split("_");
      var week_div_id = "week_" + fields[2];
      var lecture_date = new Date(fields[1] + " 23:59:00");
      if (current_date <= lecture_date) {
        finished = true;
        r.style.background = "orange";
        r.style.color = "black";
        var week_td = document.getElementById(week_div_id);
        week_td.style.background = "#043361";
        week_td.style.color = "white";
        var anchor = document.createElement("div");
        anchor.setAttribute("id", "today");
        week_td.prepend(anchor);
      }
    }
  }

  // Collapse each optional reading list behind a toggle button.
  $(".reading").each(function(ind, elem) {
    var optional_reading = $(elem).find(".optional_reading");
    if (optional_reading.length == 1) {
      optional_reading = optional_reading[0];
      optional_reading.setAttribute("id", "optional_reading_" + ind);
      var button = document.createElement("button");
      button.setAttribute("class", "btn btn-primary btn-sm");
      button.setAttribute("type", "button");
      button.setAttribute("data-toggle", "collapse");
      button.setAttribute("data-target", "#optional_reading_" + ind);
      button.setAttribute("aria-expanded", "false");
      button.setAttribute("aria-controls", "#optional_reading_" + ind);
      optional_reading.setAttribute("class", "optional_reading_no_heading collapse");
      button.innerHTML = "Additional Optional Reading";
      optional_reading.before(button);
    }
  });

  // Collapse each detailed description behind a toggle button.
  $(".details").each(function(ind, elem) {
    elem.setAttribute("id", "details_" + ind);
    var button = document.createElement("button");
    button.setAttribute("class", "btn btn-primary btn-sm");
    button.setAttribute("type", "button");
    button.setAttribute("data-toggle", "collapse");
    button.setAttribute("data-target", "#details_" + ind);
    button.setAttribute("aria-expanded", "false");
    button.setAttribute("aria-controls", "#details_" + ind);
    elem.setAttribute("class", "details_no_heading collapse");
    button.innerHTML = "Detailed Description";
    elem.before(button);
  });
</script>