Last update: 2024-01-07.
This repo contains the materials for the course "MECE 4520: Data Science for Mechanical Systems", offered by the Department of Mechanical Engineering at Columbia University during the Fall 2023 term. Link on Directory of Classes.
Past course evaluations (5-point scale): 4.6 (2023), 4.5 (2022), 4.2 (2021).
This course aims to give the students a general introduction to data science and machine learning, with hands-on exercises and applications in mechanical systems. The main topics to cover include supervised learning problems, such as linear regression and classification; unsupervised learning problems, such as clustering; and reinforcement learning problems. By the end of the course, the students should be equipped with the basic concepts of data science and comfortable applying them to practical problems.
- Lectures: Monday and Wednesday, 8:40 AM - 9:55 AM.
- Location: 501 Northwest Corner Building.
- Office Hours: TBD.
- Lecturer: Changyao Chen (cc2759).
- TA: Shadia Sarmin (ss6703), Li Yuan (ly2596).
- Linear algebra.
- Knowledge of basic computer programming (e.g., Python, Matlab, R, Java).
The course will be delivered as a series of lectures. The grading will be 60% homework and 40% final project. There will be 7 homework (HW) assignments in total, due throughout the course. The final project will be a group-based, 5-minute presentation of a selected topic.
Week | Subject | Optional Readings | Due that week |
---|---|---|---|
1 (half) | Introduction | DDSE 1.1, 1.2 | |
2 | Linear algebra. Statistics primer. | ISL 2.1 | HW #0 |
3 | Statistics primer. Linear regression. | ISL 3.1, 3.2 | |
4 | Linear regression. | DDSE 4.1, ISL 4.1 - 4.3 | HW #1 |
5 | Classification. Gradient descent. | | |
6 | Regularization. Feature selection. | | HW #2 |
7 | Dimension reduction. Final project workshop. | ISL 8.1, 8.2 | |
8 | Tree-based models. | | HW #3 |
9 | Neural Networks. | | HW #4 |
10 (half) | Unsupervised learning. | ISL 10.3 | Final project selection |
11 | Reinforcement learning. | | HW #5 |
12 (half) | Course summary. | | |
13 | Final project presentations, part I. | | HW #6 |
14 | Final project presentations, part II. | | |
* The homework is due by 11:59 PM on Tuesday of the given week.
* DDSE is short for Data-Driven Science and Engineering
* ISL is short for An Introduction to Statistical Learning
In this course, we encourage the participants to get as much hands-on experience as possible. Therefore, we will prepare Jupyter notebooks that correspond to each lecture's content, and recommend that students make the most of them.
Introduction and linear algebra: General course structure. Introduction to Python (with lab session using Google Colab). Linear algebra review: vectors, matrix properties and operations, eigenvalues and eigenvectors, Singular Value Decomposition.
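As a minimal NumPy sketch of the SVD (independent of the course notebooks; the matrix values are illustrative):

```python
import numpy as np

# A small 3x2 matrix to decompose (illustrative values).
A = np.array([[2.0, 0.0],
              [1.0, 3.0],
              [0.0, 1.0]])

# Thin SVD: A = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruct A from its factors to confirm the decomposition.
A_reconstructed = U @ np.diag(S) @ Vt
print(np.allclose(A, A_reconstructed))  # True
print(S)  # singular values, sorted in descending order
```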
Statistics primer: Probability review. Descriptive statistics. Central limit theorem. Point estimation and confidence intervals. Hypothesis testing concepts, and two-sample hypothesis tests.
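A small illustration of a confidence interval and a two-sample test with SciPy (synthetic data, not from the course notebooks):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two synthetic samples with slightly different means.
sample_a = rng.normal(loc=10.0, scale=2.0, size=200)
sample_b = rng.normal(loc=10.5, scale=2.0, size=200)

# Two-sample t-test for the difference in means.
t_stat, p_value = stats.ttest_ind(sample_a, sample_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

# 95% confidence interval for the mean of sample_a,
# using the normal approximation from the central limit theorem.
mean = sample_a.mean()
sem = sample_a.std(ddof=1) / np.sqrt(len(sample_a))
print(f"95% CI: ({mean - 1.96 * sem:.2f}, {mean + 1.96 * sem:.2f})")
```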
Linear regression: Simple linear regression. Residual analysis. Identification and handling of multi-collinearity. Multi-variable linear regression. Normal equation.
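A minimal sketch of multi-variable linear regression via the normal equation, beta = (X^T X)^(-1) X^T y, on synthetic data (coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: y = 2 + 3*x1 - 1*x2 + noise, with an intercept column.
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([2.0, 3.0, -1.0]) + rng.normal(scale=0.1, size=n)

# Normal equation: solve (X^T X) beta = X^T y.
# np.linalg.solve is preferred over explicitly inverting X^T X.
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # approximately [2, 3, -1]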
Classification: Logistic regression. Maximum likelihood estimation.
Gradient descent: Batch, stochastic, and mini-batch gradient descent.
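A sketch combining the two items above: logistic regression fit by batch gradient descent on the negative log-likelihood, with synthetic data (the coefficients and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic binary classification data with an intercept column.
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([-0.5, 2.0, -1.0])
p = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = rng.binomial(1, p)

# Batch gradient descent on the mean negative log-likelihood.
beta = np.zeros(X.shape[1])
learning_rate = 0.1
for _ in range(2000):
    p_hat = 1.0 / (1.0 + np.exp(-X @ beta))   # sigmoid
    gradient = X.T @ (p_hat - y) / n          # gradient of mean NLL
    beta -= learning_rate * gradient

print(beta)  # should approach true_beta
```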
Regularization. Feature selection. Dimension reduction: Overfitting, cross-validation, and bootstrap. Best subset, forward, and backward selection. L1 (Lasso) and L2 (Ridge) regularization. Revisit of SVD. Principal Component Analysis.
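As an illustration (using scikit-learn, which the notebooks may or may not rely on), Ridge and Lasso fits on the same synthetic data, where Lasso drives irrelevant coefficients to exactly zero:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)

# Synthetic data: only the first 3 of 10 features matter.
n, d = 200, 10
X = rng.normal(size=(n, d))
beta_true = np.array([3.0, -2.0, 1.5] + [0.0] * (d - 3))
y = X @ beta_true + rng.normal(scale=0.5, size=n)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 regularization: shrinks coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 regularization: zeroes some out

print(np.round(ridge.coef_, 2))
print(np.round(lasso.coef_, 2))  # irrelevant features driven to ~0
```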
Tree-based models: Single decision tree with recursive binary splitting approach. Bagging, Random Forest, and Boosting.
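A short scikit-learn sketch contrasting a single decision tree with a random forest on a synthetic classification problem (dataset and hyperparameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# The bagged ensemble typically generalizes better than a single tree.
print("tree  :", tree.score(X_test, y_test))
print("forest:", forest.score(X_test, y_test))
```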
Neural Networks: Feed-forward Neural Networks (NN). Back-propagation. Introduction to Convolutional NNs and Recurrent NNs.
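A compact NumPy sketch of a one-hidden-layer feed-forward network trained with back-propagation; the architecture, data (XOR), and learning rate are toy-sized assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy problem: learn XOR with a one-hidden-layer network.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Weights and biases for the input->hidden and hidden->output layers.
W1 = rng.normal(size=(2, 8))
b1 = np.zeros((1, 8))
W2 = rng.normal(size=(8, 1))
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 1.0
for _ in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)

    # Backward pass: gradients of the squared error through each layer.
    d_out = (y_hat - y) * y_hat * (1 - y_hat)
    d_hid = (d_out @ W2.T) * h * (1 - h)

    # Gradient descent updates.
    W2 -= learning_rate * h.T @ d_out
    b2 -= learning_rate * d_out.sum(axis=0, keepdims=True)
    W1 -= learning_rate * X.T @ d_hid
    b1 -= learning_rate * d_hid.sum(axis=0, keepdims=True)

print(np.round(y_hat, 2))  # should approach [[0], [1], [1], [0]]
```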
Unsupervised learning: Clustering methods (k-means, kd-tree, spectral clustering).
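A minimal NumPy sketch of k-means clustering (Lloyd's algorithm) on synthetic 2-D data (the blob centers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic 2-D data drawn from three well-separated blobs.
centers = np.array([[0, 0], [5, 5], [0, 5]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])

k = 3
# Initialize centroids by picking k random data points.
centroids = X[rng.choice(len(X), size=k, replace=False)]

for _ in range(50):
    # Assignment step: each point goes to its nearest centroid.
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # Update step: move each centroid to the mean of its assigned points.
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(np.round(centroids, 2))  # close to the three true centers
```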
Reinforcement learning: Multi-armed bandit. Greedy, epsilon-greedy, and upper confidence bound policies.
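A small NumPy simulation of an epsilon-greedy policy on a multi-armed bandit (the arm means and epsilon are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)

# A 5-armed bandit with Gaussian rewards; arm 3 has the highest mean.
true_means = np.array([0.1, 0.3, 0.2, 0.8, 0.5])
n_arms, n_steps, epsilon = len(true_means), 5000, 0.1

counts = np.zeros(n_arms)      # number of pulls per arm
estimates = np.zeros(n_arms)   # running estimate of each arm's mean reward

for _ in range(n_steps):
    # Explore with probability epsilon, otherwise exploit the best estimate.
    if rng.random() < epsilon:
        arm = rng.integers(n_arms)
    else:
        arm = estimates.argmax()
    reward = rng.normal(loc=true_means[arm], scale=1.0)
    counts[arm] += 1
    # Incremental update of the sample-mean estimate for the pulled arm.
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print(counts.argmax())         # most-pulled arm, likely 3
print(np.round(estimates, 2))  # estimates approach the true means of pulled arms
```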