This is the handout and homework repository for the course "ME 297 Introduction to Data Science for Mechanical Engineers" which is currently taught (Fall 2020) by Prof. Ilias Bilionis at Purdue University. The course is fully online with the videos being accessible through Brightspace. This is an experimental course that will likely evolve into a requirement for all Purdue Mechanical Engineering students.
The material is published under the GNU General Public License. You can reuse it in your own courses as long as you include the same license and cite this repository. Please send me an email if you do, as I would love to know!
Below, I provide links that open up directly on Google Colab. If you want to view the Jupyter notebooks locally, please see the section named Running the notebooks on your personal computer.
- Lecture 1, Part A - Introduction to Predictive Modeling
- Lecture 1, Part B - Python Basics
- Lecture 2 - Advanced Python Concepts
- Lecture 3 - Basics of Probability Theory
- Lecture 4 - Discrete Random Variables
- Lecture 5 - Continuous Random Variables
- Lecture 6 - Collections of Random Variables
- Lecture 7 - The Monte Carlo Method for Estimating Expectations
  - Hands-on Activity 7.1 (The law of large numbers)
  - Hands-on Activity 7.2 (Estimating the variance)
  - Hands-on Activity 7.3 (Estimating the cumulative distribution function)
  - Hands-on Activity 7.4 (Estimating the probability density function via histograms)
  - Hands-on Activity 7.5 (Estimating predictive quantiles)
  - Hands-on Activity 7.6 (Application – Uncertainty propagation through an initial value problem)
- Lecture 8 - Analytical Examples of Bayesian Inference
- Lecture 9 - Linear Regression via Least Squares
- Lecture 10 - Bayesian Linear Regression
  - Reading Activity 10
  - Hands-on Activity 10.1 (Probabilistic interpretation of least squares – Estimating the measurement noise)
  - Hands-on Activity 10.2 (Maximum a posteriori estimate – Avoiding overfitting)
  - Hands-on Activity 10.3 (Bayesian linear regression)
  - Hands-on Activity 10.4 (The point-predictive distribution – Separating epistemic and aleatory uncertainty)
- Lecture 11 - Classification
  - Reading Activity 11
  - Hands-on Activity 11.1 (Logistic regression with a single variable)
  - Hands-on Activity 11.2 (Logistic regression with many features)
  - Hands-on Activity 11.3 (Making decisions)
  - Hands-on Activity 11.4 (Diagnostics for classification)
  - Hands-on Activity 11.5 (Multi-class logistic regression)
- Lecture 12 - Clustering and Density Estimation
- Lecture 13 - Dimensionality Reduction
- Lecture 14 - Deep Neural Networks
- Lecture 15 - Deep Neural Networks Continued
Make sure you have a Google account before you start. Then, you just click on the links above.
One solution is to "print" your notebook to a PDF. However, we have observed that the figures sometimes get a bit messed up. Another solution is to run the notebooks on your own laptop and then do "File -> Download as -> PDF via LaTeX (.pdf)". See below if you want to take that route. It is now also possible to do the same thing on Google Colab. Follow the instructions in this notebook.
Find and download the right version of Anaconda for Python 3.7 from Continuum Analytics. This package contains most of the software we are going to need. Note: You need Python 3 and not Python 2. The notebooks will not work with Python 2.
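If you are unsure which interpreter your installation picked up, a quick check like the following may help. It is only a sketch: the function name `is_supported_python` is made up for illustration, and it tests nothing beyond the major version number.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True when the major version is 3 (the notebooks do not run on Python 2)."""
    return version_info[0] == 3


if __name__ == "__main__":
    # %-formatting is used on purpose so this file also parses under Python 2.
    if is_supported_python():
        print("OK: running Python %d.%d" % tuple(sys.version_info[:2]))
    else:
        print("Python 3 is required, found %d.%d" % tuple(sys.version_info[:2]))
```

Save it to a file and run it with the same `python` command you plan to use for the course.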
- We need C, C++, and Fortran compilers, as well as the Python sources. Start the command line by opening "Anaconda Prompt" from the start menu. In the command line, type:
conda config --append channels https://repo.continuum.io/pkgs/free
conda install mingw libpython
- Finally, you need git. As you install it, make sure to indicate that you want to use "Git from the command line and also from 3rd party software".
- Download and install the latest version of Xcode.
If you are using Linux, I am sure that you can figure it out on your own.
Independently of the operating system, use the command line to install the following Python packages:
- Seaborn, for beautiful graphics:
conda install seaborn
- PyMC3 for MCMC sampling:
conda install pymc3
- GPy for Gaussian process regression:
pip install GPy
- pydoe for generating experimental designs:
pip install pydoe
- fipy for solving partial differential equations using the finite volume method:
pip install fipy
*** Windows Users ***
You may receive the error
ModuleNotFoundError: No module named 'future'
If so, please install future and then install fipy again:
pip install future
pip install fipy
- scikit-learn for some standard machine learning algorithms implemented in Python:
conda install scikit-learn
- graphviz for visualizing probabilistic graphical models:
pip install graphviz
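Once the packages listed above are installed, you may want to confirm that Python can actually find them. The sketch below uses only the standard library. The import names in `COURSE_MODULES` are assumptions (some differ from the install name, e.g. scikit-learn is imported as `sklearn` and pydoe as `pyDOE`), and `find_missing` is a hypothetical helper, not part of this repository.

```python
import importlib.util

# Assumed import names for the packages installed above.
COURSE_MODULES = ["seaborn", "pymc3", "GPy", "pyDOE", "fipy", "sklearn", "graphviz"]


def find_missing(modules):
    """Return the module names that cannot be located in the current environment."""
    # find_spec looks a module up without importing it and returns None if absent.
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(COURSE_MODULES)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All course packages were found.")
```

A module being found does not guarantee that it imports cleanly, so treat this as a first check only.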
- Open the command line.
- `cd` to your favorite folder.
- Then, type:
git clone https://github.com/PurdueMechanicalEngineering/me-297-intro-to-data-science.git
- This will download the contents of this repository in a folder called `me-297-intro-to-data-science`.
- Enter the `me-297-intro-to-data-science` folder:
cd me-297-intro-to-data-science
- Start the jupyter notebook by typing the command:
jupyter notebook
- Use the browser to navigate the course, experiment with code etc.
- If the course content has been updated, type the following command (while being inside `me-297-intro-to-data-science`) to get the latest version:
git pull origin master
Keep in mind that if you have made local changes to the repository, you may have to commit them before moving on.
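To see whether you have local changes before pulling, one option is to ask git for its machine-readable status. The following is a minimal sketch, assuming `git` is on your path. The helper names `changed_paths` and `local_changes` are hypothetical, and the parsing assumes the default `git status --porcelain` layout (two status characters, a space, then the path).

```python
import subprocess


def changed_paths(porcelain_output):
    """Parse `git status --porcelain` output into a list of changed paths."""
    # Each non-empty line looks like "XY path", so the path starts at index 3.
    return [line[3:] for line in porcelain_output.splitlines() if line.strip()]


def local_changes(repo_dir="."):
    """Return the paths git reports as modified or untracked in repo_dir."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,  # requires Python 3.7 or later
        text=True,
        check=True,  # raises CalledProcessError if repo_dir is not a git repository
    )
    return changed_paths(result.stdout)
```

Calling `local_changes()` from inside the `me-297-intro-to-data-science` folder returns an empty list when there is nothing to commit. A non-empty list names the files you may want to commit first.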