After this course the student should be able to
- understand the relationship between machine learning and classical econometrics.
- understand the principles of machine learning methods, including model selection based on cross validation.
- understand the main algorithms of machine learning, such as Ridge, Lasso, nearest neighbors, and trees.
- implement the core of the algorithms in Python (or R, C++, etc.). The aim is to understand how the algorithms work.
- apply the algorithms to realistic practical problems, with or without the help of statistical data packages such as scikit-learn.
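To give a flavor of what "model selection based on cross validation" means in practice, here is a minimal sketch in plain Python. It is an illustration only, not the course's reference implementation: the function names (`knn_predict`, `cv_error`, `select_k`), the synthetic sine data, and the list of candidate values for the number of neighbors are all assumptions made for this example.

```python
import math
import random


def knn_predict(train_x, train_y, x, k):
    """Predict at x as the mean response of the k nearest training points (1-D inputs)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def cv_error(xs, ys, k, n_folds=5):
    """Estimate the mean squared prediction error of k-NN by n_folds-fold cross validation."""
    n = len(xs)
    total = 0.0
    for fold in range(n_folds):
        # Hold out every n_folds-th observation; train on the rest.
        test_idx = list(range(fold, n, n_folds))
        held_out = set(test_idx)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        for i in test_idx:
            total += (knn_predict(train_x, train_y, xs[i], k) - ys[i]) ** 2
    return total / n


def select_k(xs, ys, candidates, n_folds=5):
    """Return the candidate k with the smallest cross-validated error, plus all errors."""
    errors = {k: cv_error(xs, ys, k, n_folds) for k in candidates}
    return min(errors, key=errors.get), errors


# Synthetic data (an assumption for this sketch): a noisy sine curve.
rng = random.Random(0)
xs = [rng.uniform(0, 6) for _ in range(100)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]

best_k, errors = select_k(xs, ys, candidates=[1, 3, 5, 10, 20, 40])
print("cross-validated errors:", errors)
print("selected k:", best_k)
```

The tuning parameter `k` is chosen on held-out data only, which is the idea behind cross validation: a very small `k` tends to fit the noise, while a very large `k` tends to smooth away the signal.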
The last couple of decades of statistical data analysis have been characterized by an increased emphasis on algorithms that go under the name of machine learning or big data. These methods have been successfully applied to face recognition (Facebook) and to predicting consumer preferences for movies (Netflix). They are also widely used in industry to decide whether newly produced items meet given quality criteria. Finally, in medicine, algorithms can help staff with tasks such as classifying tumors.
In the lectures we consider the principles of this algorithmic approach. We also look at practical examples, implement the algorithms in Python, and compare our outcomes with existing software such as scikit-learn.
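As an illustration of comparing our own code with existing software, the following sketch implements ridge regression via its closed-form solution and checks the result against scikit-learn. It is a minimal example under stated assumptions: the helper `ridge_fit`, the simulated data, and the penalty value 10 are made up for this illustration, there is no intercept, and the comparison is skipped when scikit-learn is not installed.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge coefficients without intercept: solve (X'X + lam * I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


# Simulated data (an assumption for this sketch): linear model with small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=50)

beta_ols = ridge_fit(X, y, lam=0.0)     # lam = 0 gives ordinary least squares
beta_ridge = ridge_fit(X, y, lam=10.0)  # lam > 0 shrinks the coefficients

try:
    from sklearn.linear_model import Ridge

    # alpha plays the role of lam; fit_intercept=False matches our no-intercept model.
    sk_model = Ridge(alpha=10.0, fit_intercept=False).fit(X, y)
    agrees = bool(np.allclose(sk_model.coef_, beta_ridge, atol=1e-6))
except ImportError:
    agrees = None  # scikit-learn not available; nothing to compare against

print("OLS:  ", beta_ols)
print("ridge:", beta_ridge)
print("matches scikit-learn:", agrees)
```

Writing the estimator by hand makes the role of the penalty visible, while the comparison with scikit-learn serves as a check that the implementation is correct.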
The topics that we deal with are the following:
- Introduction to machine learning for the econometrician
- Principles of machine learning
- Nonparametric estimation methods
- Linear algorithms for prediction (Lasso, Ridge)
- Algorithms for classification
- Trees, bagging, random forests, and boosting
Here is a preliminary schedule. We rely on you to give us feedback on whether the pace of the lectures is right/too fast/too slow.
| Week | Day | Lecture | Section | Lecturer |
|---|---|---|---|---|
| 1 | W | 1 | Ch 1 | avv |
| | F | 2 | 2.1-2.5 | avv |
| 2 | W | 3 | 4.4 | avv |
| | F | 4 | coding | nvf |
| 3 | W | 5 | 5.1-5.3, 5.5-5.7, 6.1-6.2 | avv |
| | F | 6 | coding | nvf |
| 4 | F | 7 | 7.1, 7.2, 7.4-7.6, 7.8 | avv |
| 5 | W | 8 | Ch 8 | avv |
| | F | 9 | coding | nvf |
| 6 | F | 10 | Ch 8 | avv |
| | F | 11 | coding | nvf |
| 7 | W | 11 | coding | nvf |
| | F | 12 | coding/writing report | nvf |
| 8 | W | 13 | coding/writing report | nvf |
| | F | 14 | No lecture/writing report | avv/nvf |
The section numbers refer to the sections in DSML.
See the report template directory on GitHub.
At the end of the course we invite the groups to explain their work in an oral exam, and we determine a grade.
The final grade consists of the report (60%), the oral exam (20%), and the review (20%).
Our primary book is Kroese et al., Data Science and Machine Learning (DSML).
There are other, optional, sources that you might like to consult. I particularly liked the first for its intuitive, high-level overview of the topics and problems.
- James, Witten, Hastie, and Tibshirani: An Introduction to Statistical Learning
- Hastie, Tibshirani, and Friedman: The Elements of Statistical Learning
- Efron and Hastie: Computer Age Statistical Inference: Algorithms, Evidence, and Data Science
- Bishop: Pattern Recognition and Machine Learning
The university provides a few courses on Python and R here.
Here are some topics that you might find interesting. They may serve as inspiration for your own paper.
- https://machinelearningmastery.com/neural-network-for-cancer-survival-dataset/
- https://www.sciencealert.com/mammal-extinctions-are-speeding-up-with-unprecedented-magnitude-scientists-warn
- https://www.pnas.org/content/114/37/9859
- https://www.theseattledataguy.com/healthcare-fraud-detection-with-python/
- https://towardsdatascience.com/learn-python-data-analytics-by-example-ny-parking-violations-e1ce1847fa2
- https://machinelearningmastery.com/random-forest-for-time-series-forecasting/
- https://machinelearningmastery.com/predicting-car-insurance-payout/
- https://medium.com/python-in-plain-english/building-a-smart-central-heating-system-with-a-raspberry-pi-and-python-403c6ea0fd7e
- https://www.genengnews.com/news/researchers-develop-biomarker-blood-test-for-depression-bipolar-disorder/
- Prof. dr. A.P. van Vuuren, a.p.van.vuuren@rug.nl
- dr. N.D. van Foreest, n.d.van.foreest@rug.nl, coordinator