Feedback form: https://goo.gl/forms/jV5abBRxYcIVigFm1
This workshop was built for astronomy, adapted from the introductory and intermediate machine learning (ML) workshops prepared by the Curtin Institute for Computation (CIC).
The workshop material was prepared by:
from the CIC, and
from the International Centre for Radio Astronomy Research (ICRAR).
Material in the notebooks has, in part, been referenced and adapted from:
- Randal Olson's Data Science Notebook
- Sebastian Raschka's Python Machine Learning Notebooks
- Kevin Markham's Scikit Learn Notebooks
- François Chollet's Deep Learning with Python book
We'll be using a gitter chat room to ask questions and chat: https://gitter.im/ANITA-astroinformatics-school/2019-school
Notebooks will be updated with the answers and pushed to the repo at the end of each day.
The saved models and training histories for Part VII can be downloaded from this Google folder. Please save them in your notebooks/data folder.
Useful cheat sheets:
- pandas: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
- Scikit-learn: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Scikit_Learn_Cheat_Sheet_Python.pdf
- Keras: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Keras_Cheat_Sheet_Python.pdf
A working knowledge of Python and Jupyter notebooks is essential for this workshop, i.e. knowledge of basic data structures, operations and how to write scripts. The Python notebooks used throughout the workshop were developed with Python 3.6.3.
Required packages:
- NumPy: a fast numerical array structure and helper functions
- pandas: a DataFrame structure to store data in memory and work with it easily and efficiently
- matplotlib: a basic plotting library; most other plotting libraries are built on top of it
- seaborn: an advanced statistical plotting library
- scikit-learn: a machine learning package, more info here
- Keras: a high-level API for implementing neural networks, more info here
Please make sure you have everything installed and ready to go for the workshop!
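A quick way to confirm your environment is ready is to try importing each required package and print its version. This is a minimal sketch, assuming the packages are imported under their usual names (scikit-learn is imported as `sklearn`); the `REQUIRED` mapping and the `check_packages` helper are hypothetical names used for illustration, not part of the workshop notebooks.

```python
import importlib

# Hypothetical mapping: display name -> import name for the packages listed above.
# Note that scikit-learn is imported as "sklearn".
REQUIRED = {
    "NumPy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "Keras": "keras",
}


def check_packages(required):
    """Return {display_name: version string, or None if the import fails}."""
    found = {}
    for display_name, module_name in required.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:  # also catches ModuleNotFoundError
            found[display_name] = None
            continue
        # Most scientific packages expose __version__; fall back if one does not
        found[display_name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        status = version if version is not None else "MISSING"
        print("{:<14} {}".format(name, status))
```

Run it as a script or paste it into a notebook cell; any package reported as MISSING still needs to be installed before the workshop.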
The datasets used in this workshop are either part of the machine learning packages or were compiled from the Galaxy Zoo DR1, the Sloan Digital Sky Survey (SDSS) (using the DR9 SQL search), and the Galaxy And Mass Assembly (GAMA) survey. The N-body simulation was produced by Jonathan Diaz.
The notebooks are descriptive and comprehensive enough to be attempted at your own pace; a solution notebook is also provided. The lecture notes explain the intuition behind how the different machine learning algorithms work.
The workshop covers:
- Data preparation and exploratory data analysis with pandas
- Classification
- Cross validation
- Learning curves
- Model tuning
- Reporting
- Regression
- Clustering and dimensionality reduction
- Introduction to deep learning
  - building a network from scratch (Artificial Neural Network example)
  - training and evaluation of the model
- Convolutional Neural Networks
- Transfer learning
- Hands-on examples and exercises throughout
The workshop does not cover:
- Rigorous mathematical working and proofs. There simply isn't enough time in a workshop format, but we will provide links and material that you can and should read.
This work is made available under the Creative Commons Attribution 4.0 International License.