This repository contains notes for the Intro to ML Safety course.
The notes are not yet complete, and we are looking for volunteers to help finish them. Ideally, the notes will present the material from the lectures and readings in a different way, giving students multiple angles on the same material. They shouldn't simply restate the lectures, and ideally they will include citations to papers.
If you would like to contribute to the course notes, feel free to make a pull request! We will credit you here and in the course notes.
Preliminary notes already exist for some of the topics, but they are incomplete.
Lecture | Status | Contributor(s) |
---|---|---|
Introduction | Not started | |
Deep Learning Review | Ready for Review | Nathaniel Li |
Risk Decomposition | Ready for Review | Cody Rushing |
Accident Models | Not started | |
Black Swans | Not started | |
Adversarial Robustness | Needs revision | Oliver Zhang |
Black Swan Robustness | Needs revision | Oliver Zhang |
Anomaly Detection | Needs revision | Oliver Zhang |
Interpretable Uncertainty | Needs revision | Oliver Zhang |
Transparency | Ready for Review | Cody Rushing |
Trojans | Ready for Review | Ethan Gutierrez |
Detecting Emergent Behaviour | Ready for Review | Bilal Chughtai |
Honest Models | Not started | |
Intrasystem Goals or Power Aversion | Not started | |
Machine Ethics | Not started | |
ML for Improved Decision-Making | Ready for Review | Nathaniel Li |
ML for Cyberdefense | Not started | |
Cooperative AI | Ready for Review | Bilal Chughtai |
X-Risk | Not started | |
Possible Existential Hazards | Not started | |
Safety-Capabilities Balance | Not started | |
Review and Conclusion | Not started | |