OpenAI Scholars: Reinforcement Learning Self-Study

Week 1: Markov Decision Processes

Resources

Reinforcement Learning: An Introduction, Sutton and Barto, 2nd Edition (Jan 1 2018 draft), Chapter 3: Finite Markov Decision Processes and Chapter 4: Dynamic Programming
Deep RL Bootcamp Core Lecture 1 Intro to MDPs and Exact Solution Methods -- Pieter Abbeel Video | Slides
Deep RL Bootcamp Core Lecture 2 Sample-based Approximations and Fitted Learning -- Rocky Duan Video | Slides
Deep RL Bootcamp Lab 1: Markov Decision Processes. You will implement value iteration, policy iteration, and tabular Q-learning, and apply these algorithms to simple environments, including tabular maze navigation (FrozenLake) and control of a simple crawler robot.
CS294 Reinforcement learning introduction -- Levine Video | Slides
CS294 Value functions introduction -- Levine Video | Slides
Introduction to Reinforcement Learning by Joshua Achiam, OpenAI Slides

Notes

RL Algorithms Diagram

Interaction in a Markov decision process

Value Iteration in an MDP
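
As a rough companion to the value iteration diagram, here is a minimal sketch of the algorithm in plain Python. The three-state MDP, the transition table P, and the function name value_iteration are hypothetical, made up for illustration; they are not taken from the bootcamp lab. Each sweep applies the Bellman optimality backup V(s) = max_a sum_s' p(s'|s,a) [r + gamma V(s')] until the values stop changing.

```python
# Hypothetical 3-state MDP, used only for illustration.
# P[s][a] is a list of (probability, next_state, reward) tuples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},  # absorbing state, no reward
}


def backup(P, V, s, a, gamma):
    """Expected one-step return of taking action a in state s under values V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Repeat Bellman optimality backups until the largest change is below tol."""
    V = {s: 0.0 for s in P}
    for _ in range(max_iters):
        new_V = {s: max(backup(P, V, s, a, gamma) for a in P[s]) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: backup(P, V, s, a, gamma)) for s in P}
    return V, policy


if __name__ == "__main__":
    V, policy = value_iteration(P)
    print(V)
    print(policy)
```

In this toy MDP the only reward is earned on the transition from state 1 to state 2, so the greedy policy moves toward state 2 from states 0 and 1.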

OpenAI’s Crawler robot attempting to walk with random actions (video).

The same Crawler robot after it has been trained for 30,000 steps with a Q-learning algorithm (video).
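
For readers who want to see the kind of update behind that training, here is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. It does not reproduce the Crawler environment: the four-state corridor, the step function, and all hyperparameters below are assumptions chosen to keep the example small and self-contained.

```python
import random


def step(state, action):
    """Hypothetical 4-state corridor: action 1 moves right, action 0 moves left.

    Reaching state 3 gives reward 1 and ends the episode.
    """
    next_state = min(state + 1, 3) if action == 1 else max(state - 1, 0)
    done = next_state == 3
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; returns Q as a list indexed by [state][action]."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(4)]
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:  # cap episode length
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                best = max(Q[state])  # exploit, breaking ties at random
                action = rng.choice([a for a in range(2) if Q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best next action unless terminal.
            target = reward + (0.0 if done else gamma * max(Q[next_state]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
            steps += 1
    return Q


if __name__ == "__main__":
    for s, row in enumerate(q_learning()):
        print(s, row)
```

After training, the learned Q-values should favor moving right in every non-terminal state, which is the small-scale analogue of the Crawler learning to move forward instead of acting randomly.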