This repository contains summaries of key papers in deep reinforcement learning. The list is heavily based on the Key Papers in Deep RL list from OpenAI Spinning Up. The summaries aim to provide a high-level overview of each paper, laying out the key problems the authors tried to solve and the main contributions and algorithms proposed.
The list is organized so that earlier papers serve as prerequisites for later ones. Readers are encouraged to read the original papers if they want to dig deeper into the details. A few minimal, illustrative code sketches for representative algorithms (DQN, PPO, and HER) are included after the list.
These summaries are co-authored by Cynthia Chen and Yawen Duan.
[1] Playing Atari with Deep Reinforcement Learning, Mnih et al, 2013. Algorithm: DQN. [paper] [summary]
[2] Deep Recurrent Q-Learning for Partially Observable MDPs, Hausknecht and Stone, 2015. Algorithm: Deep Recurrent Q-Learning. [paper] [summary]
[3] Dueling Network Architectures for Deep Reinforcement Learning, Wang et al, 2015. Algorithm: Dueling DQN. [paper] [summary]
[4] Deep Reinforcement Learning with Double Q-learning, Hasselt et al, 2015. Algorithm: Double DQN. [paper] [summary]
[5] Prioritized Experience Replay, Schaul et al, 2015. Algorithm: Prioritized Experience Replay (PER). [paper] [summary]
[6] Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al, 2017. Algorithm: Rainbow DQN. [paper] [summary]
[7] Asynchronous Methods for Deep Reinforcement Learning, Mnih et al, 2016. Algorithm: A3C. [paper] [summary]
[8] Trust Region Policy Optimization, Schulman et al, 2015. Algorithm: TRPO. [paper] [summary]
[9] High-Dimensional Continuous Control Using Generalized Advantage Estimation, Schulman et al, 2015. Algorithm: GAE. [paper] [summary]
[10] Proximal Policy Optimization Algorithms, Schulman et al, 2017. Algorithm: PPO-Clip, PPO-Penalty. [paper] [summary]
[11] Emergence of Locomotion Behaviours in Rich Environments, Heess et al, 2017. Algorithm: PPO-Penalty. [paper]
[12] Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, Wu et al, 2017. Algorithm: ACKTR. [paper] [summary]
[13] Sample Efficient Actor-Critic with Experience Replay, Wang et al, 2016. Algorithm: ACER. [paper] [summary]
[14] Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja et al, 2018. Algorithm: SAC. [paper] [summary]
[15] Deterministic Policy Gradient Algorithms, Silver et al, 2014. Algorithm: DPG. [paper] [summary]
[16] Continuous Control With Deep Reinforcement Learning, Lillicrap et al, 2015. Algorithm: DDPG. [paper] [summary]
[17] Addressing Function Approximation Error in Actor-Critic Methods, Fujimoto et al, 2018. Algorithm: TD3. [paper] [summary]
[18] A Distributional Perspective on Reinforcement Learning, Bellemare et al, 2017. Algorithm: C51. [paper]
[19] Distributional Reinforcement Learning with Quantile Regression, Dabney et al, 2017. Algorithm: QR-DQN. [paper]
[20] Implicit Quantile Networks for Distributional Reinforcement Learning, Dabney et al, 2018. Algorithm: IQN. [paper]
[21] Dopamine: A Research Framework for Deep Reinforcement Learning, Castro et al, 2018. Contribution: Introduces Dopamine, a code repository containing implementations of DQN, C51, IQN, and Rainbow. Code link. [paper]
[22] Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, Gu et al, 2016. Algorithm: Q-Prop. [paper] [summary]
[23] Action-dependent Control Variates for Policy Optimization via Stein’s Identity, Liu et al, 2017. Algorithm: Stein Control Variates. [paper] [summary]
[24] The Mirage of Action-Dependent Baselines in Reinforcement Learning, Tucker et al, 2018. Contribution: critiques and reevaluates claims from earlier papers (including Q-Prop and Stein control variates), finding important methodological errors in them. [paper]
[39] Exploration by Random Network Distillation, Burda et al, 2018. Algorithm: RND. [paper] [summary]
[43] Progressive Neural Networks, Rusu et al, 2016. Algorithm: Progressive Networks. [paper] [summary]
[44] Universal Value Function Approximators, Schaul et al, 2015. Algorithm: UVFA. [paper] [summary]
[45] Reinforcement Learning with Unsupervised Auxiliary Tasks, Jaderberg et al, 2016. Algorithm: UNREAL. [paper] [summary]
[50] Hindsight Experience Replay, Andrychowicz et al, 2017. Algorithm: Hindsight Experience Replay (HER). [paper] [summary]
[59] Imagination-Augmented Agents for Deep Reinforcement Learning, Weber et al, 2017. Algorithm: I2A. [paper] [summary]
[60] Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning, Nagabandi et al, 2017. Algorithm: MBMF. [paper] [summary]
[61] Model-Based Value Expansion for Efficient Model-Free Reinforcement Learning, Feinberg et al, 2018. Algorithm: MVE. [paper] [summary]
[66] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, Silver et al, 2017. Algorithm: AlphaZero. [paper] [summary]
[67] Thinking Fast and Slow with Deep Learning and Tree Search, Anthony et al, 2017. Algorithm: ExIt. [paper] [summary]
[68] RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning, Duan et al, 2016. Algorithm: RL^2. [paper] [summary]
[69] Learning to Reinforcement Learn, Wang et al, 2016. [paper] [summary]
[70] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn et al, 2017. Algorithm: MAML. [paper] [summary]
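
To give a flavor of the algorithmic details the summaries discuss, here are a few minimal, illustrative sketches. They are reconstructions for orientation only, not code from the papers or from any official implementation, and all function and variable names are ours. First, a sketch of the TD target used by DQN [1], with the Double DQN [4] variant as an option:

```python
import numpy as np

def dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99, double=False):
    """Illustrative TD targets for a batch of transitions (names are ours).

    rewards:        (B,)   rewards r_t
    next_q_online:  (B, A) Q(s_{t+1}, .) from the online network
    next_q_target:  (B, A) Q(s_{t+1}, .) from the target network
    dones:          (B,)   1.0 if s_{t+1} is terminal, else 0.0
    """
    if double:
        # Double DQN [4]: the online network picks the action,
        # the target network evaluates it, reducing overestimation.
        best_actions = next_q_online.argmax(axis=1)
        next_values = next_q_target[np.arange(len(rewards)), best_actions]
    else:
        # DQN [1]: the target network both selects and evaluates the action.
        next_values = next_q_target.max(axis=1)
    return rewards + gamma * (1.0 - dones) * next_values
```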
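
Second, a sketch of the clipped surrogate objective from PPO-Clip [10], assuming advantage estimates are already available (e.g. from GAE [9]):

```python
import numpy as np

def ppo_clip_objective(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Illustrative PPO-Clip surrogate, to be maximized (names are ours).

    log_probs_new: (B,) log pi_theta(a_t | s_t) under the current policy
    log_probs_old: (B,) log probabilities recorded when the data was collected
    advantages:    (B,) advantage estimates, e.g. from GAE [9]
    """
    ratio = np.exp(log_probs_new - log_probs_old)  # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The elementwise minimum keeps the update conservative whenever moving
    # the ratio outside the clipping range would otherwise inflate the objective.
    return np.minimum(unclipped, clipped).mean()
```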
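
Finally, a sketch of goal relabeling in the spirit of Hindsight Experience Replay [50], using the "final" strategy in which the last achieved state of an episode is treated as the goal; the reward_fn argument is a placeholder we introduce for illustration:

```python
def her_relabel_final(episode, reward_fn):
    """Relabel an episode with its final achieved state as the goal (illustrative).

    episode:   list of (state, action, next_state, original_goal) tuples
    reward_fn: maps (next_state, goal) -> reward, e.g. 0 if the goal is
               reached and -1 otherwise (placeholder, not the paper's code)
    """
    achieved_goal = episode[-1][2]  # final next_state stands in for the goal
    relabeled = []
    for state, action, next_state, _ in episode:
        reward = reward_fn(next_state, achieved_goal)
        relabeled.append((state, action, next_state, achieved_goal, reward))
    # Stored alongside the original transitions in the replay buffer, these
    # relabeled transitions turn failed episodes into useful learning signal.
    return relabeled
```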