This repository contains implementations of the control algorithms found in the Sutton&Barto book. Only the algorithms presented in boxes are included.
RL algorithms look simple to implement at first glance, but they are tricky to get right. Subtle errors may result in divergence, failure to learn, or poor performance. The last is arguably the worst because it is hard to pinpoint: an algorithm can perform poorly for many reasons other than implementation bugs, e.g. sub-optimal hyperparameter selection or randomness.
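As one illustration of the kind of subtle error meant here, consider the tabular Q-learning update. The sketch below is not taken from this repository; the function name, the dictionary-based Q-table, and the parameter names are assumptions made for the example. The point is the `done` check: a terminal state has value zero, so the target must not bootstrap from it. Code that omits the check still runs, but its estimates can be biased, for example when values are initialized optimistically.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, done,
                      n_actions, alpha=0.5, gamma=1.0):
    """Apply one tabular Q-learning update in place and return the TD error.

    `q` maps (state, action) pairs to value estimates (hypothetical layout).
    """
    if done:
        # Terminal transition: the value of the next state is zero by
        # definition, so the target is the reward alone.
        target = reward
    else:
        # Non-terminal transition: bootstrap from the greedy next-state value.
        target = reward + gamma * max(q[(next_state, a)] for a in range(n_actions))
    td_error = target - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


q = defaultdict(float)
q[("s1", 0)] = 10.0  # optimistic entry stored for a state that is terminal
q_learning_update(q, "s0", 0, reward=1.0, next_state="s1", done=True, n_actions=2)
print(q[("s0", 0)])  # 0.5: the stale value of "s1" was ignored
```

Without the `done` branch the same call would use a target of 11.0 instead of 1.0, and nothing would crash to signal the mistake.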
I have used the illustrations in the Sutton&Barto book as a measure of implementation correctness: I tried to replicate them with the relevant algorithms.
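Many of the book's learning curves are averages over a number of independent runs, so a single noisy run is hard to compare against them. A minimal sketch of seed-averaging follows; `average_learning_curve`, `run_once`, and `noisy_dummy_run` are hypothetical names for this example and are not part of this repository's demo package. `run_once` stands in for whatever function trains an agent and returns one number per episode.

```python
import random
from statistics import mean


def average_learning_curve(run_once, n_runs, n_episodes):
    """Average per-episode results over `n_runs` independent runs.

    `run_once(seed, n_episodes)` is assumed to return a list with one
    value (e.g. total reward) per episode.
    """
    curves = [run_once(seed, n_episodes) for seed in range(n_runs)]
    return [mean(curve[i] for curve in curves) for i in range(n_episodes)]


def noisy_dummy_run(seed, n_episodes):
    """Stand-in for a real training run: an improving trend plus seeded noise."""
    rng = random.Random(seed)
    return [-100.0 / (episode + 1) + rng.gauss(0.0, 1.0)
            for episode in range(n_episodes)]


curve = average_learning_curve(noisy_dummy_run, n_runs=30, n_episodes=50)
print(len(curve))  # 50
```

Seeding each run separately keeps the averaged curve reproducible, which makes it easier to tell an implementation change from ordinary run-to-run noise.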
The demo package contains demonstrations of the algorithms. The env package contains the gym environments used in the demonstrations.
- On Policy First Visit MC
- Off Policy MC
- SARSA
- Q-Learning
- Double Q-Learning
- Expected SARSA
- n-Step SARSA
- Off Policy n-Step SARSA
- n-Step Tree Backup
- Off Policy n-Step Q Sigma
- Tabular DynaQ
- Differential Semi-Gradient SARSA
- Differential Semi-Gradient n-Step SARSA
- Episodic Semi-Gradient SARSA
- Episodic Semi-Gradient n-Step SARSA
- Binary Feature Linear Function Approximation SARSA
- REINFORCE Softmax Linear MC
- REINFORCE Softmax Linear MC with Baseline
- 1-Step Actor Critic
- Episodic Eligibility Trace Actor Critic
- Continuing Eligibility Trace Actor Critic
- Sutton&Barto Figure 5.2 - Blackjack - Monte Carlo Agent - 6M episodes
- Blackjack - Expected SARSA Agent - 6M episodes
- Sutton&Barto Figure 10.1 - Episodic Semi-Gradient 5-Step Sarsa Agent
- Sutton&Barto Figure 10.2 - Episodic Semi-Gradient Sarsa Agent with different alpha parameters
- Sutton&Barto Figure 10.3 - Episodic Semi-Gradient n-Step Sarsa Agent with different numbers of steps (n)
- Sutton&Barto Figure 10.4 - Episodic Semi-Gradient n-Step Sarsa Agent with different numbers of steps (n) and alpha values
- Sutton&Barto Figure 12.10 - Binary Feature Linear Function Approximation SARSA Agent with different lambda and alpha values
- Sutton&Barto Figure 10.5 - Episodic Semi-Gradient 5-Step Sarsa Agent
- Sutton&Barto Figure 13.1 - Total reward on episode - REINFORCE Softmax Linear MC Agent
- Sutton&Barto Figure 13.2 - Total reward on episode - REINFORCE Softmax Linear MC Agent with Baseline
- Total reward on episode - Actor Critic