A Python implementation of Karpathy's REINFORCEjs library. The original library is written in JavaScript; this repository aims to implement its RL algorithms and demos in Python.
Note that this is not a 1-to-1 port. The goal is simply to develop algorithms and demos similar to those in Karpathy's library.
We started by implementing the simplest algorithm, Value Iteration, from scratch.
The following shows an example of the value function for different iterations.
[Figures: value function after successive numbers of iterations]
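For readers new to the algorithm, the following is a minimal sketch of value iteration on a small deterministic gridworld. It is not the code from this repository: the function name `value_iteration`, the reward scheme (1.0 for entering the goal cell, 0 otherwise), the discount `gamma`, and the rule that moves off the grid leave the agent in place are all assumptions chosen for illustration.

```python
def value_iteration(grid_size=4, gamma=0.9, goal=(3, 3), iterations=50):
    """Synchronous value iteration on a deterministic gridworld (illustrative).

    Assumed setup, not taken from the repository:
    - entering the goal cell yields reward 1.0, every other move yields 0.0
    - the goal is terminal, so its value stays 0.0
    - a move that would leave the grid keeps the agent in its current cell
    """
    values = [[0.0] * grid_size for _ in range(grid_size)]
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    for _ in range(iterations):
        new_values = [[0.0] * grid_size for _ in range(grid_size)]
        for r in range(grid_size):
            for c in range(grid_size):
                if (r, c) == goal:
                    continue  # terminal state: nothing to back up
                candidates = []
                for dr, dc in moves:
                    # Clamp the successor cell to the grid boundaries.
                    nr = min(max(r + dr, 0), grid_size - 1)
                    nc = min(max(c + dc, 0), grid_size - 1)
                    reward = 1.0 if (nr, nc) == goal else 0.0
                    candidates.append(reward + gamma * values[nr][nc])
                # Bellman optimality backup: keep the best action's value.
                new_values[r][c] = max(candidates)
        values = new_values
    return values


if __name__ == "__main__":
    for row in value_iteration():
        print(" ".join(f"{v:.3f}" for v in row))
```

With each sweep, value information spreads one cell further from the goal, which is the kind of progression the value function renderings illustrate.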
Several parameters can be set when running main.py. An example call looks like this:
python main.py \
--seed=42 \
--verbose=1 \
--episodes=1 \
--timesteps=1 \
--grid_size=10 \
--algo=value_iteration \
--render_large=True \
--render_with_values=True
All supported arguments are listed below:
usage:
main.py [--seed] [--verbose] [--episodes] [--timesteps] [--grid_size] [--algo]
[--render_large] [--render_with_values]
Argument | Help | Default |
---|---|---|
--seed | random seed | |
--verbose | verbosity level | |
--episodes | number of episodes | |
--timesteps | maximal number of timesteps | |
--grid_size | size of the gridworld | |
--algo | learning algorithm | value_iteration |
--render_large | render large gridworld | False |
--render_with_values | render gridworld with value estimates | False |
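As a rough illustration of how such a command line could be parsed, here is a sketch using Python's standard `argparse` module. It is an assumption about how main.py might be structured, not its actual code. The helper names `str2bool` and `build_parser` are hypothetical, and the `None` defaults are placeholders for the defaults the table leaves blank.

```python
import argparse


def str2bool(text):
    """Convert 'True'/'False'-style strings to bool.

    Needed because argparse's type=bool treats any non-empty string,
    including 'False', as True.
    """
    lowered = text.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def build_parser():
    """Build a parser mirroring the documented flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Defaults of None are placeholders; the real defaults are not documented.
    parser.add_argument("--seed", type=int, default=None, help="random seed")
    parser.add_argument("--verbose", type=int, default=None, help="verbosity level")
    parser.add_argument("--episodes", type=int, default=None, help="number of episodes")
    parser.add_argument("--timesteps", type=int, default=None,
                        help="maximal number of timesteps")
    parser.add_argument("--grid_size", type=int, default=None,
                        help="size of the gridworld")
    parser.add_argument("--algo", type=str, default="value_iteration",
                        help="learning algorithm")
    parser.add_argument("--render_large", type=str2bool, default=False,
                        help="render large gridworld")
    parser.add_argument("--render_with_values", type=str2bool, default=False,
                        help="render gridworld with value estimates")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(
    ["--seed=42", "--grid_size=10", "--render_large=True"]
)
print(args.seed, args.grid_size, args.algo, args.render_large)
```

The custom `str2bool` converter is what lets flags be passed in the `--render_large=True` form used in the example call.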