Navigation project from the Udacity Deep Reinforcement Learning Nanodegree. It demonstrates how to train an agent to collect yellow bananas while avoiding blue bananas.
- Clone the deep reinforcement learning repository
- Follow the instructions to install the necessary dependencies
- Download the environment for your system into this repository root:
  - Linux: click here
  - Mac OSX: click here
  - Windows (32-bit): click here
  - Windows (64-bit): click here
- Unzip (or decompress) the archive
- Start the Jupyter server
- Open the `Navigation.ipynb` notebook
- Change the kernel to `drlnd`
- You should be able to run all the cells
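Once the environment archive is unpacked, the first cells of the notebook connect to it along these lines. This is a minimal sketch: the `Banana_Linux/Banana.x86_64` path is an assumption for the Linux build and should be adjusted for your platform.

```python
# Minimal sketch of loading the Unity environment from the notebook.
# The file name below assumes the Linux build was unzipped into the
# repository root; adjust it for the macOS or Windows build.
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="Banana_Linux/Banana.x86_64")
brain_name = env.brain_names[0]
brain = env.brains[brain_name]
print("Number of actions:", brain.vector_action_space_size)
```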
This project uses a Unity-based environment prepared by the Udacity team.
There is one agent interacting with the environment.
There are 4 actions available to the agent:
- 0 - walk forward
- 1 - walk backward
- 2 - turn left
- 3 - turn right
The state is represented as a vector of 37 dimensions.
There is a reward of +1 for collecting a yellow banana and a reward of -1 for collecting a blue banana.
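A quick way to see the state, actions, and rewards in practice is to run a single episode with random actions. The sketch below assumes `env` and `brain_name` were created as in the loading snippet above.

```python
import numpy as np

# One episode with uniformly random actions.
env_info = env.reset(train_mode=False)[brain_name]
state = env_info.vector_observations[0]   # 37-dimensional state vector
score = 0
while True:
    action = np.random.randint(4)         # 0: forward, 1: backward, 2: left, 3: right
    env_info = env.step(action)[brain_name]
    state = env_info.vector_observations[0]
    reward = env_info.rewards[0]          # +1 yellow banana, -1 blue banana
    done = env_info.local_done[0]
    score += reward
    if done:
        break
print("Episode score:", score)
```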
The directory `saves` contains saved weights for 4 different agents:
- `checkpoint_single_16.pth` - DQN
- `checkpoint_double_16.pth` - Double DQN
- `checkpoint_dueling_16.pth` - Dueling Double DQN
- `checkpoint_priority.pth` - Priority Experience Replay + Dueling Double DQN
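To reuse one of these agents, the weights can be restored into a Q-network with a matching architecture. The sketch below is an assumption: it guesses a single hidden layer of 16 units from the `_16` suffix, and the actual `QNetwork` class in this repository may be laid out differently, in which case the `state_dict` keys will not match.

```python
import torch
import torch.nn as nn

# Hypothetical Q-network: 37 inputs, one hidden layer of 16 units (guessed
# from the "_16" suffix in the checkpoint names), 4 action values out.
class QNetwork(nn.Module):
    def __init__(self, state_size=37, action_size=4, hidden_size=16):
        super().__init__()
        self.fc1 = nn.Linear(state_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, action_size)

    def forward(self, state):
        return self.fc2(torch.relu(self.fc1(state)))

model = QNetwork()
model.load_state_dict(torch.load("saves/checkpoint_single_16.pth", map_location="cpu"))
model.eval()

# Greedy action for a single state (a dummy all-zeros state here).
state = torch.zeros(1, 37)
action = model(state).argmax(dim=1).item()
```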
Most of the code is based on the Deep Q-Networks lesson. The experience replay buffer and SumTree are minimally adapted from Yuan Liu's RainBow implementation.