For this project, I’ve trained an agent to navigate and collect bananas in a square world built with the Unity environment. Below you can find the two conditions: the untrained agent and the trained agent.
Each time the agent collects a yellow banana, it is given a reward of +1. For each blue banana, it receives a reward of -1. The goal of the agent is to collect as many yellow bananas as possible while avoiding blue bananas, thereby maximizing the total reward received.
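The scoring rule above can be sketched as a small helper (`episode_score` is a hypothetical name, not part of the project code):

```python
# Hypothetical per-episode score: +1 per yellow banana, -1 per blue banana
def episode_score(yellow_collected: int, blue_collected: int) -> int:
    return yellow_collected * 1 + blue_collected * (-1)

# e.g. 15 yellow and 2 blue bananas give a score of 13
print(episode_score(15, 2))  # → 13
```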
The state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction. Given this information, the agent has to learn how to best select actions. Four discrete actions are available, corresponding to:
- `0` - move forward.
- `1` - move backward.
- `2` - turn left.
- `3` - turn right.
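An untrained agent simply samples uniformly from this action space. A minimal sketch (the `ACTIONS` mapping and `random_action` helper are illustrative names, not part of the project code):

```python
import random

# Discrete action space of the Banana environment, as listed above
ACTIONS = {
    0: "move forward",
    1: "move backward",
    2: "turn left",
    3: "turn right",
}

def random_action() -> int:
    """Uniformly sample one of the four discrete actions (untrained behaviour)."""
    return random.randrange(len(ACTIONS))

a = random_action()
print(a, ACTIONS[a])
```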
To complete the task, the agent must achieve an average score of +13 over 100 consecutive episodes.
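The solved condition can be checked with a moving average over the last 100 episode scores; a sketch (the `is_solved` helper is illustrative, not the project's actual training-loop code):

```python
def is_solved(scores, window=100, target=13.0):
    """Return True once the mean of the last `window` episode scores reaches `target`."""
    if len(scores) < window:
        return False
    recent = list(scores)[-window:]
    return sum(recent) / window >= target

# Example: 100 consecutive episodes averaging 14 would count as solved
print(is_solved([14.0] * 100))  # → True
```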
- Clone this repo.
- Copy the content of the `p1_navigation/` folder from this repo to the `p1_navigation/` folder of the udacity/deep-reinforcement-learning repo, replacing or removing existing files.
- Unzip the Banana_Linux.zip file that is located under the `p1_navigation/` folder, under the same directory. If you are not using Linux, follow the instructions at the bottom of this file.
Open a Jupyter notebook server and open Navigation.ipynb to train or test the agent.
- For training from scratch, run all the cells inside the navigation notebook.
- For testing, skip the training section and follow the instructions to load the weights.
You need to select the environment that matches your operating system:
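A small sketch of picking the environment file by operating system; the file names below are assumptions based on the usual Udacity build names, so check them against the project instructions:

```python
import platform

# Assumed environment file names per OS (verify against the project's download links)
ENV_PATHS = {
    "Linux": "p1_navigation/Banana_Linux/Banana.x86_64",
    "Darwin": "p1_navigation/Banana.app",
    "Windows": "p1_navigation/Banana_Windows_x86_64/Banana.exe",
}

def env_path() -> str:
    """Return the environment binary path for the current operating system."""
    system = platform.system()
    if system not in ENV_PATHS:
        raise RuntimeError(f"Unsupported OS: {system}")
    return ENV_PATHS[system]
```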