
Q-Learning

License: GPL v3

Problem

MountainCar-v0 is a Gym environment that takes an action as input and, via its step() method, returns a new state, a reward, and whether the goal has been reached. Here is a screenshot of the initial environment.

initial_state_of_car

Initial State of The Environment
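For reference, a minimal interaction sketch looks like the following (assuming the classic Gym API, where reset() returns only the observation; newer Gym/Gymnasium versions return extra values):

```python
# Minimal sketch of interacting with MountainCar-v0 using the classic Gym API.
import gym

env = gym.make("MountainCar-v0")
state = env.reset()  # state = [position, velocity]

done = False
while not done:
    action = env.action_space.sample()  # 0: push left, 1: no push, 2: push right
    new_state, reward, done, info = env.step(action)
    state = new_state

env.close()
```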

In this project, the aim is to implement a Q-Learning algorithm in the first phase, and then to develop a deep Q-Learning algorithm using Keras.

Phase one

This is the first phase of the project, which focuses on training the car to reach the peak by updating the Q-Table. Here we use the Bellman equation as a simple value-iteration update.

Bellman equation
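A minimal sketch of how this update can be applied to a discretized Q-Table is shown below (bucket sizes and variable names are illustrative, not necessarily those used in this repository):

```python
import numpy as np

# Illustrative hyperparameters; the repository may use different values.
DISCRETE_SIZE = [20, 20]   # buckets for (position, velocity)
LEARNING_RATE = 0.1
DISCOUNT = 0.95

# `env` is the MountainCar-v0 environment created above.
bucket_size = (env.observation_space.high - env.observation_space.low) / DISCRETE_SIZE
q_table = np.random.uniform(low=-2, high=0, size=DISCRETE_SIZE + [env.action_space.n])

def to_discrete(state):
    """Map a continuous state to Q-table bucket indices."""
    idx = (state - env.observation_space.low) / bucket_size
    return tuple(idx.astype(int))

def bellman_update(state, action, reward, new_state):
    """Simple value-iteration update of a single Q-table entry."""
    current_q = q_table[to_discrete(state) + (action,)]
    max_future_q = np.max(q_table[to_discrete(new_state)])
    new_q = (1 - LEARNING_RATE) * current_q + LEARNING_RATE * (reward + DISCOUNT * max_future_q)
    q_table[to_discrete(state) + (action,)] = new_q
```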

Initial Q-Table

Scatter Plot of The Q-Table at 1000 Episodes of Training

Final Q-Table

Scatter Plot of The Q-Table at 60000 Episodes of Training
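A hedged sketch of how a scatter plot of the Q-Table like the ones above might be produced (the actual plotting code in the repository may differ):

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_q_table(q_table, episode):
    """Scatter the greedy action for every (position, velocity) bucket."""
    best_actions = np.argmax(q_table, axis=2)
    xs, ys = np.meshgrid(range(q_table.shape[0]), range(q_table.shape[1]), indexing="ij")
    plt.scatter(xs.ravel(), ys.ravel(), c=best_actions.ravel(), cmap="viridis")
    plt.xlabel("position bucket")
    plt.ylabel("velocity bucket")
    plt.title(f"Greedy action per state after {episode} episodes")
    plt.colorbar(label="action (0: left, 1: none, 2: right)")
    plt.show()
```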

In the first scatter plot, the green dots represent the actions taken by the agent. Early in training, the agent chooses actions almost at random. But after some episodes, the agent learns that taking specific actions in specific locations leads it to receive a reward! Once the agent discovers a solution, it keeps exploiting it. Of course, this lets the model gain reward consistently, but the model does not know that it could gain even more reward by taking different actions, which is known as exploration. This is one of the fundamental problems of reinforcement learning, known as the Exploration-Exploitation Dilemma.
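The usual way to balance the two is epsilon-greedy action selection; a minimal sketch, reusing the q_table and to_discrete names from the update sketch above:

```python
import numpy as np

def choose_action(state, epsilon):
    """With probability epsilon explore, otherwise exploit the Q-table."""
    if np.random.random() < epsilon:
        return np.random.randint(0, env.action_space.n)  # explore: random action
    return int(np.argmax(q_table[to_discrete(state)]))   # exploit: best known action
```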

Result (Phase one)

Not Using Epsilon Decay

Not trained gif

As the gif shows, at episode 1 the car has no idea what to do. But after only 500 episodes it understands that by making progress to the right, it will gain a reward! There is an interesting point here: although the car receives its reward by reaching the peak, as shown in the gif, it also tries to minimize the time spent. To do that, it first shortens the distance it rolls forward and then uses the velocity it has built up to reach the top!

Using Epsilon Decay

trained gif

Although the car already reaches the peak in an acceptable time, using epsilon decay makes the model explore more in order to find a better approach. As shown in the gif above, the car then minimizes the time it spends reaching the peak.
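A minimal sketch of linear epsilon decay over a range of episodes (the constants are assumptions, not necessarily the values used in main.py):

```python
EPISODES = 10000
EPSILON = 0.5
START_DECAY = 1              # episode at which decay starts
END_DECAY = EPISODES // 2    # episode at which decay stops
epsilon_decay_value = EPSILON / (END_DECAY - START_DECAY)

for episode in range(EPISODES):
    # ... run one episode, selecting actions with choose_action(state, EPSILON) ...
    if START_DECAY <= episode <= END_DECAY:
        EPSILON -= epsilon_decay_value
```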

trained gif

LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 1000 - Use epsilon Decay: True - EPSILON: 0.5

trained gif

LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 2000 - Use epsilon Decay: True - EPSILON: 0.5

trained gif

LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 5000 - Use epsilon Decay: True - EPSILON: 0.5

trained gif

LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 10000 - Use epsilon Decay: True - EPSILON: 0.5

trained gif

LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 15000 - Use epsilon Decay: True - EPSILON: 1

trained gif

LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 40000 - Use epsilon Decay: True - EPSILON: 0.5

trained gif

LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 50000 - Use epsilon Decay: True - EPSILON: 0.5

Final Result

  • This is the final result obtained by training the model for 60000 EPISODES:

Final Trained Model After 60000 Episodes

How to use:

First clone the repository:

$ git clone https://github.com/FarzamTP/Q-Learning-Mountain-Car.git
$ cd Q-Learning-Mountain-Car

To set up the virtual environment and activate it:

$ python3 -m venv venv
$ source venv/bin/activate

And to install the requirements:

(venv)$ pip3 install -r requirements.txt

Finally, run the main.py script:

(venv)$ python3 main.py