MountainCar-v0 is a Gym environment: it takes an action as input and, through its step() method, returns the new state, the reward, and a flag indicating whether the goal has been reached.
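As a quick illustration of that interface, here is a minimal sketch using the classic Gym API; the random policy is only for demonstration and is not the project's agent:

```python
import gym

# Create the environment and reset it to get the initial (position, velocity) state.
env = gym.make("MountainCar-v0")
state = env.reset()

done = False
while not done:
    action = env.action_space.sample()             # 0: push left, 1: no push, 2: push right
    state, reward, done, info = env.step(action)   # new state, reward, episode-finished flag
env.close()
```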
Here is a screenshot of the initial environment.
Initial State of The Environment
In this project, the aim is to implement a Q-Learning algorithm in the first phase, and then to develop a deep Q-Learning algorithm using Keras.
This is the first phase of the project, which focuses on training the car to reach the peak by updating the Q-Table. Here we use the Bellman equation as a simple value iteration update.
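Concretely, the continuous (position, velocity) state is discretized into table indices, and the table entry for the chosen action is nudged toward the reward plus the discounted best future value. Below is a minimal sketch of that update, assuming the classic Gym API; the bin count, hyperparameters, and loop structure are illustrative assumptions and may differ from main.py:

```python
import gym
import numpy as np

env = gym.make("MountainCar-v0")

DISCRETE_BINS = 20   # bins per state dimension (assumption)
LR = 0.1             # learning rate (alpha)
DISCOUNT = 0.95      # discount factor (gamma)

bin_size = (env.observation_space.high - env.observation_space.low) / DISCRETE_BINS
q_table = np.random.uniform(low=-2, high=0,
                            size=(DISCRETE_BINS, DISCRETE_BINS, env.action_space.n))

def discretize(state):
    """Map a continuous (position, velocity) state to Q-Table indices."""
    idx = ((state - env.observation_space.low) / bin_size).astype(int)
    return tuple(np.clip(idx, 0, DISCRETE_BINS - 1))

state = discretize(env.reset())
done = False
while not done:
    action = int(np.argmax(q_table[state]))        # greedy action, for brevity
    new_state_raw, reward, done, _ = env.step(action)
    new_state = discretize(new_state_raw)

    # Bellman value-iteration update:
    # Q(s, a) <- (1 - LR) * Q(s, a) + LR * (reward + DISCOUNT * max_a' Q(s', a'))
    current_q = q_table[state + (action,)]
    max_future_q = np.max(q_table[new_state])
    q_table[state + (action,)] = (1 - LR) * current_q + LR * (reward + DISCOUNT * max_future_q)

    state = new_state
env.close()
```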
Scatter Plot of The Q-Table at 1000 Episodes of Training
Scatter Plot of The Q-Table at 60000 Episodes of Training
As can be seen in the first scatter plot, the green dots represent the actions taken by the agent. Early on, the agent chooses its actions almost at random. But after iterating through enough episodes, the agent learns that taking specific actions in specific locations leads it to a reward! Once the agent finds a working solution, it keeps exploiting it. Of course this lets the model gain reward consistently, but the model doesn't know whether it could gain even more reward by taking different actions, which is known as exploration. This is one of the fundamental problems of reinforcement learning, known as the Exploration-Exploitation Dilemma.
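A common way to balance the two is an epsilon-greedy policy: with probability epsilon the agent explores with a random action, otherwise it exploits the best action in its Q-Table. Here is a minimal sketch; the helper name and action count are illustrative and not taken from main.py:

```python
import numpy as np

def choose_action(q_table, state, epsilon, n_actions=3):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if np.random.random() < epsilon:
        return np.random.randint(0, n_actions)    # explore: random action
    return int(np.argmax(q_table[state]))         # exploit: best known action
```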
As is obvious, at episode 1 the car has no idea what to do. But after only 500 episodes it understands that by making progress to the right, it will gain a reward! There is an interesting point here: although the car receives its reward by reaching the peak, as shown in the gif it also tries to minimize the time spent. To do that, it first shortens the forward part of its path and uses the velocity it has gained to reach the top!
Although the car already reaches the peak in a reasonable time, by using epsilon decay we make the model explore more in order to find a better approach! And as shown in the gif above, the car further minimizes the time it needs to reach the peak.
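For reference, here is a minimal sketch of the kind of linear epsilon decay meant here; the start value, episode count, and decay window are illustrative assumptions, not the exact schedule used in this repository:

```python
EPSILON = 0.5                  # initial exploration rate (assumption)
EPISODES = 10000               # total training episodes (assumption)
EPSILON_DECAY_START = 1        # first episode of the decay window (assumption)
EPSILON_DECAY_END = EPISODES   # last episode of the decay window (assumption)
epsilon_decay_value = EPSILON / (EPSILON_DECAY_END - EPSILON_DECAY_START)

for episode in range(EPISODES):
    # ... run one training episode using the epsilon-greedy policy above ...

    # Shrink epsilon a little after each episode so the agent explores a lot
    # early on and mostly exploits what it has learned later.
    if EPSILON_DECAY_START <= episode <= EPSILON_DECAY_END:
        EPSILON -= epsilon_decay_value
```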
LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 1000 - Use epsilon Decay: True - EPSILON: 0.5
LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 2000 - Use epsilon Decay: True - EPSILON: 0.5
LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 5000 - Use epsilon Decay: True - EPSILON: 0.5
LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 10000 - Use epsilon Decay: True - EPSILON: 0.5
LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 15000 - Use epsilon Decay: True - EPSILON: 1
LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 40000 - Use epsilon Decay: True - EPSILON: 0.5
LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 50000 - Use epsilon Decay: True - EPSILON: 0.5
This is the final result obtained by training the model for 60000 episodes:
Final Trained Model After 60000 Episodes
First clone the repository:
$ git clone https://github.com/FarzamTP/Q-Learning-Mountain-Car.git
$ cd Q-Learning-Mountain-Car
To set up the virtual environment and activate it:
$ python3 -m venv venv
$ source venv/bin/activate
And to install the requirements:
(venv)$ pip3 install -r requirements.txt
Finally, run the main.py script:
(venv)$ python3 main.py