MountainCar-v0 is a Gym environment: it takes an action as input and, through its step() method, returns the new state, the reward, and a flag indicating whether the goal has been reached.
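As a quick illustration of that interface, here is a minimal sketch using the classic Gym API; the random policy is only for demonstration and is not the project's agent:

```python
import gym

# Create the environment and reset it to get the initial (position, velocity) state.
env = gym.make("MountainCar-v0")
state = env.reset()

done = False
while not done:
    action = env.action_space.sample()             # 0: push left, 1: no push, 2: push right
    state, reward, done, info = env.step(action)   # new state, reward, episode-finished flag
env.close()
```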
Here is a screenshot of the initial environment.
Initial State of The Environment
In this project, the aim is to implement a Q-Learning algorithm in the first phase, and then to develop a deep Q-Learning algorithm using Keras.
This is the first phase of the project, which focuses on training the car to reach the peak by updating the Q-Table. Here we use the Bellman equation as a simple value iteration update.
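Concretely, the continuous (position, velocity) state is discretized into table indices, and the table entry for the chosen action is nudged toward the reward plus the discounted best future value. Below is a minimal sketch of that update, assuming the classic Gym API; the bin count, hyperparameters, and loop structure are illustrative assumptions and may differ from main.py:

```python
import gym
import numpy as np

env = gym.make("MountainCar-v0")

DISCRETE_BINS = 20   # bins per state dimension (assumption)
LR = 0.1             # learning rate (alpha)
DISCOUNT = 0.95      # discount factor (gamma)

bin_size = (env.observation_space.high - env.observation_space.low) / DISCRETE_BINS
q_table = np.random.uniform(low=-2, high=0,
                            size=(DISCRETE_BINS, DISCRETE_BINS, env.action_space.n))

def discretize(state):
    """Map a continuous (position, velocity) state to Q-Table indices."""
    idx = ((state - env.observation_space.low) / bin_size).astype(int)
    return tuple(np.clip(idx, 0, DISCRETE_BINS - 1))

state = discretize(env.reset())
done = False
while not done:
    action = int(np.argmax(q_table[state]))        # greedy action, for brevity
    new_state_raw, reward, done, _ = env.step(action)
    new_state = discretize(new_state_raw)

    # Bellman value-iteration update:
    # Q(s, a) <- (1 - LR) * Q(s, a) + LR * (reward + DISCOUNT * max_a' Q(s', a'))
    current_q = q_table[state + (action,)]
    max_future_q = np.max(q_table[new_state])
    q_table[state + (action,)] = (1 - LR) * current_q + LR * (reward + DISCOUNT * max_future_q)

    state = new_state
env.close()
```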
Scatter Plot of The Q-Table at 1000 Episodes of Training
Scatter Plot of The Q-Table at 60000 Episodes of Training
As can be seen in the first scatter plot, the green dots represent the actions taken by the agent. Early on, the agent chooses its actions almost at random. But after iterating through enough episodes, the agent learns that taking specific actions in specific locations leads it to a reward! Once the agent finds a working solution, it keeps exploiting it. Of course this lets the model gain reward consistently, but the model doesn't know whether it could gain even more reward by taking different actions, which is known as exploration. This is one of the fundamental problems of reinforcement learning, known as the Exploration-Exploitation Dilemma.
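A common way to balance the two is an epsilon-greedy policy: with probability epsilon the agent explores with a random action, otherwise it exploits the best action in its Q-Table. Here is a minimal sketch; the helper name and action count are illustrative and not taken from main.py:

```python
import numpy as np

def choose_action(q_table, state, epsilon, n_actions=3):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if np.random.random() < epsilon:
        return np.random.randint(0, n_actions)    # explore: random action
    return int(np.argmax(q_table[state]))         # exploit: best known action
```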
As is obvious, at episode 1 the car has no idea what to do. But after only 500 episodes it understands that by making progress to the right, it will gain a reward! There is an interesting point here: although the car receives its reward by reaching the peak, as shown in the gif it also tries to minimize the time spent. To do that, it first shortens the forward part of its path and uses the velocity it has gained to reach the top!
Although the car already reaches the peak in a reasonable time, by using epsilon decay we make the model explore more in order to find a better approach! And as shown in the gif above, the car further minimizes the time it needs to reach the peak.
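For reference, here is a minimal sketch of the kind of linear epsilon decay meant here; the start value, episode count, and decay window are illustrative assumptions, not the exact schedule used in this repository:

```python
EPSILON = 0.5                  # initial exploration rate (assumption)
EPISODES = 10000               # total training episodes (assumption)
EPSILON_DECAY_START = 1        # first episode of the decay window (assumption)
EPSILON_DECAY_END = EPISODES   # last episode of the decay window (assumption)
epsilon_decay_value = EPSILON / (EPSILON_DECAY_END - EPSILON_DECAY_START)

for episode in range(EPISODES):
    # ... run one training episode using the epsilon-greedy policy above ...

    # Shrink epsilon a little after each episode so the agent explores a lot
    # early on and mostly exploits what it has learned later.
    if EPSILON_DECAY_START <= episode <= EPSILON_DECAY_END:
        EPSILON -= epsilon_decay_value
```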
LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 1000 - Use epsilon Decay: True - EPSILON: 0.5
LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 2000 - Use epsilon Decay: True - EPSILON: 0.5
LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 5000 - Use epsilon Decay: True - EPSILON: 0.5
LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 10000 - Use epsilon Decay: True - EPSILON: 0.5
LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 15000 - Use epsilon Decay: True - EPSILON: 1
LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 40000 - Use epsilon Decay: True - EPSILON: 0.5
LR: 0.1 - DISCOUNT: 0.95 - EPISODES: 50000 - Use epsilon Decay: True - EPSILON: 0.5
This is the final result obtained by training the model for 60000 episodes:
Final Trained Model After 60000 Episodes
First clone the repository:
$ git clone https://github.com/FarzamTP/Q-Learning-Mountain-Car.git
$ cd Q-Learning-Mountain-Car
To set up the virtual environment and activate it:
$ python3 -m venv venv
$ source venv/bin/activate
And to install the requirements:
(venv)$ pip3 install -r requirements.txt
Finally, run the main.py script:
(venv)$ python3 main.py