DDQN implementation on the PLE FlappyBird environment in PyTorch.
DDQN (Double DQN) was proposed to address the overestimation issue of Deep Q-Learning (DQN). It keeps a separate target network and decouples action selection from value evaluation: the online (policy) network selects the greedy next action, while the target network evaluates its value, reducing the correlation between the two.
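As a minimal sketch of that idea (the function and tensor names below are placeholders for illustration, not this repository's code), the Double DQN target can be computed in PyTorch like this:

```python
import torch

def double_dqn_target(policy_net, target_net, reward, next_state, done, gamma=0.99):
    """Double DQN target (illustrative sketch, not the repo's API).

    `reward` and `done` are float tensors of shape [batch];
    `done` is 1.0 for terminal transitions.
    """
    with torch.no_grad():
        # Action selection with the online (policy) network
        next_action = policy_net(next_state).argmax(dim=1, keepdim=True)
        # Action evaluation with the target network
        next_q = target_net(next_state).gather(1, next_action).squeeze(1)
        # No bootstrap value on terminal transitions
        return reward + gamma * next_q * (1.0 - done)
```

The policy network is then trained to move its Q-value for the taken action toward this target.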
- Python 3.6
- PyTorch
- Visdom
- PLE (PyGame-Learning-Environment)
- MoviePy
- In this implementation, the policy network is updated once per episode e, not at every step t (see the training-loop sketch below).
- Input images are simplified for faster convergence.
- Hyperparameters are defined in `config.py` (an illustrative example is shown below).
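The snippet below only illustrates the kind of settings typically kept in such a file; the names and values are assumptions for illustration, not the repository's actual configuration.

```python
# Illustrative hyperparameters only; see config.py for the real values.
GAMMA = 0.99              # discount factor
LEARNING_RATE = 1e-4      # optimizer step size
BATCH_SIZE = 32           # minibatch size sampled from the replay buffer
REPLAY_CAPACITY = 50000   # replay buffer size
EPSILON_START = 1.0       # initial exploration rate
EPSILON_END = 0.01        # final exploration rate
TARGET_UPDATE = 10        # episodes between target-network syncs
```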
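To make the per-episode update schedule concrete, here is a minimal sketch of the outer loop over the PLE FlappyBird environment; `agent`, `preprocess`, and the other helper names are hypothetical stand-ins, not this repository's actual classes or functions:

```python
from ple import PLE
from ple.games.flappybird import FlappyBird

def train(agent, preprocess, num_episodes=10000, target_sync=10):
    """Per-episode training schedule (sketch). `agent` and `preprocess`
    are hypothetical helpers standing in for the repo's own code."""
    env = PLE(FlappyBird(), fps=30, display_screen=False)
    env.init()
    actions = env.getActionSet()

    for episode in range(num_episodes):
        env.reset_game()
        state = preprocess(env.getScreenGrayscale())   # simplified input image
        while not env.game_over():
            a = agent.select_action(state)             # epsilon-greedy on the policy net
            reward = env.act(actions[a])
            next_state = preprocess(env.getScreenGrayscale())
            agent.store(state, a, reward, next_state, env.game_over())
            state = next_state
        agent.update()            # one optimization pass per episode, not per step
        if episode % target_sync == 0:
            agent.sync_target()   # copy policy-net weights into the target network
```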
- Train

  `python main.py --train=True --video_path=./video --logs_path=./logs`
- Restore a pretrained model

  `python main.py --restore=./pretrain/model-98500.pth`
- Visualize the loss and reward curves (start the Visdom server first)

  `python -m visdom.server`

  `python visualize.py --logs_path=./logs`
- Full video (at 60 FPS)
- Reward