DDQN implementation on the PLE FlappyBird environment in PyTorch.
DDQN (Double DQN) was proposed to address the overestimation issue of Deep Q-Learning (DQN). It keeps a separate target network and decouples action selection from value evaluation: the online (policy) network selects the greedy next action, while the target network evaluates its value, reducing the correlation between the two.
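As a minimal sketch of that idea (the function and tensor names below are placeholders for illustration, not this repository's code), the Double DQN target can be computed in PyTorch like this:

```python
import torch

def double_dqn_target(policy_net, target_net, reward, next_state, done, gamma=0.99):
    """Double DQN target (illustrative sketch, not the repo's API).

    `reward` and `done` are float tensors of shape [batch];
    `done` is 1.0 for terminal transitions.
    """
    with torch.no_grad():
        # Action selection with the online (policy) network
        next_action = policy_net(next_state).argmax(dim=1, keepdim=True)
        # Action evaluation with the target network
        next_q = target_net(next_state).gather(1, next_action).squeeze(1)
        # No bootstrap value on terminal transitions
        return reward + gamma * next_q * (1.0 - done)
```

The policy network is then trained to move its Q-value for the taken action toward this target.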
- Python 3.6
- PyTorch
- Visdom
- PLE (PyGame-Learning-Environment)
- MoviePy
- In this implementation, the policy network is updated once per episode e, not at every step t (see the training-loop sketch below).
- Input images are simplified for faster convergence.
- Hyperparameters are defined in `config.py` (an illustrative example is shown below).
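The snippet below only illustrates the kind of settings typically kept in such a file; the names and values are assumptions for illustration, not the repository's actual configuration.

```python
# Illustrative hyperparameters only; see config.py for the real values.
GAMMA = 0.99              # discount factor
LEARNING_RATE = 1e-4      # optimizer step size
BATCH_SIZE = 32           # minibatch size sampled from the replay buffer
REPLAY_CAPACITY = 50000   # replay buffer size
EPSILON_START = 1.0       # initial exploration rate
EPSILON_END = 0.01        # final exploration rate
TARGET_UPDATE = 10        # episodes between target-network syncs
```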
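To make the per-episode update schedule concrete, here is a minimal sketch of the outer loop over the PLE FlappyBird environment; `agent`, `preprocess`, and the other helper names are hypothetical stand-ins, not this repository's actual classes or functions:

```python
from ple import PLE
from ple.games.flappybird import FlappyBird

def train(agent, preprocess, num_episodes=10000, target_sync=10):
    """Per-episode training schedule (sketch). `agent` and `preprocess`
    are hypothetical helpers standing in for the repo's own code."""
    env = PLE(FlappyBird(), fps=30, display_screen=False)
    env.init()
    actions = env.getActionSet()

    for episode in range(num_episodes):
        env.reset_game()
        state = preprocess(env.getScreenGrayscale())   # simplified input image
        while not env.game_over():
            a = agent.select_action(state)             # epsilon-greedy on the policy net
            reward = env.act(actions[a])
            next_state = preprocess(env.getScreenGrayscale())
            agent.store(state, a, reward, next_state, env.game_over())
            state = next_state
        agent.update()            # one optimization pass per episode, not per step
        if episode % target_sync == 0:
            agent.sync_target()   # copy policy-net weights into the target network
```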
- Train

  `python main.py --train=True --video_path=./video --logs_path=./logs`
- Restore a pretrained model

  `python main.py --restore=./pretrain/model-98500.pth`
- Visualize the loss and reward curves (start the Visdom server first)

  `python -m visdom.server`

  `python visualize.py --logs_path=./logs`
- Full video (at 60 FPS)
- Reward