Implementation of Double DQN reinforcement learning for OpenAI Gym environments with PyTorch.

fschur/DDQN-with-PyTorch-for-OpenAI-Gym

DDQN with PyTorch for OpenAI Gym

Implementation of Double DQN reinforcement learning for OpenAI Gym environments with discrete action spaces. Performance is measured as the sample efficiency of the algorithm, i.e. the average reward achieved after training on x episodes of interaction with the environment.
The related paper can be found here: Hasselt, 2010

Double DQN

The standard DQN method has been shown to overestimate the true Q-value because its target takes an argmax over estimated Q-values. When some estimates are too high and others too low, the argmax is more likely to select an overestimated one, so the target is biased upward.
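The upward bias can be checked numerically. The following is a minimal sketch, not taken from this repository: it assumes five actions whose true values are all 0 and adds hypothetical unit-variance Gaussian noise to stand in for estimation error. It compares the max over one set of noisy estimates with a decoupled estimate in which one noisy copy selects the action and an independent copy evaluates it.

```python
import random
import statistics


def single_estimate(true_values, noise_std, rng):
    """Max over one set of noisy estimates (selection and evaluation coupled)."""
    noisy = [v + rng.gauss(0.0, noise_std) for v in true_values]
    return max(noisy)


def decoupled_estimate(true_values, noise_std, rng):
    """Select the action with one noisy copy, evaluate it with an independent copy."""
    select = [v + rng.gauss(0.0, noise_std) for v in true_values]
    evaluate = [v + rng.gauss(0.0, noise_std) for v in true_values]
    best = max(range(len(true_values)), key=lambda a: select[a])
    return evaluate[best]


rng = random.Random(0)      # fixed seed so the run is repeatable
true_values = [0.0] * 5     # assumption: every action is equally good, true max is 0
trials = 10_000

single_bias = statistics.fmean(
    [single_estimate(true_values, 1.0, rng) for _ in range(trials)]
)
decoupled_bias = statistics.fmean(
    [decoupled_estimate(true_values, 1.0, rng) for _ in range(trials)]
)
print(f"coupled max:       {single_bias:+.3f}")
print(f"decoupled estimate: {decoupled_bias:+.3f}")
```

Under these assumptions the coupled max averages well above the true value of 0, while the decoupled estimate stays close to 0.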

Standard DQN target:
$Q(s_t, a_t) = r_t + Q(s_{t+1}, \arg\max_a Q(s_{t+1}, a))$

By using two uncorrelated Q-Networks we can reduce this overestimation. To save computation time, we apply gradient updates to only one of the Q-Networks and periodically copy its parameters to the target Q-Network.
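The periodic parameter copy can be sketched without any deep learning library. In the sketch below, `TinyQ`, `sync_target`, and the period `sync_every` are hypothetical names chosen for illustration, and the increment inside the loop merely stands in for a gradient step. In PyTorch the same hard update is commonly written as `target.load_state_dict(online.state_dict())`.

```python
import copy


class TinyQ:
    """Hypothetical stand-in for a Q-network: one weight per (state, action)."""

    def __init__(self, n_states, n_actions):
        self.weights = [[0.0] * n_actions for _ in range(n_states)]


def sync_target(online, target):
    """Hard update: overwrite the target parameters with a copy of the online ones."""
    target.weights = copy.deepcopy(online.weights)


online = TinyQ(3, 2)
target = TinyQ(3, 2)
sync_every = 4   # assumed update period, counted in gradient steps
history = []     # value of one target weight after each step

for step in range(1, 11):
    online.weights[0][0] += 1.0   # stand-in for a gradient update on the online network only
    if step % sync_every == 0:
        sync_target(online, target)
    history.append(target.weights[0][0])

print(history)
```

The target weight changes only at steps 4 and 8, so between syncs the targets are computed from a frozen copy of the parameters.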

The Double DQN target then becomes:
$Q(s_t, a_t) = r_t + Q_\theta(s_{t+1}, \arg\max_a Q_{\text{target}}(s_{t+1}, a))$
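The two targets differ only in which estimates pick the action. A minimal sketch for a single transition follows, assuming the Q-values of the next state are given as plain lists. The `gamma` discount and the `done` flag are assumptions that do not appear in the formulas above; with `gamma=1.0` and `done=False` the functions reduce to those formulas.

```python
def argmax(values):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=lambda a: values[a])


def dqn_target(reward, q_next, gamma=1.0, done=False):
    """Standard target: the same estimates select and evaluate the next action."""
    if done:
        return reward   # assumption: no bootstrapping from a terminal state
    return reward + gamma * q_next[argmax(q_next)]


def double_dqn_target(reward, q_theta_next, q_target_next, gamma=1.0, done=False):
    """Double DQN target as written above: Q_target selects, Q_theta evaluates."""
    if done:
        return reward
    best = argmax(q_target_next)
    return reward + gamma * q_theta_next[best]


# Made-up Q-values for the next state, three actions.
q_theta_next = [1.0, 3.0, 2.0]
q_target_next = [2.5, 1.5, 2.0]

standard = dqn_target(1.0, q_theta_next)
double = double_dqn_target(1.0, q_theta_next, q_target_next)
print(standard, double)
```

Here the standard target bootstraps from the largest online estimate (action 1), while the Double DQN target evaluates the action preferred by the other network (action 0), which gives a smaller value.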

And the loss function is the squared difference between this target and the current estimate:
$(Q(s_t, a_t) - Q_\theta(s_t, a_t))^2$
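As a rough illustration, the squared error can be averaged over a batch of transitions. Averaging over the batch is an assumption here, as is the helper name `squared_td_loss`. In a PyTorch implementation the target is usually treated as a constant, for example by computing it under `torch.no_grad()`, so that gradients flow only through the current estimate.

```python
def squared_td_loss(targets, predictions):
    """Mean over the batch of (target - prediction)^2."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    if not targets:
        raise ValueError("batch must not be empty")
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)


# Made-up batch of three transitions.
targets = [2.0, 0.5, 1.0]       # target values, held fixed
predictions = [1.0, 0.5, 3.0]   # current estimates for the actions actually taken
loss = squared_td_loss(targets, predictions)
print(loss)
```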
