Generative replay addition to TD3

PyTorch implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3) with a generative replay component.
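For orientation, here is a minimal sketch of the target value that standard TD3 uses for its critic update: target policy smoothing (clipped noise on the target action) followed by the minimum over two target critics. It is not taken from this repository and does not show the generative replay component. It uses plain Python scalars instead of PyTorch tensors to stay dependency-free, and every name in it (`td3_target`, `actor_target`, `critic1_target`, `critic2_target`, the default hyperparameters) is an assumption made for illustration.

```python
import random


def clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, not_done, next_state,
               actor_target, critic1_target, critic2_target,
               discount=0.99, policy_noise=0.2, noise_clip=0.5,
               max_action=1.0, rng=random):
    """Compute a TD3-style critic target for one transition (scalar action).

    actor_target(state) -> action and critic*_target(state, action) -> Q-value
    are hypothetical callables standing in for the target networks.
    """
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = clamp(rng.gauss(0.0, policy_noise), -noise_clip, noise_clip)
    next_action = clamp(actor_target(next_state) + noise, -max_action, max_action)

    # Clipped double Q-learning: take the smaller of the two target estimates.
    target_q = min(critic1_target(next_state, next_action),
                   critic2_target(next_state, next_action))

    # not_done is 1.0 for non-terminal transitions and 0.0 for terminal ones.
    return reward + not_done * discount * target_q
```

In a tensor-based implementation the same three steps would typically be applied to a whole batch at once, with the replay batch supplying `reward`, `not_done`, and `next_state`.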

The code is heavily modified to suit my research needs.

The method is tested on MuJoCo continuous control tasks in OpenAI Gym. Networks are trained using PyTorch 1.7 and Python 3.8.