Reinforcement Learning Models in PyTorch

Description

This repo contains PyTorch implementations of reinforcement learning models, written for personal skill development.

Background

REINFORCE is a Monte Carlo policy gradient method: it computes the policy gradient at the end of every episode, from that episode's returns, and updates the agent's parameters accordingly.
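As an illustration of the end-of-episode update, here is a minimal sketch of a REINFORCE loss in PyTorch. It is not taken from this repo's code: the helper names `discounted_returns` and `reinforce_loss`, the discount factor, and the toy two-action policy are all assumptions made for the example.

```python
import torch


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for each step of one episode."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return torch.tensor(returns, dtype=torch.float32)


def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Surrogate loss whose gradient is minus the REINFORCE gradient estimate."""
    returns = discounted_returns(rewards, gamma)
    return -(torch.stack(log_probs) * returns).sum()


# Toy episode (hypothetical): a state-independent policy over two actions.
logits = torch.zeros(2, requires_grad=True)
dist = torch.distributions.Categorical(logits=logits)
actions = [dist.sample() for _ in range(3)]
log_probs = [dist.log_prob(a) for a in actions]

loss = reinforce_loss(log_probs, rewards=[1.0, 0.0, 1.0], gamma=0.5)
loss.backward()  # logits.grad now holds the gradient used for the update
```

In a real training loop the log-probabilities would come from a policy network evaluated on observed states, and an optimizer step would follow `loss.backward()`.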

Deep Deterministic Policy Gradient (DDPG) is an off-policy, actor-critic policy gradient method. As in Deep Q-Learning, separate target and current models are kept for both the actor (policy) and the critic (value function), and each target model is gradually updated toward its current model. DDPG also uses an experience replay buffer. The critic loss is computed from the temporal difference (TD) error, and the actor is updated to increase the critic's estimate of the value of its actions.
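The pieces described above (target networks, gradual target updates, and a TD-based loss) can be sketched as follows. This is an illustrative example rather than this repo's implementation: the linear actor and critic, the `soft_update` helper, the values of `tau` and `gamma`, and the random batch standing in for a replay buffer sample are all assumptions.

```python
import copy

import torch
import torch.nn as nn


def soft_update(target, source, tau=0.005):
    """Move each target parameter a fraction tau toward the current model."""
    with torch.no_grad():
        for t, s in zip(target.parameters(), source.parameters()):
            t.copy_((1.0 - tau) * t + tau * s)


obs_dim, act_dim, batch = 3, 1, 4
gamma = 0.99

# Tiny stand-in networks; real actors and critics are usually deeper.
actor = nn.Linear(obs_dim, act_dim)
critic = nn.Linear(obs_dim + act_dim, 1)
actor_target = copy.deepcopy(actor)
critic_target = copy.deepcopy(critic)

# Random batch standing in for a sample from the experience replay buffer.
torch.manual_seed(0)
obs = torch.randn(batch, obs_dim)
act = torch.randn(batch, act_dim)
rew = torch.randn(batch, 1)
next_obs = torch.randn(batch, obs_dim)
done = torch.zeros(batch, 1)

# TD target is built from the target networks and is not differentiated.
with torch.no_grad():
    next_act = actor_target(next_obs)
    next_q = critic_target(torch.cat([next_obs, next_act], dim=1))
    td_target = rew + gamma * (1.0 - done) * next_q

# Critic: mean squared TD error.
q = critic(torch.cat([obs, act], dim=1))
critic_loss = nn.functional.mse_loss(q, td_target)

# Actor: maximize the critic's value of the actor's own actions.
actor_loss = -critic(torch.cat([obs, actor(obs)], dim=1)).mean()

# After the optimizer steps (omitted here), nudge the targets.
soft_update(actor_target, actor)
soft_update(critic_target, critic)
```

A small `tau` keeps the target networks slowly moving, which is what "gradually updated" refers to; setting `tau=1.0` would reduce this to a hard copy of the current model.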

Dependencies

Results

DDPG

Reward per episode on HalfCheetah-v1

Visualization of learned policy on HalfCheetah-v1

Useful References