Average Reward Deep RL


This repository provides the reference implementation for the paper RVI-SAC: Average Reward Off-Policy Deep Reinforcement Learning, written by the paper's authors. Unlike the commonly used discounted sum of rewards, RVI-SAC optimizes the average reward, defined below (more precisely, the objective also includes an entropy term; see the paper for details).

$$ \rho^\pi := \lim_{T \rightarrow \infty} \frac{1}{T} E_\pi [\sum_{t=0}^T R_t] $$

The average reward is a more natural objective than the discounted sum of rewards for continuing tasks (e.g., locomotion tasks), where episodes continue indefinitely, so optimizing it instead of the discounted return can be expected to improve performance. Our algorithm, RVI-SAC, is a novel method that combines the average-reward criterion with Soft Actor-Critic.
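For reference, the commonly used discounted objective that the average reward replaces is (a standard textbook definition, not notation taken from the paper):

$$ J_\gamma^\pi := E_\pi \left[ \sum_{t=0}^{\infty} \gamma^t R_t \right], \qquad 0 \le \gamma < 1 $$

Because $\gamma^t$ down-weights rewards far in the future, this objective is a somewhat artificial fit for tasks that never terminate, which is the motivation for using $\rho^\pi$ instead.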

This research has been accepted at ICML 2024.

Installation

Prerequisites

  • Make sure you have poetry installed on your system. If you don't have it yet, install it by following Poetry's official installation instructions.

Setting up the Environment

Run the following command to set up the environment using poetry.

```bash
poetry install
```

Implemented Algorithms

Run

Hyperparameters are managed with Hydra; see config.yaml for the available options.

```bash
poetry run python3 experiments/main.py \
  algo=rvi_sac \
  env=Ant-v4 \
  seed=0
```
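
The arguments above are Hydra overrides that get merged into config.yaml at launch. As a rough sketch of how such an entry point is typically wired up (the decorator arguments and the body below are illustrative assumptions, not the actual contents of experiments/main.py):

```python
# Minimal Hydra entry point (illustrative sketch; the real experiments/main.py
# and config.yaml in this repository may be structured differently).
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path=".", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    # Hydra composes config.yaml with command-line overrides such as
    # `algo=rvi_sac env=Ant-v4 seed=0` before this function is called.
    print(OmegaConf.to_yaml(cfg))
    # ... construct the environment and agent from cfg, then run training ...


if __name__ == "__main__":
    main()
```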

Results of MuJoCo Experiments

Related Links