This repository provides the reference implementation for the paper RVI-SAC: Average Reward Off-Policy Deep Reinforcement Learning, written by the paper's authors. Unlike the commonly used discounted sum of rewards, RVI-SAC uses the average reward as its objective, as shown below (strictly speaking, the objective also includes an entropy term; please refer to the paper for details).
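For reference, the standard average-reward objective (shown here without the entropy term; the paper uses an entropy-augmented version) is:

$$
\rho(\pi) = \lim_{T \to \infty} \frac{1}{T} \, \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T-1} r(s_t, a_t)\right]
$$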
The average reward is a more natural objective than the discounted sum of rewards for continuing tasks (e.g., locomotion tasks), where episodes continue indefinitely, and using it in place of the discounted reward can be expected to improve performance. Our algorithm, RVI-SAC, is a novel method that combines the average-reward objective with Soft Actor-Critic.
This research has been accepted at ICML 2024.
- Make sure you have `poetry` installed on your system. If you don't have it yet, you can install it by following the instructions here.
- Run the following command to set up the environment using `poetry`:

  ```bash
  poetry install
  ```
- RVI-SAC (proposed method)
- Soft Actor-Critic (Original Implementation: here)
- ARO-DDPG (Original Implementation: here)
Hyperparameters are managed by Hydra; see `config.yaml` for details.
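To inspect the fully composed configuration without launching a run, Hydra's standard `--cfg` flag can be used (a sketch assuming `experiments/main.py` is a regular Hydra entry point with defaults set for every config group):

```bash
# Print the composed job configuration and exit (standard Hydra flag)
poetry run python3 experiments/main.py --cfg job
```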
For example, to run RVI-SAC on `Ant-v4` with seed 0 (parameters are overridden on the command line using Hydra's syntax):

```bash
poetry run python3 experiments/main.py \
    algo=rvi_sac \
    env=Ant-v4 \
    seed=0
```
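Because the entry point uses Hydra, sweeps over several settings can be launched with the standard `--multirun`/`-m` flag. This is a sketch using generic Hydra syntax and example environment names, not a command documented by this repository:

```bash
# Sweep over two environments and three seeds with Hydra multirun
poetry run python3 experiments/main.py -m \
    algo=rvi_sac \
    env=Ant-v4,HalfCheetah-v4 \
    seed=0,1,2
```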