From 0565817d35b497fc2c99dc61bc4fb96843c28fba Mon Sep 17 00:00:00 2001
From: Sasha Abramowitz
Date: Wed, 23 Oct 2024 09:41:14 +0200
Subject: [PATCH] feat: update system readmes

---
 mava/systems/ppo/README.md        | 15 +++++++++------
 mava/systems/q_learning/README.md | 10 +++++++---
 mava/systems/sac/README.md        | 14 +++++++++-----
 3 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/mava/systems/ppo/README.md b/mava/systems/ppo/README.md
index fc47924fe..fc16b06aa 100644
--- a/mava/systems/ppo/README.md
+++ b/mava/systems/ppo/README.md
@@ -1,10 +1,13 @@
 # Proximal Policy Optimization
 
-todo: links
 We provide 4 implementations of multi-agent PPO.
-* ff-IPPO: feed forward independant PPO
-* ff-MAPPO: feed forward multi-agent PPO
-* rec-IPPO: recurrent independant PPO
-* rec-MAPPO: recurrent multi-agent PPO
+* [ff-IPPO](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/ppo/anakin/ff_ippo.py): feed-forward independent PPO
+* [ff-MAPPO](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/ppo/anakin/ff_mappo.py): feed-forward multi-agent PPO
+* [rec-IPPO](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/ppo/anakin/rec_ippo.py): recurrent independent PPO
+* [rec-MAPPO](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/ppo/anakin/rec_mappo.py): recurrent multi-agent PPO
 
-Where independant PPO uses independant learners and multi-agent PPO uses a CTDE style of training with a centralized critic
+Independent PPO uses independent learners, while multi-agent PPO uses a CTDE (centralized training with decentralized execution) style of training with a centralized critic.
+
+## Relevant papers
+* [Single agent Proximal Policy Optimization Algorithms](https://arxiv.org/pdf/1707.06347)
+* [The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games](https://arxiv.org/pdf/2103.01955)
diff --git a/mava/systems/q_learning/README.md b/mava/systems/q_learning/README.md
index 8c310c639..8e0fe246c 100644
--- a/mava/systems/q_learning/README.md
+++ b/mava/systems/q_learning/README.md
@@ -1,6 +1,10 @@
 # Q Learning
 
-todo: links
 We provide 2 Q-Learning based systems:
-* rec-IQL: a multi-agent recurrent DQN implementation with double DQN.
-* rec-QMIX: an implementation of QMIX.
+* [rec-IQL](https://github.com/instadeepai/Mava/tree/feat/develop/mava/systems/q_learning/anakin/rec_iql.py): a multi-agent recurrent DQN implementation with double DQN.
+* [rec-QMIX](https://github.com/instadeepai/Mava/tree/feat/develop/mava/systems/q_learning/anakin/rec_qmix.py): a recurrent implementation of QMIX.
+
+## Relevant papers
+* [Single agent DQN](https://arxiv.org/pdf/1312.5602)
+* [Multiagent Cooperation and Competition with Deep Reinforcement Learning](https://arxiv.org/pdf/1511.08779)
+* [QMIX](https://arxiv.org/pdf/1803.11485)
diff --git a/mava/systems/sac/README.md b/mava/systems/sac/README.md
index 2cdfc8904..28fd40e4e 100644
--- a/mava/systems/sac/README.md
+++ b/mava/systems/sac/README.md
@@ -1,9 +1,13 @@
 # Soft Actor Critic
 
-todo: links
 We provide 3 implementations of multi-agent SAC.
-* ff-ISAC: feed forward independant SAC
-* ff-MASAC: feed forward multi-agent SAC
-* ff-HASAC: recurrent independant SAC
+* [ff-ISAC](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/sac/anakin/ff_isac.py): feed-forward independent SAC
+* [ff-MASAC](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/sac/anakin/ff_masac.py): feed-forward multi-agent SAC
+* [ff-HASAC](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/sac/anakin/ff_hasac.py): feed-forward heterogeneous-agent SAC
 
-Where independant SAC uses independant learners and multi-agent SAC uses a CTDE style of training with a centralized critic and HASAC uses heterogenous style, sequential updates.
+Independent SAC uses independent learners, multi-agent SAC uses a CTDE (centralized training with decentralized execution) style of training with a centralized critic, and HASAC updates its heterogeneous agents sequentially.
+
+## Relevant papers
+* [Single agent Soft Actor Critic](https://arxiv.org/pdf/1801.01290)
+* [MADDPG](https://arxiv.org/pdf/1706.02275)
+* [HASAC](https://arxiv.org/pdf/2306.10715)
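
All three READMEs draw the same split: independent learners condition each critic only on its own agent's observation, while the CTDE systems train a centralized critic on the joint observation (execution stays decentralized in both). A minimal JAX sketch of that distinction, using a toy value head and hypothetical shapes rather than Mava's actual network code:

```python
import jax
import jax.numpy as jnp

num_agents, obs_dim, hidden = 3, 8, 16
key = jax.random.PRNGKey(0)
k_obs, k_ind, k_cen = jax.random.split(key, 3)

# Per-agent observations, stacked on a leading agent axis: (num_agents, obs_dim).
obs = jax.random.normal(k_obs, (num_agents, obs_dim))

def init(key, in_dim):
    """Toy parameters for a one-hidden-layer value head."""
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (in_dim, hidden)) * 0.1,
        "w2": jax.random.normal(k2, (hidden, 1)) * 0.1,
    }

def value(params, x):
    """V(x) for a single input vector."""
    h = jnp.tanh(x @ params["w1"])
    return (h @ params["w2"]).squeeze(-1)

# Independent critic (ff-IPPO / ff-ISAC style): each agent values only its
# own observation, V_i(o_i), so we vmap one set of parameters over the
# agent axis.
ind_params = init(k_ind, obs_dim)
ind_values = jax.vmap(value, in_axes=(None, 0))(ind_params, obs)  # (num_agents,)

# Centralized critic (ff-MAPPO / ff-MASAC style): a single value of the
# joint observation, V(o_1, ..., o_n). It is only needed to compute
# training targets; acting still uses each agent's local observation.
cen_params = init(k_cen, num_agents * obs_dim)
cen_value = value(cen_params, obs.reshape(-1))  # scalar

print(ind_values.shape, cen_value.shape)  # (3,) ()
```

The actors are identical in both setups; only the critic's input changes, which is why the centralized critic can be dropped entirely at execution time.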