feat: update system readmes
sash-a committed Oct 23, 2024
1 parent 3655c73 commit 0565817
Showing 3 changed files with 24 additions and 13 deletions.
15 changes: 9 additions & 6 deletions mava/systems/ppo/README.md
@@ -1,10 +1,13 @@
# Proximal Policy Optimization

We provide 4 implementations of multi-agent PPO.
* [ff-IPPO](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/ppo/anakin/ff_ippo.py): feed forward independent PPO
* [ff-MAPPO](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/ppo/anakin/ff_mappo.py): feed forward multi-agent PPO
* [rec-IPPO](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/ppo/anakin/rec_ippo.py): recurrent independent PPO
* [rec-MAPPO](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/ppo/anakin/rec_mappo.py): recurrent multi-agent PPO

Independent PPO uses independent learners, while multi-agent PPO uses a CTDE (centralized training with decentralized execution) style of training with a centralized critic.
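In practice the difference is what the critic conditions on. A rough, hypothetical sketch (not Mava's actual modules; names and shapes chosen purely for illustration):

```python
import jax.numpy as jnp

num_agents, obs_dim = 3, 8
obs = jnp.ones((num_agents, obs_dim))  # per-agent local observations

# IPPO: each agent's critic sees only that agent's own observation.
ippo_critic_input = obs  # (num_agents, obs_dim)

# MAPPO (CTDE): the centralized critic conditions on global information,
# illustrated here by concatenating every agent's observation.
mappo_critic_input = jnp.tile(obs.reshape(-1), (num_agents, 1))  # (num_agents, num_agents * obs_dim)
```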

## Relevant papers:
* [Single agent Proximal Policy Optimization Algorithms](https://arxiv.org/pdf/1707.06347)
* [The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games](https://arxiv.org/pdf/2103.01955)
10 changes: 7 additions & 3 deletions mava/systems/q_learning/README.md
@@ -1,6 +1,10 @@
# Q Learning

We provide 2 Q-Learning based systems:
* [rec-IQL](https://github.com/instadeepai/Mava/tree/feat/develop/mava/systems/q_learning/anakin/rec_iql.py): a multi-agent recurrent DQN implementation with double DQN (target sketched below).
* [rec-QMIX](https://github.com/instadeepai/Mava/tree/feat/develop/mava/systems/q_learning/anakin/rec_qmix.py): an implementation of QMIX.
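For intuition, here is a minimal sketch of the double DQN target mentioned for rec-IQL. It is illustrative only, not Mava's code, and the function and argument names are hypothetical:

```python
import jax.numpy as jnp

def double_dqn_target(q_online_next, q_target_next, reward, done, gamma=0.99):
    """Online network selects the next action; target network evaluates it."""
    # q_online_next, q_target_next: (batch, num_actions) Q-values for the next observation
    best_action = jnp.argmax(q_online_next, axis=-1)
    next_value = jnp.take_along_axis(q_target_next, best_action[:, None], axis=-1).squeeze(-1)
    return reward + gamma * (1.0 - done) * next_value
```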

## Relevant papers:
* [Single agent DQN](https://arxiv.org/pdf/1312.5602)
* [Multiagent Cooperation and Competition with Deep Reinforcement Learning](https://arxiv.org/pdf/1511.08779)
* [QMIX](https://arxiv.org/pdf/1803.11485)
12 changes: 8 additions & 4 deletions mava/systems/sac/README.md
@@ -1,9 +1,13 @@
# Soft Actor Critic

We provide 3 implementations of multi-agent SAC.
* [ff-ISAC](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/sac/anakin/ff_isac.py): feed forward independent SAC
* [ff-MASAC](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/sac/anakin/ff_masac.py): feed forward multi-agent SAC
* [ff-HASAC](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/sac/anakin/ff_hasac.py): feed forward heterogeneous-agent SAC

Independent SAC uses independent learners, multi-agent SAC uses a CTDE style of training with a centralized critic, and HASAC uses a heterogeneous-agent style with sequential updates.
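All three build on SAC's entropy-regularized critic target. A minimal, illustrative sketch (not Mava's implementation; names are hypothetical):

```python
import jax.numpy as jnp

def soft_td_target(q1_next, q2_next, next_log_prob, reward, done, gamma=0.99, alpha=0.2):
    """Twin target critics are min-clipped and the policy's entropy bonus is added."""
    soft_value = jnp.minimum(q1_next, q2_next) - alpha * next_log_prob
    return reward + gamma * (1.0 - done) * soft_value
```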

## Relevant papers
* [Single agent Soft Actor Critic](https://arxiv.org/pdf/1801.01290)
* [MADDPG](https://arxiv.org/pdf/1706.02275)
* [HASAC](https://arxiv.org/pdf/2306.10715)
