feat: update system readmes

instadeepai · Oct 23, 2024 · 0565817
1 parent 3655c73
commit 0565817

Showing 3 changed files with 24 additions and 13 deletions.
diff --git a/mava/systems/ppo/README.md b/mava/systems/ppo/README.md
@@ -1,10 +1,13 @@
 # Proximal Policy Optimization
-todo: links
 
 We provide 4 implementations of multi-agent PPO.
-* ff-IPPO: feed forward independant PPO
-* ff-MAPPO: feed forward multi-agent PPO
-* rec-IPPO: recurrent independant PPO
-* rec-MAPPO: recurrent multi-agent PPO
+* [ff-IPPO](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/ppo/anakin/ff_ippo.py): feed forward independent PPO
+* [ff-MAPPO](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/ppo/anakin/ff_mappo.py): feed forward multi-agent PPO
+* [rec-IPPO](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/ppo/anakin/rec_ippo.py): recurrent independent PPO
+* [rec-MAPPO](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/ppo/anakin/rec_mappo.py): recurrent multi-agent PPO
 
-Where independant PPO uses independant learners and multi-agent PPO uses a CTDE style of training with a centralized critic
+Independent PPO uses independent learners, while multi-agent PPO uses a CTDE (centralized training with decentralized execution) style of training with a centralized critic.
+
+## Relevant papers
+* [Single agent Proximal Policy Optimization Algorithms](https://arxiv.org/pdf/1707.06347)
+* [The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games](https://arxiv.org/pdf/2103.01955)
diff --git a/mava/systems/q_learning/README.md b/mava/systems/q_learning/README.md
@@ -1,6 +1,10 @@
 # Q Learning
-todo: links
 
 We provide 2 Q-Learning based systems:
-* rec-IQL: a multi-agent recurrent DQN implementation with double DQN.
-* rec-QMIX: an implementation of QMIX.
+* [rec-IQL](https://github.com/instadeepai/Mava/tree/feat/develop/mava/systems/q_learning/anakin/rec_iql.py): a multi-agent recurrent DQN implementation with double DQN.
+* [rec-QMIX](https://github.com/instadeepai/Mava/tree/feat/develop/mava/systems/q_learning/anakin/rec_qmix.py): an implementation of QMIX.
+
+## Relevant papers
+* [Single agent DQN](https://arxiv.org/pdf/1312.5602)
+* [Multiagent Cooperation and Competition with Deep Reinforcement Learning](https://arxiv.org/pdf/1511.08779)
+* [QMIX](https://arxiv.org/pdf/1803.11485)
diff --git a/mava/systems/sac/README.md b/mava/systems/sac/README.md
@@ -1,9 +1,13 @@
 # Soft Actor Critic
-todo: links
 
 We provide 3 implementations of multi-agent SAC.
-* ff-ISAC: feed forward independant SAC
-* ff-MASAC: feed forward multi-agent SAC
-* ff-HASAC: recurrent independant SAC
+* [ff-ISAC](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/sac/anakin/ff_isac.py): feed forward independent SAC
+* [ff-MASAC](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/sac/anakin/ff_masac.py): feed forward multi-agent SAC
+* [ff-HASAC](https://github.com/instadeepai/Mava/blob/feat/develop/mava/systems/sac/anakin/ff_hasac.py): feed forward heterogeneous-agent SAC
 
 Independent SAC uses independent learners, multi-agent SAC uses a CTDE style of training with a centralized critic, and HASAC uses a heterogeneous-agent style with sequential updates.
+
+## Relevant papers
+* [Single agent Soft Actor Critic](https://arxiv.org/pdf/1801.01290)
+* [MADDPG](https://arxiv.org/pdf/1706.02275)
+* [HASAC](https://arxiv.org/pdf/2306.10715)