- Use this code to replicate the results from the paper. For a more readable TF 2.x implementation, check out tf2multiagentrl.
This is the implementation of MATD3, presented in our paper Reducing Overestimation Bias in Multi-Agent Domains Using Double Centralized Critics. Multi-Agent TD3 (MATD3) is an algorithm for multi-agent reinforcement learning that combines the improvements of TD3 with MADDPG.
The implementation here is closely based on maddpg from Ryan Lowe / OpenAI to enable a fair comparison. The environments used are from multiagent-particle-envs from OpenAI.
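As a rough illustration of what "combining the improvements of TD3 with MADDPG" means, the sketch below computes a clipped double-Q target from two centralized critics and adds clipped noise to target actions. It is a minimal NumPy sketch, not the code in this repository: the function names, the discount factor, the noise parameters, and the action bounds are all assumptions chosen for illustration.

```python
import numpy as np


def smooth_target_actions(actions, noise_std=0.2, noise_clip=0.5,
                          low=-1.0, high=1.0, rng=None):
    """Target policy smoothing: add clipped Gaussian noise to target actions.

    noise_std, noise_clip, low and high are illustrative values (assumptions).
    """
    rng = np.random.RandomState() if rng is None else rng
    actions = np.asarray(actions, dtype=np.float64)
    noise = rng.normal(0.0, noise_std, size=actions.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    return np.clip(actions + noise, low, high)


def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.95):
    """Compute y = r + gamma * (1 - done) * min(Q1', Q2').

    q1_next and q2_next stand for the outputs of the two centralized target
    critics, evaluated on the next joint observation and the (smoothed)
    target actions of all agents. gamma=0.95 is an assumed default.
    """
    reward = np.asarray(reward, dtype=np.float64)
    done = np.asarray(done, dtype=np.float64)
    # Taking the element-wise minimum counteracts overestimation bias.
    q_min = np.minimum(q1_next, q2_next)
    return reward + gamma * (1.0 - done) * q_min
```

Taking the minimum over two critics is the TD3 ingredient; feeding each critic the observations and actions of all agents is the MADDPG ingredient.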
- Python == 3.6
- TF == 1.12.0 (any 1.x should work)
- Gym == 0.10.5 (this one is important)
- Numpy >= 1.16.2
To start training on simple_crypto, with an MATD3 team of agents and an MADDPG adversary, use
python train.py --scenario simple_crypto --good-policy matd3 --adv-policy maddpg
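To compare the algorithms, it can help to run every pairing of team and adversary policy on the same scenario. The sketch below only builds and prints the corresponding command lines, using the three flags shown above. The helper name `build_commands` is hypothetical, and the assumption that `matd3` and `maddpg` are accepted for both policy flags is taken from the example command rather than from the script's argument parser.

```python
import itertools


def build_commands(scenario="simple_crypto", policies=("matd3", "maddpg")):
    """Return one train.py command per (good policy, adversary policy) pair."""
    commands = []
    for good, adv in itertools.product(policies, repeat=2):
        commands.append([
            "python", "train.py",
            "--scenario", scenario,
            "--good-policy", good,
            "--adv-policy", adv,
        ])
    return commands


# Print the commands; pass each list to subprocess.run(...) to launch it.
for cmd in build_commands():
    print(" ".join(cmd))
```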
If you use our implementation, please also cite our paper:
@misc{ackermann2019reducing,
title={Reducing Overestimation Bias in Multi-Agent Domains Using Double Centralized Critics},
author={Johannes Ackermann and Volker Gabler and Takayuki Osa and Masashi Sugiyama},
year={2019},
eprint={1910.01465},
archivePrefix={arXiv},
primaryClass={cs.LG}
}