IMP-MARL offers a platform for benchmarking the scalability of cooperative MARL methods in real-world engineering applications.
In IMP-MARL, you can:
- Implement your own infrastructure management planning (IMP) environment or execute an available IMP environment.
- Train IMP policies through state-of-the-art MARL methods. The environments can be integrated with typical ecosystems via wrappers.
- Compute expert-based heuristic policies.
Additionally, you will be able to:
- Retrieve the results of a benchmark campaign, where MARL methods are assessed in terms of scalability.
- Reproduce our experiments.
This repository has been developed and is maintained by Pascal Leroy & Pablo G. Morato.
Please consider opening an issue or a pull request to help us improve this repository.
To work with our environments, you only need to install NumPy.
However, reproducing our results requires additional packages, and installation instructions are provided here.
- Create your own IMP environment scenario
- IMP's API explained
- Train agents as in the paper and/or reproduce the results
- Directly retrieve the results
- Train your own MARL agents with PyMarl
- (Correlated and uncorrelated) k-out-of-n system with components subject to fatigue deterioration.
- Offshore wind structural system with components subject to fatigue deterioration.
Note: A campaign cost can be activated in any environment, as sketched below.
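As a minimal sketch, an environment scenario is selected through its settings dictionary; the keys below mirror the API example further down, while the import path and the specific values (a correlated 9-out-of-10 system with the campaign cost activated) are assumptions chosen for illustration.

```python
from imp_marl.environments.struct_env import Struct  # assumed import path, adjust to your installation

# Illustrative scenario: correlated 9-out-of-10 system with campaign cost activated.
env = Struct({'n_comp': 10,             # number of components (one agent per component)
              'discount_reward': 0.95,  # discount factor applied to the rewards
              'k_comp': 9,              # at least k components must survive
              'env_correlation': True,  # correlated fatigue deterioration
              'campaign_cost': True})   # activate the campaign cost
```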
- Ready: PyMarl: Multi- and single-agent wrappers.
- Ready: Gymnasium: Single-agent wrapper (see the sketch after this list).
- Ready: PettingZoo: Multi-agent wrapper.
- Ready: Rllib example: Single-agent training with RLLib and Gymnasium wrapper.
- WIP: MARLlib example: TBD
- WIP: TorchRL example: TBD
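To illustrate the single-agent route, the sketch below drives an IMP environment through the standard Gymnasium reset/step interface. The wrapper module and class names are assumptions (check the wrappers shipped in this repository for the exact names); only the Gymnasium API itself is taken as given.

```python
# Hypothetical names: the repository's Gymnasium single-agent wrapper may live elsewhere.
from imp_marl.environments.gym_wrapper import GymStructWrapper

env = GymStructWrapper({'n_comp': 3,
                        'discount_reward': 0.95,
                        'k_comp': 2,
                        'env_correlation': False,
                        'campaign_cost': False})

obs, info = env.reset()
terminated = truncated = False
episode_return = 0.0
while not (terminated or truncated):
    action = env.action_space.sample()  # random joint action over all components
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
```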
Instructions to train agents with PyMarl and one of the following algorithms are available here:
- QMIX: QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
- QVMIX: QVMix and QVMix-Max: Extending the Deep Quality-Value Family of Algorithms to Cooperative Multi-Agent Reinforcement Learning
- QPLEX: QPLEX: Duplex Dueling Multi-Agent Q-Learning
- COMA: Counterfactual Multi-Agent Policy Gradients
- FACMAC: Factored Multi-Agent Centralised Policy Gradients
- VDN: Value-Decomposition Networks For Cooperative Multi-Agent Learning
- IQL: Independent Q-Learning
The main code is derived from the original PyMarl implementation.
A minimal random-policy interaction loop with an IMP environment looks as follows:

```python
import random

from imp_marl.environments.struct_env import Struct  # assumed import path, adjust to your installation

env = Struct({'n_comp': 3,
              'discount_reward': 0.95,
              'k_comp': 2,
              'env_correlation': False,
              'campaign_cost': False})
obs, rewards_sum, done = env.reset(), 0, False
while not done:
    # One random action per component-agent (3 discrete actions each).
    actions = {f"agent_{i}": random.randint(0, 2) for i in range(3)}
    obs, rewards, done, insp_outcomes = env.step(actions)
    rewards_sum += rewards["agent_0"]  # assumption: shared reward in this cooperative task
```
If you use IMP-MARL in your work, please consider citing our paper:
```bibtex
@misc{leroy2023impmarl,
      title={IMP-MARL: a Suite of Environments for Large-scale Infrastructure Management Planning via MARL},
      author={Pascal Leroy and Pablo G. Morato and Jonathan Pisane and Athanasios Kolios and Damien Ernst},
      year={2023},
      eprint={2306.11551},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```