A Talent-infused Policy-gradient Approach to Efficient Co-Design of Morphology and Task Allocation Behavior of Multi-Robot Systems
This paper proposes a computational framework that co-optimizes the morphology and behavior of individual robots in a swarm to maximize the performance of the swarm's emergent behavior, while also helping explore the benefits of swarm systems. We utilize our previously proposed artificial-life-inspired talent metrics, which are physical quantities of interest that reflect the capabilities of an individual robotic system. Talent metrics represent a compact yet physically interpretable parametric space that connects the behavior space and the morphology space. We use this to decompose the morphology-behavior co-optimization into a sequence of talent-behavior optimization problems, which reduces the search space of each individual problem with negligible compromise in the ability to find optimal solutions. In other words, the decomposition approach presented here is nearly lossless: a solution that could be found with a brute-force nested optimization approach to co-design also exists in the overall search space spanned by our decomposed co-design approach (assuming that each search process is ideal). We also propose a novel talent-infused policy-gradient method to concurrently optimize the talents and learn the behavior. The framework consists of four steps: a) initially, the morphology and its dependent talent parameters are derived; b) based on the talents, a Pareto front is created by solving a multi-objective optimization problem; c) the talent-infused policy-gradient method is used to train the associated behavior and talents; d) the morphology is finalized. The flowchart below explains the four steps involved in our co-design process.
In this work, we specifically focus on a multi-robot disaster response problem, which we refer to as MRTA-Flood. It consists of
The MRTA-Flood problems involve a set of nodes/vertices (
State Space (
Action Space (
Reward (
Transition: The transition is an event-based trigger. An event is defined as the condition that a robot reaches its selected task or visits the depot location. Since here we do not consider any uncertainty, the state transition probability is 1.
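The event-based transition described above can be illustrated with a minimal sketch. It is not the repository's implementation: the `Robot` class, its fields, the constant-speed straight-line motion model, and the `step` function are all hypothetical names and assumptions introduced only for illustration. The sketch advances a deterministic simulation to the moment the first robot reaches its selected location (a task or the depot), which is the event that triggers the next decision.

```python
import math
from dataclasses import dataclass


@dataclass
class Robot:
    # All fields are illustrative assumptions, not the repository's data model.
    position: tuple[float, float]
    speed: float                    # assumed constant travel speed
    target: tuple[float, float]     # selected task or depot location

    def time_to_arrival(self) -> float:
        """Travel time to the target under straight-line, constant-speed motion."""
        return math.dist(self.position, self.target) / self.speed


def step(robots: list[Robot]) -> tuple[int, float]:
    """Advance all robots to the next arrival event.

    Returns the index of the robot that triggered the event and the elapsed
    time. With no uncertainty, the next state is fully determined by the
    current state, i.e. the transition probability is 1.
    """
    times = [r.time_to_arrival() for r in robots]
    idx = min(range(len(robots)), key=times.__getitem__)
    dt = times[idx]

    for r in robots:
        dist = math.dist(r.position, r.target)
        if dist == 0.0:
            continue  # already at its target; nothing to move
        frac = min(1.0, r.speed * dt / dist)
        r.position = (
            r.position[0] + frac * (r.target[0] - r.position[0]),
            r.position[1] + frac * (r.target[1] - r.position[1]),
        )

    # Snap the triggering robot onto its target to avoid rounding drift.
    robots[idx].position = robots[idx].target
    return idx, dt
```

In a decision loop, the robot returned by `step` would be assigned a new target (its next action) before `step` is called again; otherwise it would immediately trigger another event with zero elapsed time.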
To start training, run training_mrta.py.