HRI-EU fork of the level-based foraging (LBF) multi-agent reinforcement learning environment
- About The Project
- Getting Started
- Using the LBF environment
- Running cooperation experiments
- Please Cite
- Contributing
- Contact
This environment is a mixed cooperative-competitive game, which focuses on the coordination of the agents involved. Agents navigate a grid world and collect food by cooperating with other agents if needed. This fork implements a new set of heuristic agents with varying cooperative capabilities.
Agents are placed in the grid world, and each is assigned a level. Food items are also scattered randomly, each with its own level. Agents can navigate the environment and can attempt to collect food placed next to them. The collection of food is successful only if the sum of the levels of the agents involved in loading is equal to or higher than the level of the food. Finally, agents are awarded points equal to the level of the food they helped collect, weighted by their contribution (their level relative to the summed level of all agents involved in loading). The figures below show two states of the game, one that requires cooperation, and one more competitive.
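The loading rule can be sketched as a small function. This is an illustration, not the environment's implementation: `try_load` is hypothetical, and each agent's payoff is its level times the food level divided by the summed level of the loading agents, which matches the reward code further below except for the final division by the total spawned food level.

```python
def try_load(food_level, loader_levels):
    """Return per-agent payoffs if a joint load succeeds, else None (sketch).

    food_level: level of the food item being loaded.
    loader_levels: levels of the adjacent agents that chose LOAD.
    """
    total = sum(loader_levels)
    if total < food_level:
        # Combined level too low: the item stays on the field.
        return None
    # Each agent receives the food level weighted by its share of the total level.
    return [food_level * level / total for level in loader_levels]


print(try_load(2, [1]))     # None: a level-1 agent cannot load a level-2 item alone
print(try_load(2, [1, 1]))  # [1.0, 1.0]
print(try_load(3, [1, 2]))  # [1.0, 2.0]
```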
While it may appear simple, this is a very challenging environment, requiring the cooperation of multiple agents while being competitive at the same time. In addition, the discount factor necessitates speed to maximise rewards. Each agent is only awarded points if it participates in the collection of food, and it has to balance between collecting low-levelled food on its own and cooperating to acquire higher rewards. In situations with three or more agents, highly strategic decisions can be required, involving agents needing to choose with whom to cooperate. Another significant difficulty for RL algorithms is the sparsity of rewards, which slows down learning.
This is a Python simulator for level-based foraging. It is based on OpenAI Gym, with modifications for the multi-agent domain. The efficient implementation allows for thousands of simulation steps per second on a single thread, while the rendering capabilities allow humans to visualise agent actions. The implementation supports different grid sizes and agent/food counts. Game variants are also implemented, such as a cooperative mode (agents always need to cooperate) and shared reward (all agents always get the same reward), which is attractive as a credit assignment problem.
This fork allows controlling the degree of required cooperation in the environment by setting the level of a specified fraction of food items such that these items can only be collected by two agents jointly. A further parameter can introduce distractor items, i.e., food that can be collected neither individually nor by both agents jointly. Additionally, the fork implements agent heuristics of varying cooperative ability, ranging from agents behaving randomly or purely egoistically to purely cooperative agents. See the section on cooperation experiments below for details.
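One way to realize such a level assignment for two agents is sketched below. The fork's own level-assignment code is not reproduced here: `assign_food_levels` is hypothetical, and it assumes that individually collectible items get the lower of the two agent levels, jointly collectible items get the sum of both levels, and distractors get one level more than that sum.

```python
import random


def assign_food_levels(agent_levels, n_food, coop_fraction, distractor_fraction=0.0, seed=None):
    """Hypothetical sketch: return a shuffled list of food levels for two agents."""
    if len(agent_levels) != 2:
        raise ValueError("this sketch assumes exactly two agents")
    n_coop = round(coop_fraction * n_food)
    n_distractor = round(distractor_fraction * n_food)
    if n_coop + n_distractor > n_food:
        raise ValueError("cooperative and distractor fractions exceed the number of items")
    solo_level = min(agent_levels)       # either agent can load this alone (assumption)
    joint_level = sum(agent_levels)      # only both agents together reach this level
    distractor_level = joint_level + 1   # not even both agents together can load this
    levels = (
        [joint_level] * n_coop
        + [distractor_level] * n_distractor
        + [solo_level] * (n_food - n_coop - n_distractor)
    )
    random.Random(seed).shuffle(levels)
    return levels


print(sorted(assign_food_levels([1, 1], 10, 0.5, 0.2, seed=0)))
```

For two level-1 agents, ten items, c = 0.5 and 20% distractors, this yields five level-2 items, two level-3 distractors and three level-1 items in random order.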
Install using pip
pip install lbforaging
Or to ensure that you have the latest version:
git clone https://github.com/semitable/lb-foraging.git
cd lb-foraging
pip install -e .
For an example of how to run the environment, call
python lbf_heuristic_agents.py
in the experiments folder. This will start one game episode using the heuristic agent specified in config/settings.yml.
Create environments with the Gym framework. First, import the required packages:
import gym
import lbforaging
Then create an environment:
env = gym.make("Foraging-8x8-2p-1f-v2")
We offer a variety of environments using this template:
"Foraging-{GRID_SIZE}x{GRID_SIZE}-{PLAYER COUNT}p-{FOOD LOCATIONS}f{-coop IF COOPERATIVE MODE}-v2"
You can also register your own variant (change the parameters as needed):
from gym.envs.registration import register

s, p, f, c = 8, 2, 1, False  # example values: grid size, player count, food count, cooperative mode
register(
    id="Foraging-{0}x{0}-{1}p-{2}f{3}-v2".format(s, p, f, "-coop" if c else ""),
    entry_point="lbforaging.foraging:ForagingEnv",
    kwargs={
        "players": p,
        "field_size": (s, s),
        "max_food": f,
        "sight": s,
        "max_episode_steps": 50,
        "force_coop": c,
    },
)
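When sweeping over several variants, it can help to build and parse IDs that follow the naming scheme above. `make_env_id` and `parse_env_id` are hypothetical helpers, not part of lbforaging; they only understand the template shown here and reject anything else.

```python
import re

# Matches e.g. "Foraging-8x8-2p-1f-v2" or "Foraging-10x10-3p-4f-coop-v2";
# the backreference \1 enforces a square grid as in the template.
_ENV_ID = re.compile(r"^Foraging-(\d+)x\1-(\d+)p-(\d+)f(-coop)?-v(\d+)$")


def make_env_id(size, players, food, coop=False, version=2):
    """Build an ID with the same format string as the register() call above."""
    return "Foraging-{0}x{0}-{1}p-{2}f{3}-v{4}".format(
        size, players, food, "-coop" if coop else "", version
    )


def parse_env_id(env_id):
    """Split an ID back into its parameters; raise ValueError if it does not match."""
    match = _ENV_ID.match(env_id)
    if match is None:
        raise ValueError(f"unrecognised environment ID: {env_id!r}")
    size, players, food, coop, version = match.groups()
    return {
        "size": int(size),
        "players": int(players),
        "food": int(food),
        "coop": coop is not None,
        "version": int(version),
    }
```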
Similar to Gym, but adapted to multi-agent settings, the step() function is defined as
nobs, nreward, ndone, ninfo = env.step(actions)
where nobs, nreward, ndone and ninfo are LISTS of N items (where N is the number of agents). The i'th element of each list belongs to the i'th agent.
actions is a LIST of N INTEGERS (one for each agent) that should be executed in that step. The integers should correspond to the Enum below:
from enum import Enum

class Action(Enum):
    NONE = 0
    NORTH = 1
    SOUTH = 2
    WEST = 3
    EAST = 4
    LOAD = 5
Valid actions can always be sampled like in a gym environment, using:
env.action_space.sample() # [2, 3, 0, 1]
Also, ALL actions are valid: if an agent cannot move to a location or load, its action is automatically replaced with NONE.
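The replacement rule above can be illustrated with a simplified validity check. This is a hypothetical sketch, not the environment's code: `sanitize_action` is an illustrative name, and it assumes (row, col) coordinates with NORTH decreasing the row index, treats moves off the grid or onto a food cell as invalid, allows LOAD only next to food (horizontally or vertically), and ignores collisions between agents.

```python
from enum import Enum


class Action(Enum):
    NONE = 0
    NORTH = 1
    SOUTH = 2
    WEST = 3
    EAST = 4
    LOAD = 5


# (row, col) offsets; NORTH decreasing the row index is an assumption of this sketch.
_MOVES = {
    Action.NORTH: (-1, 0),
    Action.SOUTH: (1, 0),
    Action.WEST: (0, -1),
    Action.EAST: (0, 1),
}


def sanitize_action(action, position, field_size, food_cells):
    """Return `action` if it is executable, else Action.NONE (simplified).

    position: (row, col) of the agent; field_size: (rows, cols);
    food_cells: set of (row, col) cells that currently hold food.
    """
    row, col = position
    if action in _MOVES:
        dr, dc = _MOVES[action]
        target = (row + dr, col + dc)
        inside = 0 <= target[0] < field_size[0] and 0 <= target[1] < field_size[1]
        return action if inside and target not in food_cells else Action.NONE
    if action is Action.LOAD:
        neighbours = {(row + dr, col + dc) for dr, dc in _MOVES.values()}
        return action if neighbours & food_cells else Action.NONE
    return action  # NONE is always executable
```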
The rewards are calculated as follows. When one or more agents load a food item, the food level is distributed as reward among these agents, weighted by the level of each agent. The reward is then normalised so that, at the end, the sum of the rewards of all agents (if all food items have been picked up) is one. In code:
for a in adj_players:  # the players that participated in loading the food
    a.reward = float(a.level * food)  # higher-leveled agents contribute more and are rewarded more
    if self._normalize_reward:
        # adj_player_level: summed level of adj_players; _food_spawned: summed level of all spawned food
        a.reward = a.reward / float(
            adj_player_level * self._food_spawned
        )  # normalize reward so that the final sum of rewards is one
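To see why the normalization makes rewards sum to one, the snippet above can be turned into a standalone function. This sketch assumes that `food` is the level of the loaded item, `adj_player_level` the summed level of the loading agents, and `_food_spawned` the summed level of all food spawned in the episode; `normalized_rewards` and its argument layout are hypothetical. Each load then hands out food / _food_spawned in total, so collecting every item yields a total of one.

```python
def normalized_rewards(loads, food_spawned):
    """Per-agent normalized rewards accumulated over an episode.

    loads: list of (food_level, {agent_id: agent_level}) for each successful load.
    food_spawned: sum of the levels of all food items spawned in the episode.
    """
    totals = {}
    for food, players in loads:
        adj_player_level = sum(players.values())  # summed level of the loaders
        for agent, level in players.items():
            reward = level * food / (adj_player_level * food_spawned)
            totals[agent] = totals.get(agent, 0.0) + reward
    return totals


# Two items of levels 1 and 2 (food_spawned = 3): agent 0 loads the level-1 item
# alone, then both level-1 agents load the level-2 item together.
rewards = normalized_rewards([(1, {0: 1}), (2, {0: 1, 1: 1})], food_spawned=3)
print(rewards, sum(rewards.values()))
```

Here agent 0 ends with 1/3 + 1/3 = 2/3 and agent 1 with 1/3, so the total is one up to floating-point rounding.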
To reproduce the experiments used to generate the results shown in the accompanying publication, run
./run_experiments_for_paper.sh
in the experiments folder.
This fork provides additional code to run experiments in the LBF environment using agents of different cooperative abilities and environments that require different degrees of cooperation. The required degree of cooperation is specified by the fraction of food items that can only be collected jointly by both agents.
In the experiments folder, run
python run_experiments.py --settings ../config/settings.yml --outpath "../lbf_experiments/"
The settings file, config/settings.yml, specifies the experimental setup. It can contain the following options:
- experiment:
  - heuristics: list of agent heuristics to be used (see next section); currently, only agents with the same heuristic can be paired
  - coop_min, coop_max, coop_step: define different degrees of required cooperation in an environment; the required degree of cooperation is defined as the fraction of food items $c$ that can only be collected jointly
  - ntrials: number of trials per combination of heuristic and $c$
- environment:
  - size: edge length of the grid world
  - sight: agent sight; if sight == size, agents can see the whole environment
  - nplayers: number of agents
  - nfood: number of food items
  - thresh_respawn_food: number of remaining food items on the field that triggers the spawning of new food items; if -1, no respawn
  - distractors: fraction of distractors at initialization; distractors are items that cannot be picked up even jointly by both agents
  - max_episode_steps: maximum number of steps after which an episode is terminated if respawning is selected
- agents:
  - patience: maximum number of loading attempts by an agent before it disregards a target
  - memory: number of steps remembered by an agent
  - levels: levels of each agent, e.g., [1, 1]
  - heuristic: agent heuristic; can be H1, H2, H3, H4 or H5 for the heuristics described in Albrecht, S. V., & Ramamoorthy, S., 2015, arXiv preprint, or MultiHeuristicAgent for a heuristic specified by a set of abilities
  - abilities: abilities of MultiHeuristicAgent; not used if heuristics H1-H4 are chosen
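The coop_min, coop_max and coop_step options span a grid of values for $c$. The sketch below enumerates that grid and counts the resulting episodes; it assumes coop_max is inclusive and that every heuristic is run ntrials times for every $c$, which may differ from what run_experiments.py actually does. The function names and the example values in `settings` are hypothetical, and only the experiment block is mirrored.

```python
def cooperation_levels(coop_min, coop_max, coop_step):
    """List the values of c from coop_min to coop_max (assumed inclusive)."""
    n_steps = int(round((coop_max - coop_min) / coop_step))
    # Compute each value from an integer index to avoid accumulating float error.
    return [round(coop_min + i * coop_step, 10) for i in range(n_steps + 1)]


def count_runs(settings):
    """Number of trials implied by the experiment block of a settings dict."""
    exp = settings["experiment"]
    n_c = len(cooperation_levels(exp["coop_min"], exp["coop_max"], exp["coop_step"]))
    return len(exp["heuristics"]) * n_c * exp["ntrials"]


# Hypothetical experiment block with made-up values.
settings = {
    "experiment": {
        "heuristics": ["EGOISTIC", "ADAPTIVE"],
        "coop_min": 0.0,
        "coop_max": 1.0,
        "coop_step": 0.25,
        "ntrials": 10,
    }
}

print(cooperation_levels(0.0, 1.0, 0.25))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(count_runs(settings))                # 100
```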
By setting thresh_respawn_food to a value greater than 0, new food items are spawned when only the specified number of items remains on the field. This allows running episodes for an arbitrary number of steps, specified in max_episode_steps (if no new food is spawned, the episode terminates when all items are collected).
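The respawn and termination rules can be summarized in two small predicates. This is a hypothetical sketch: it assumes respawning triggers once the number of remaining items drops to thresh_respawn_food or below, treats any value not greater than 0 as disabling respawn, and assumes max_episode_steps also caps episodes without respawning.

```python
def should_respawn(remaining_food, thresh_respawn_food):
    """Spawn new items once at most `thresh_respawn_food` items remain (off if <= 0)."""
    return thresh_respawn_food > 0 and remaining_food <= thresh_respawn_food


def episode_done(step, remaining_food, thresh_respawn_food, max_episode_steps):
    """Stop at the step limit, or when the field is empty and respawning is off."""
    if step >= max_episode_steps:
        return True  # step limit (assumed to apply with and without respawning)
    respawning = thresh_respawn_food > 0
    return not respawning and remaining_food == 0
```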
The repository contains the agent heuristics H1-H4 described in the original publication by Albrecht and Ramamoorthy (Albrecht, S. V., & Ramamoorthy, S., 2015, arXiv preprint).
This fork additionally implements the MultiHeuristicAgent class, which allows defining agent behavior by specifying a set of abilities. Based on this class, a set of heuristics is specified using sets of abilities defined in experiments/mh_agent_configurations:
- BASELINE: takes random steps and attempts to load if next to a food item (shows behavior similar to the former Random agent)
- EGOISTIC: takes steps towards the closest goal compatible with its own level (former H1)
- SOCIAL1: takes steps towards the goal closest to the Euclidean center of all agents, irrespective of goal level (former H2)
- SOCIAL2: takes steps towards the goal closest to the Euclidean center of all agents and with a compatible level (former H4)
- COOPERATIVE: uses a goal value function to choose a goal it can collect jointly with the second agent (behavior similar to SOCIAL2)
- ADAPTIVE: uses a goal value function to decide whether to act cooperatively or egoistically
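The goal choices of EGOISTIC, SOCIAL1 and SOCIAL2 can be sketched as "closest item to a reference point, optionally filtered by level". This is a hypothetical illustration, not the fork's MultiHeuristicAgent code: all function names are made up, distances are Euclidean, and "compatible level" for SOCIAL2 is assumed to mean that the summed level of all agents suffices to load the item.

```python
import math


def closest_item(reference, foods, max_level=None):
    """Return the ((x, y), level) food item closest to `reference`, or None.

    Items with a level above `max_level` are skipped when it is given.
    """
    candidates = [f for f in foods if max_level is None or f[1] <= max_level]
    if not candidates:
        return None
    return min(candidates, key=lambda f: math.dist(reference, f[0]))


def centroid(positions):
    """Euclidean center (mean position) of all agents."""
    xs, ys = zip(*positions)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def choose_goal(heuristic, own_pos, own_level, agent_positions, agent_levels, foods):
    """Hypothetical goal choice for the EGOISTIC, SOCIAL1 and SOCIAL2 heuristics."""
    if heuristic == "EGOISTIC":
        # Closest item the agent can load with its own level.
        return closest_item(own_pos, foods, max_level=own_level)
    center = centroid(agent_positions)
    if heuristic == "SOCIAL1":
        # Closest item to the agents' center, irrespective of its level.
        return closest_item(center, foods)
    if heuristic == "SOCIAL2":
        # Closest item to the center that all agents together can load (assumption).
        return closest_item(center, foods, max_level=sum(agent_levels))
    raise ValueError(f"heuristic not covered by this sketch: {heuristic}")
```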
The goal value function assigns a value to every food item within an agent's sight. The value is calculated as the item's level (value) divided by the distance between the agent and the item. If an item can only be collected jointly, its value is divided by the number of agents jointly collecting it (2).
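A sketch of the goal value function, together with one plausible reading of how ADAPTIVE could use it. Assumptions not stated in the text: sight is a square window of radius `sight`, distance is Euclidean, an item counts as individually collectible if its level does not exceed the agent's own level, items that not even both agents can load are skipped, and ADAPTIVE simply follows the highest-valued goal. The function names are hypothetical.

```python
import math


def goal_values(agent_pos, agent_level, other_level, foods, sight):
    """Value of each visible food item for one agent (hypothetical sketch).

    foods: list of ((x, y), level). Returns (food, value, joint) tuples,
    where joint is True if the item needs both agents.
    """
    values = []
    for pos, level in foods:
        # Square field of view of radius `sight` (assumption).
        if max(abs(pos[0] - agent_pos[0]), abs(pos[1] - agent_pos[1])) > sight:
            continue
        if level <= agent_level:
            joint = False
        elif level <= agent_level + other_level:
            joint = True
        else:
            continue  # distractor: cannot be collected even jointly
        # Agents never stand on a food cell, so the distance is positive.
        value = level / math.dist(agent_pos, pos)
        if joint:
            value /= 2  # shared between the two agents collecting it
        values.append(((pos, level), value, joint))
    return values


def adaptive_choice(values):
    """Follow the highest-valued goal; return (mode, food) or None if nothing is visible."""
    if not values:
        return None
    food, _, joint = max(values, key=lambda v: v[1])
    return ("cooperative" if joint else "egoistic", food)
```

For a level-1 agent at (0, 0) with a level-1 partner, a level-2 item one cell away (value 2 / 1 / 2 = 1.0) beats a level-1 item two cells away (value 0.5), so the sketch picks the cooperative goal.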
When running LBF experiments via
python run_experiments.py --settings ../config/settings.yml --outpath "../lbf_experiments/"
the generated game data is saved to the specified folder as a CSV file. The saved game data contains the following variables for later analysis:
- agent_id: agent ID; agent actions are concatenated
- step: episode iteration the data is collected for
- coord_x: agent's x-coordinate
- coord_y: agent's y-coordinate
- reward: environment reward
- reward_sum: cumulative environment reward
- cooperative_actions: whether a cooperative action (joint collection of a food item) was performed in this step
- food: agent's share of the value of the collected food item, 0 if nothing was collected
- food_type: whether the collected food item required cooperation
- food_sum: cumulative food value
- action: agent's action
- goal_value_ego: highest value of a goal the agent could have collected individually
- goal_value_other: highest value of a goal the other agent could have collected individually
- goal_value_together: highest value of a goal the agents could have collected jointly
- dist_closest_food: distance to the closest food item
- dist_closest_agent: distance to the closest agent
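For later analysis, the CSV output can be read with Python's csv module. The sketch below reports, per agent, the final cumulative reward and how many steps involved a cooperative action. It assumes each file holds a single episode and that cooperative_actions is stored as 0/1 or True/False; `summarize` is a hypothetical helper and the file name in the usage comment is a placeholder.

```python
import csv
from collections import defaultdict


def _truthy(value):
    # The exact encoding of boolean columns is an assumption; accept common forms.
    return str(value).strip().lower() in {"1", "1.0", "true"}


def summarize(rows):
    """Per-agent final reward_sum and number of cooperative actions.

    rows: iterable of dicts keyed by the column names listed above
    (e.g. produced by csv.DictReader).
    """
    final_reward, last_step = {}, {}
    coop = defaultdict(int)
    for row in rows:
        agent = row["agent_id"]
        step = int(row["step"])
        if step >= last_step.get(agent, -1):
            # Keep the cumulative reward of the latest step seen for this agent.
            last_step[agent] = step
            final_reward[agent] = float(row["reward_sum"])
        if _truthy(row["cooperative_actions"]):
            coop[agent] += 1
    return {
        agent: {"reward_sum": final_reward[agent], "cooperative_actions": coop[agent]}
        for agent in final_reward
    }


# Usage (file name is a placeholder):
# with open("../lbf_experiments/results.csv", newline="") as f:
#     print(summarize(csv.DictReader(f)))
```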
- The paper that first uses this implementation of Level-based Foraging (LBF) and achieves state-of-the-art results:
@inproceedings{christianos2020shared,
title={Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning},
author={Christianos, Filippos and Schäfer, Lukas and Albrecht, Stefano V},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year={2020}
}
- A comparative evaluation of cooperative MARL algorithms that includes an introduction to this environment:
@inproceedings{papoudakis2021benchmarking,
title={Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks},
author={Georgios Papoudakis and Filippos Christianos and Lukas Schäfer and Stefano V. Albrecht},
booktitle = {Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS)},
year={2021},
openreview = {https://openreview.net/forum?id=cIrPX-Sn5n},
}
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Filippos Christianos - f.christianos@ed.ac.uk
Project Link: https://github.com/semitable/lb-foraging
Patricia Wollstadt - patricia.wollstadt@honda-ri.de
Christiane Wiebel-Herboth - christiane.wiebel@honda-ri.de
Matti Krüger - matti.krueger@honda-ri.de