Badges: Python 3.11.7 · PettingZoo dependency · Code style: black · Wiki


The environments

so_predpregrass_v0.py: A single-objective multi-agent reinforcement learning (MARL) environment, trained and evaluated using Proximal Policy Optimization (PPO). The learning agents, Predators (red) and Prey (blue), both expend energy moving around and replenish it by eating. Prey eat Grass (green), and Predators eat Prey when they end up on the same grid cell. In the base case, for simplicity, the agents obtain all of the energy from the eaten Prey or Grass. Predators die of starvation when their energy reaches zero; Prey die either of starvation or from being eaten by a Predator. Agents reproduce asexually when, by eating, their energy rises above a certain threshold. Learning agents learn to select movement actions, based on their partial observations of the environment (the transparent red and blue squares, respectively), in order to maximize cumulative reward. The single-objective rewards (for stepping, eating, dying, and reproducing) are simply summed and can be adjusted in the environment configuration file.
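As a rough illustration of how that scalar reward is assembled, here is a minimal sketch; the component names and values below are hypothetical stand-ins for whatever the configuration file actually defines:

    # Hypothetical reward components; the real names and values live in the config file.
    REWARDS = {
        "step": -0.1,       # small cost for moving
        "eat": 1.0,         # energy replenished by eating Grass or Prey
        "die": -10.0,       # starvation, or being eaten by a Predator
        "reproduce": 10.0,  # energy rose above the reproduction threshold
    }

    def scalar_reward(events):
        """Single-objective reward: simply sum the components triggered this step."""
        return sum(REWARDS[event] for event in events)

    # A Prey agent that moved onto a Grass cell and ate it:
    print(scalar_reward(["step", "eat"]))  # 0.9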

mo_predpregrass_v0.py: A multi-objective multi-agent reinforcement learning (MOMARL) environment. The environment has two objectives: 1) maximize cumulative reward for reproduction of Predator agents and 2) maximize cumulative reward for reproduction of Prey agents. The rewards returned by the environment are stored in a two-dimensional vector, in accordance with Farama's MOMAland framework.
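A minimal sketch of that vector-reward convention, assuming index 0 holds the Predator-reproduction objective and index 1 the Prey-reproduction objective (the values are illustrative, not the real configuration):

    import numpy as np

    def step_reward(predator_reproduced: bool, prey_reproduced: bool) -> np.ndarray:
        # Illustrative reward magnitudes; the real ones come from the config file.
        return np.array([10.0 if predator_reproduced else 0.0,
                         10.0 if prey_reproduced else 0.0])

    episode_return = np.zeros(2)
    episode_return += step_reward(predator_reproduced=True, prey_reproduced=False)
    print(episode_return)  # [10.  0.]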



Emergent Behaviors

PPO training in this environment is an example of how elaborate behaviors can emerge from simple rules in agent-based models. In the MARL example displayed above, learning agents are rewarded only for reproduction. Maximizing this reward nevertheless produces emergent behaviors such as: 1) Predators hunting Prey, 2) Predators hovering around Grass to catch Prey, and 3) Prey trying to escape Predators. These behaviors lead to more complex dynamics at the ecosystem level: over time the trained agents display a classic Lotka–Volterra predator-prey pattern. This learned outcome is not obtained with a random policy:
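One way to see the Lotka–Volterra pattern is to record the number of live Predators and Prey at every step. The sketch below relies on PettingZoo's standard env.agents list and assumes agent names start with "predator" or "prey" (an assumption about this repository's naming):

    from collections import Counter

    def population_counts(env):
        """Count live Predators and Prey from the agent names currently in env.agents."""
        kinds = Counter(name.split("_")[0] for name in env.agents)
        return kinds.get("predator", 0), kinds.get("prey", 0)

    # Collect one pair of counts per environment step, then plot both series over
    # time to inspect the predator-prey oscillations.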

More emergent behavior and findings are described on our website.

Installation

Editor used: Visual Studio Code 1.93.1 on Linux Mint 21.3 Cinnamon

  1. Clone the repository:
    git clone https://github.com/doesburg11/PredPreyGrass.git
  2. Open Visual Studio Code and execute:
    • Press ctrl+shift+p
    • Type and choose: "Python: Create Environment..."
    • Choose environment: Conda
    • Choose interpreter: Python 3.11.7
    • Open a new terminal
    • Install dependencies:
    pip install -r requirements.txt
  3. If encountering "ERROR: Failed building wheel for box2d-py," run:
    conda install swig
    and
    pip install box2d box2d-kengz
  4. Alternative 1:
    pip install wheel setuptools pip --upgrade
    pip install swig
    pip install gymnasium[box2d]
  5. Alternative 2: a workaround is to copy Box2d files from assets/box2d to the site-packages directory.
  6. If facing "libGL error: failed to load driver: swrast," execute:
    conda install -c conda-forge gcc=12.1.0
    

Getting started

Visualize a random policy

In Visual Studio Code run: pettingzoo/predpreygrass/random_policy.py
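For reference, a random-policy rollout with the standard PettingZoo AEC loop looks roughly like this; the import path and env() constructor below are assumptions, so check random_policy.py for the exact ones:

    # Hypothetical import path; see random_policy.py for the actual module and constructor.
    from predpreygrass.envs._predpreygrass_v0 import so_predpregrass_v0

    env = so_predpregrass_v0.env(render_mode="human")  # assumes a PettingZoo AEC env() constructor
    env.reset(seed=42)
    for agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()
        # Sample a random action; terminated or truncated agents must step with None.
        action = None if termination or truncation else env.action_space(agent).sample()
        env.step(action)
    env.close()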

Training and visualizing a trained model using PPO from Stable-Baselines3

Adjust parameters as needed in: predpreygrass/envs/_predpreygrass_v0/config/config_predpreygrass.py

In Visual Studio Code run: predpreygrass/optimizationspredpreygrass/train_predpreygrass_v0_ppo.py

To evaluate and visualize after training, follow the instructions in: predpreygrass/optimizationspredpreygrass/evaluate_from_file.py
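The training script above is the authoritative setup. As a hedged sketch of the general pattern it presumably follows (a PettingZoo parallel environment wrapped for Stable-Baselines3 with SuperSuit; the import path and parallel_env() constructor are assumptions), training typically looks like:

    import supersuit as ss
    from stable_baselines3 import PPO

    # Hypothetical import and constructor; see train_predpreygrass_v0_ppo.py for the real setup.
    from predpreygrass.envs._predpreygrass_v0 import so_predpregrass_v0

    env = so_predpregrass_v0.parallel_env()        # assumes a PettingZoo parallel API is exposed
    env = ss.black_death_v3(env)                   # keep the agent set fixed when agents die
    env = ss.pettingzoo_env_to_vec_env_v1(env)     # PettingZoo parallel env -> vector env
    env = ss.concat_vec_envs_v1(env, 8, num_cpus=1, base_class="stable_baselines3")

    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=1_000_000)
    model.save("ppo_predpreygrass")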

Configuration of the PredPreyGrass environment

The benchmark configuration used in the GIF above is defined in the environment configuration file (config_predpreygrass.py).

References