06‐25‐2024 Weekly Tag Up
- Chi Hui
- Joe
- Trained 3 different models on the SUMO environment
- Used the 9-agent env
- Self-play (actor-critic with parameter sharing; see the sketch below)
- ASL7, ASL10, and Queue models
- All perform well and do not result in "gridlock"
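A minimal sketch of the parameter-sharing idea behind these self-play models, assuming a PyTorch actor-critic; the observation size, action count, and network widths are placeholders rather than the actual trained models.

```python
import torch
import torch.nn as nn

# All 9 intersection agents query the same network, so every agent's
# experience updates a single set of weights (parameter sharing).
class SharedActorCritic(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.pi = nn.Linear(hidden, n_actions)  # policy head (action logits)
        self.v = nn.Linear(hidden, 1)           # value head

    def forward(self, obs):
        h = self.body(obs)
        return torch.distributions.Categorical(logits=self.pi(h)), self.v(h)

net = SharedActorCritic(obs_dim=32, n_actions=4)  # placeholder sizes
obs = torch.randn(9, 32)                          # one observation per agent
dist, values = net(obs)
actions = dist.sample()                           # independent action per agent, shared weights
```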
- SUMO can be confusing to people who haven't seen it before, so we may need to use the Overcooked env
- People who use Overcooked usually use PPO to train self-play agents
- Overcooked is commonly used for ZSC (zero-shot coordination)
- Usually just 1 agent and 1 partner, though
- HIRO group overcooked env: https://github.com/HIRO-group/overcooked_ai
- Requires updates to support different kinds of experts
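For reference, a minimal rollout sketch assuming the upstream overcooked_ai API (the HIRO fork may expose slightly different entry points); random joint actions stand in for trained self-play PPO agents.

```python
import random

from overcooked_ai_py.mdp.overcooked_mdp import OvercookedGridworld
from overcooked_ai_py.mdp.overcooked_env import OvercookedEnv
from overcooked_ai_py.mdp.actions import Action

# Build a standard two-player layout and roll out one episode with random
# joint actions; trained PPO self-play policies would replace the sampling.
mdp = OvercookedGridworld.from_layout_name("cramped_room")
env = OvercookedEnv.from_mdp(mdp, horizon=400)

env.reset()
done, total_sparse_reward = False, 0
while not done:
    joint_action = (random.choice(Action.ALL_ACTIONS),
                    random.choice(Action.ALL_ACTIONS))
    state, reward, done, info = env.step(joint_action)
    total_sparse_reward += reward
print("episode sparse reward:", total_sparse_reward)
```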
- Could also consider a human coordinator application (see the sketch below)
- Human injects actions/policy into the environment for a few steps (e.g., turn all lights green in one direction for a short time)
- Would we still get the same performance?
- This would only be necessary to study if our other contributions don't seem like enough
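A hedged sketch of that idea: for the first few steps the human overrides every agent's action with a fixed phase, then control returns to the learned policies. `env`, `policies`, and the action id are hypothetical stand-ins for our SUMO wrapper and trained per-agent policies.

```python
GREEN_NS = 0        # hypothetical action id: green phase for one direction
HUMAN_STEPS = 20    # how long the human intervention lasts

def evaluate_with_override(env, policies, episode_len=1000):
    """Run one episode where the human coordinator controls the first steps."""
    obs = env.reset()
    total_reward = 0.0
    for t in range(episode_len):
        if t < HUMAN_STEPS:
            # Human-injected actions for every agent
            actions = {agent: GREEN_NS for agent in obs}
        else:
            # Learned policies resume control
            actions = {agent: policies[agent].act(obs[agent]) for agent in obs}
        obs, rewards, done, _ = env.step(actions)
        total_reward += sum(rewards.values())
        if done:
            break
    return total_reward
```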
- Need to take self-play models and evaluate them in mixed coordination scenarios
- Randomly place agents in the environment and evaluate the performance of the middle agent
- This agent is directly impacted by all others in the environment, so it is the most likely to perform poorly when the environment changes
- There are many combinations (3^9 = 19,683), but we only need to evaluate a few
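A small sketch of how those combinations could be sampled, assuming the 9 intersections form a 3x3 grid; `evaluate_middle_agent` is a hypothetical helper that would load the listed models into the 9-agent SUMO env and return the center agent's performance.

```python
import itertools
import random

MODELS = ["ASL7", "ASL10", "Queue"]   # the three self-play models above
MIDDLE = 4                            # center intersection in a 3x3 grid (0-indexed)

def sample_assignments(n_samples=20, seed=0):
    """Sample a few of the 3^9 = 19,683 possible model-to-intersection assignments."""
    rng = random.Random(seed)
    all_assignments = list(itertools.product(MODELS, repeat=9))
    return rng.sample(all_assignments, n_samples)

for assignment in sample_assignments():
    # score = evaluate_middle_agent(assignment, middle_index=MIDDLE)  # hypothetical helper
    print("middle agent model:", assignment[MIDDLE], "assignment:", assignment)
```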
- We need to start thinking about our algorithm
- FCP (fictitious co-play) lets the agent see 3 different scenarios (3 different skill levels of partner players)
- Our application uses partner players with different expertise
- This is probably more related to population play (see the sketch below)
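A sketch of the distinction: FCP builds its partner pool from checkpoints of one partner at different skill levels, whereas our setting would draw partners from a pool of experts with different expertise, which is closer to population play. The names below are illustrative, not actual checkpoints.

```python
import random

# FCP-style pool: the same partner at different training stages (skill levels).
fcp_pool = ["partner_early_ckpt", "partner_mid_ckpt", "partner_final_ckpt"]

# Our setting: experts trained with different objectives (different expertise).
expert_pool = ["ASL7_policy", "ASL10_policy", "Queue_policy"]

def sample_partner(pool, seed=None):
    """Each training episode the learner is paired with a partner drawn from the pool."""
    return random.Random(seed).choice(pool)

print(sample_partner(expert_pool, seed=0))
```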
- World Models application
- Joe to spend more time looking into this, as well as its potential application to our experiments
- Will also look at implementation to see how challenging it would be to apply
- Repo here: https://github.com/zacwellmer/WorldModels