11‐14‐2023 Weekly Tag Up
- Chi-hui
- Joe
  - Ran 9-agent experiment
    - Results are not as balanced as they were for the 4-agent environment
    - Some potential reasons:
      - We are using a target network instead of policy iteration (policy iteration would use the entire dataset to update the value function); see the target-network sketch after these notes
      - Decentralized (agents can't communicate)
        - Parameter sharing could help, since it effectively lets agents share knowledge; see the sketch after these notes
      - Learning rate is probably too high
    - Accumulated g1 and g2 rewards are larger in scale in the 9-agent environment than in the 4-agent environment
      - Change the lower bound to 0.01
      - Check this early on when running the 16-agent experiment
    - Probably a good example of the challenges of deploying an offline method in simulation
  - If we get accepted, we may want to start thinking about applying parameter sharing and other architectural decisions
  - Goal is to have the 9- and 16-agent environments complete in the next 2 weeks
    - Thanksgiving is next week
    - Response still expected at the end of the month
- Look into "Diffusion Learning"
  - Anuj will present research on 11/28
    - Video will be recorded
  - Other reference: https://arxiv.org/pdf/2205.09991.pdf
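
For reference on the target-network point above: a minimal sketch of a DQN-style TD update where a slowly updated target network produces the bootstrap targets from sampled batches, in contrast to policy iteration, which would evaluate the value function over the entire dataset. All dimensions, network shapes, and hyperparameters here are illustrative, not taken from our codebase.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes and hyperparameters only.
STATE_DIM, N_ACTIONS, GAMMA, TAU = 8, 4, 0.99, 0.005

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = copy.deepcopy(q_net)  # lagging copy used only to compute targets
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def td_update(batch):
    """One TD update from a sampled batch of tensors (s, a, r, s2, done)."""
    s, a, r, s2, done = batch
    with torch.no_grad():
        # The target network evaluates the next state. Because it lags the
        # online net and sees only sampled batches, targets are not a full
        # policy-evaluation sweep over the whole dataset.
        target = r + GAMMA * (1 - done) * target_net(s2).max(dim=1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Polyak averaging: the target net slowly tracks the online net.
    with torch.no_grad():
        for p, tp in zip(q_net.parameters(), target_net.parameters()):
            tp.mul_(1 - TAU).add_(TAU * p)
```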
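
And a sketch of one common way parameter sharing is implemented in multi-agent RL: every agent queries a single network, conditioned on a one-hot agent ID, so gradients from all agents' experience update the same weights. Again, the sizes and names are hypothetical.

```python
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, N_ACTIONS = 9, 8, 4  # illustrative sizes

# One network serves every agent; updates driven by any agent's data
# change the weights all agents use, which is the "knowledge sharing".
shared_policy = nn.Sequential(
    nn.Linear(OBS_DIM + N_AGENTS, 64),  # observation + one-hot agent ID
    nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)

def act(agent_id: int, obs: torch.Tensor) -> torch.Tensor:
    """Greedy action for one agent from the shared network."""
    one_hot = torch.zeros(N_AGENTS)
    one_hot[agent_id] = 1.0
    return shared_policy(torch.cat([obs, one_hot])).argmax()
```

A shared network also shrinks the total parameter count roughly by the number of agents, which can help when per-agent data is limited, as in the 9- and 16-agent runs.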