
11-14-2023 Weekly Tag Up


Attendees

  • Chi-hui
  • Joe

Updates

  • Ran 9-agent experiment (result plots attached as images)
  • Results are not as balanced as they were for the 4-agent environment
  • Some potential reasons:
    • We are using a target network instead of policy iteration (policy iteration would use the entire dataset to update the value function); see the sketch after this list
    • Decentralized (agents can't communicate)
      • Parameter sharing could help by letting agents pool knowledge through shared weights (see the sketch after this list)
    • The learning rate is probably too high
      • Accumulated g1 and g2 rewards are larger in scale in the 9-agent environment than in the 4-agent environment
      • Change the lower bound to 0.01 (see the learning-rate sketch after this list)
      • Check this early when running the 16-agent experiment
  • Probably a good example of the challenges of deploying an offline method in simulation
  • If the paper is accepted, we may want to start thinking about applying parameter sharing and other architectural changes
  • Goal is to have the 9- and 16-agent environments complete in the next 2 weeks
    • Thanksgiving is next week
    • A response is still expected at the end of the month
  • Look into "Diffusion Learning"
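
On the target-network point above: a minimal sketch of the kind of soft target update a DQN-style method uses, in contrast to policy iteration's full-dataset value update. All names here are illustrative, not the project's actual code.

```python
import torch
import torch.nn as nn

def soft_update(target_net: nn.Module, policy_net: nn.Module, tau: float = 0.005) -> None:
    """Blend the target network a small step toward the policy network.

    Unlike policy iteration, which re-evaluates the value function over
    the entire dataset before improving the policy, this only mixes
    parameters, so value estimates lag behind the latest policy.
    """
    with torch.no_grad():
        for t_param, p_param in zip(target_net.parameters(), policy_net.parameters()):
            t_param.mul_(1.0 - tau).add_(tau * p_param)
```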
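On parameter sharing: one common form is a single network used by all agents, with an agent-ID feature appended to the observation so the shared weights can still specialize per agent. A hypothetical sketch under that assumption, not the architecture discussed in the meeting:

```python
import torch
import torch.nn as nn

class SharedPolicy(nn.Module):
    """One Q-network shared by all agents; a one-hot agent ID lets the
    shared weights specialize per agent while still pooling experience."""

    def __init__(self, obs_dim: int, n_actions: int, n_agents: int):
        super().__init__()
        self.n_agents = n_agents
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_agents, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs: torch.Tensor, agent_id: int) -> torch.Tensor:
        # Append a one-hot agent ID so each agent gets distinct outputs
        # from the same shared parameters.
        one_hot = torch.zeros(obs.shape[0], self.n_agents, device=obs.device)
        one_hot[:, agent_id] = 1.0
        return self.net(torch.cat([obs, one_hot], dim=-1))
```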
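On the learning-rate lower bound: assuming the schedule anneals toward a floor, clamping that floor at 0.01 could look like the sketch below (the initial rate and decay constant are made up for illustration):

```python
def learning_rate(step: int, initial_lr: float = 0.1,
                  decay: float = 0.999, floor: float = 0.01) -> float:
    """Exponentially decay the learning rate, but never below the floor,
    keeping updates small relative to the larger 9-agent reward scales."""
    return max(initial_lr * decay ** step, floor)
```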