11‐14‐2023 Weekly Tag Up
- Chi-hui
- Joe
  - Ran 9-agent experiment
    - Results are not as balanced as they were for the 4-agent environment
    - Some potential reasons:
      - We are using a target network instead of policy iteration (policy iteration would use the entire dataset to update the value function); see the target-network sketch after these notes
      - Decentralized (agents can't communicate)
        - Parameter sharing could help, since it effectively lets agents share knowledge; see the sketch after these notes
      - Learning rate is probably too high
    - Accumulated g1 and g2 rewards are larger in scale in the 9-agent environment than in the 4-agent environment
      - Change the lower bound to 0.01
      - Check this early on when running the 16-agent experiment
    - Probably a good example of the challenges of deploying an offline method in simulation
  - If we get accepted, we may want to start thinking about applying parameter sharing and other architectural decisions
  - Goal is to have the 9- and 16-agent environments complete in the next 2 weeks
    - Thanksgiving is next week
    - Response still expected at the end of the month
- Look into "Diffusion Learning"
  - Anuj will present research on 11/28
    - Video will be recorded
  - Other reference: https://arxiv.org/pdf/2205.09991.pdf
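
For reference on the target-network point above: a minimal sketch of a DQN-style TD update where a slowly updated target network produces the bootstrap targets from sampled batches, in contrast to policy iteration, which would evaluate the value function over the entire dataset. All dimensions, network shapes, and hyperparameters here are illustrative, not taken from our codebase.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes and hyperparameters only.
STATE_DIM, N_ACTIONS, GAMMA, TAU = 8, 4, 0.99, 0.005

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = copy.deepcopy(q_net)  # lagging copy used only to compute targets
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def td_update(batch):
    """One TD update from a sampled batch of tensors (s, a, r, s2, done)."""
    s, a, r, s2, done = batch
    with torch.no_grad():
        # The target network evaluates the next state. Because it lags the
        # online net and sees only sampled batches, targets are not a full
        # policy-evaluation sweep over the whole dataset.
        target = r + GAMMA * (1 - done) * target_net(s2).max(dim=1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Polyak averaging: the target net slowly tracks the online net.
    with torch.no_grad():
        for p, tp in zip(q_net.parameters(), target_net.parameters()):
            tp.mul_(1 - TAU).add_(TAU * p)
```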
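
And a sketch of one common way parameter sharing is implemented in multi-agent RL: every agent queries a single network, conditioned on a one-hot agent ID, so gradients from all agents' experience update the same weights. Again, the sizes and names are hypothetical.

```python
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, N_ACTIONS = 9, 8, 4  # illustrative sizes

# One network serves every agent; updates driven by any agent's data
# change the weights all agents use, which is the "knowledge sharing".
shared_policy = nn.Sequential(
    nn.Linear(OBS_DIM + N_AGENTS, 64),  # observation + one-hot agent ID
    nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)

def act(agent_id: int, obs: torch.Tensor) -> torch.Tensor:
    """Greedy action for one agent from the shared network."""
    one_hot = torch.zeros(N_AGENTS)
    one_hot[agent_id] = 1.0
    return shared_policy(torch.cat([obs, one_hot])).argmax()
```

A shared network also shrinks the total parameter count roughly by the number of agents, which can help when per-agent data is limited, as in the 9- and 16-agent runs.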