02‐06‐2024 Weekly Tag Up

Attendees

  • Joe
  • Chi-Hui

Updates

  • Changing the reward didn't have much impact on the excess speed model
  • We probably need to update the reward to also keep max speed above a certain level
    • Penalize agents for letting max speed go to 0
    • Applying a lower bound of 5.0 for now (see the reward sketch after this list)
  • Deployment Model
    • 2 single-objective models trained with PS --> 2 different models
    • With 9 agents there are 2^9 deployment possibilities (see the enumeration sketch after this list)
    • We would still try to minimize both objectives
    • This would change the problem to a "deployment" problem
    • We will keep this in mind for the future; there could be some interesting applications to our work
      • Maybe look into changing the deployment during dataset generation for batch offline learning
      • Or changing the deployment while learning the policy
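
As a rough sketch of the reward change discussed above: keep the existing reward term but subtract a penalty whenever the observed max speed drops below the lower bound. The function name, the penalty weight, and the exact shape of the penalty are assumptions; only the 5.0 floor comes from the notes above.

```python
def shaped_reward(base_reward: float, max_speed: float,
                  speed_floor: float = 5.0, penalty_weight: float = 1.0) -> float:
    """Penalize the agent when the observed max speed falls below the floor.

    base_reward    -- the existing excess-speed reward term
    max_speed      -- max vehicle speed observed at the intersection this step
    speed_floor    -- lower bound being applied for now (5.0)
    penalty_weight -- hypothetical scaling factor for the penalty term
    """
    # Penalty grows linearly as max speed drops toward 0; it is 0 at or above the floor.
    penalty = penalty_weight * max(0.0, speed_floor - max_speed)
    return base_reward - penalty
```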

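A quick sketch of where the 2^9 figure comes from: with 2 trained models and 9 agents, a deployment is an assignment of one model to each agent. The model names and the evaluation call are hypothetical.

```python
from itertools import product

MODELS = ["model_a", "model_b"]  # the 2 single-objective models trained with PS
NUM_AGENTS = 9                   # one agent per intersection

# Every deployment assigns one of the two models to each of the 9 agents.
deployments = list(product(MODELS, repeat=NUM_AGENTS))
assert len(deployments) == 2 ** NUM_AGENTS  # 512 possibilities

# Hypothetical: score each deployment against both objectives.
# for deployment in deployments:
#     speed_obj, throughput_obj = evaluate(deployment)
```
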
Next Steps

  • Retrain the single-objective max speed model to evaluate how performance changed with the new reward
    • Are intersections still going to 0 throughput?
  • Review old logs to see whether policies were learned that stopped all cars at an intersection
    • Or is this an issue that was introduced with parameter sharing?
  • Look at data from the excess speed policy with other thresholds
    • Did the threshold make a difference in the number of stopped cars or not? (see the sketch below)
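
A rough sketch of that comparison, assuming the excess speed evaluation logs can be exported with one row per episode containing the threshold used and the number of stopped cars observed; the file name and column names are placeholders.

```python
import pandas as pd

# Hypothetical export of the excess speed policy evaluation logs.
logs = pd.read_csv("excess_speed_policy_logs.csv")  # columns: threshold, stopped_cars, ...

# Compare stopped-car counts across thresholds to see whether the threshold mattered.
summary = logs.groupby("threshold")["stopped_cars"].agg(["mean", "max", "count"])
print(summary)
```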