
03‐12‐2024 Weekly Tag Up


Attendees

  • Chi Hui
  • Joe

Updates

  • Bug discovered and detailed in Experiment 17 on the Google Sheet
    • The calculation of g1 in our algorithm was not updated to match the new "double bounded" reward for the Excess Speed model (a rough sketch of this reward shape follows this list)
      • This impacts dataset generation, learning the G1 value function with FQE, offline rollouts, and online rollouts
    • After fixing the bug, it looks like performance has flipped for the g1 metric
      • The Queue policy has a lower g1 return (i.e., cars speeding and moving too slowly) than the Excess Speed policy
      • We need to identify what is causing this
      • The issue may be in the CalculateMaxSpeed function
      • We may need to change the lower bound of the speed threshold (e.g., from 5.0 to 1.0)
      • It could also just be a plotting issue
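For reference, a minimal sketch of what a double-bounded speed penalty could look like (the function name, upper threshold value, and per-vehicle penalty of -1 are assumptions for illustration, not the actual implementation):

```python
import numpy as np

def g1_step_penalty(speeds, lower_threshold=5.0, upper_threshold=15.0):
    """Hypothetical double-bounded speed penalty for one simulation step.

    Vehicles below lower_threshold (moving too slowly) or above
    upper_threshold (speeding) each contribute -1; vehicles inside the
    band contribute 0. lower_threshold=5.0 mirrors the current setting
    discussed above; dropping it to 1.0 would stop counting
    slow-but-moving vehicles against the policy.
    """
    speeds = np.asarray(speeds, dtype=float)
    too_slow = speeds < lower_threshold
    too_fast = speeds > upper_threshold
    return -float(np.sum(too_slow | too_fast))

# Example: one crawling car and two speeding cars -> penalty of -3
print(g1_step_penalty([2.0, 16.5, 17.0, 10.0]))  # -3.0
```

Plotting this penalty for a single rollout with the lower threshold at 5.0 vs. 1.0 would also help separate a reward-shape problem from a plotting problem.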

Next Steps

  • We will probably make the paper "RL focused"
    • This is probably easier than making it a "traffic control focused" paper because that would require running experiments with real data
      • Which would require computing resources beyond a personal laptop
    • To do this, we need to compare against a baseline and include another environment (or two)
      • Joe will look into new environments (probably PettingZoo)
      • Baseline could be "Regularized LSPI"
    • We also need to show how our algorithm does when approximating the true constraint (see the sketch at the end of these notes)
      • Refer to C_hat(pi) in the original Batch Policy Learning Under Constraints paper
      • The true value will be determined from online rollouts
      • We also want to show that parameter sharing helps the approximation
  • Need to read more papers
    • Constrained RL
    • Parameter sharing
  • Joe is traveling next week, so there will be no meeting
    • We may meet at the end of next week or over the weekend
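As a reference for the constraint-approximation comparison above, here is a minimal sketch of estimating the true constraint value from online rollouts (the Gymnasium-style interface, the "g1" info key, and the parameter values are assumptions for illustration):

```python
import numpy as np

def true_constraint_value(env, policy, num_episodes=100, gamma=0.99, max_steps=1000):
    """Monte Carlo estimate of the true constraint return C(pi) from online rollouts.

    Assumes a Gymnasium-style env whose step() info dict carries the
    per-step constraint cost under a (hypothetical) "g1" key. The offline
    FQE estimate C_hat(pi) can then be compared against this value.
    """
    returns = []
    for _ in range(num_episodes):
        obs, _ = env.reset()
        total, discount = 0.0, 1.0
        for _ in range(max_steps):
            action = policy(obs)
            obs, reward, terminated, truncated, info = env.step(action)
            total += discount * info["g1"]
            discount *= gamma
            if terminated or truncated:
                break
        returns.append(total)
    return float(np.mean(returns))
```

C_hat(pi) learned offline via FQE could then be compared against this Monte Carlo estimate, with and without parameter sharing.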