
03‐12‐2024 Weekly Tag Up


Attendees

  • Chi Hui
  • Joe

Updates

  • Bug discovered and detailed in Experiment 17 on the Google Sheet
    • The calculation of g1 in our algorithm was not updated to match the new "double bounded" reward for the Excess Speed model (a rough sketch of this reward shape follows this list)
      • This impacts dataset generation, learning the G1 value function with FQE, offline rollouts, and online rollouts
    • After fixing the bug, it looks like performance has flipped for the g1 metric
      • The Queue policy has a lower g1 return (i.e., cars speeding and moving too slowly) than the Excess Speed policy
      • We need to identify what is causing this
      • The issue may be in the CalculateMaxSpeed function
      • We may need to change the lower bound of the speed threshold (e.g., from 5.0 to 1.0)
      • It could also just be a plotting issue
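For reference, a minimal sketch of what a double-bounded speed penalty could look like (the function name, upper threshold value, and per-vehicle penalty of -1 are assumptions for illustration, not the actual implementation):

```python
import numpy as np

def g1_step_penalty(speeds, lower_threshold=5.0, upper_threshold=15.0):
    """Hypothetical double-bounded speed penalty for one simulation step.

    Vehicles below lower_threshold (moving too slowly) or above
    upper_threshold (speeding) each contribute -1; vehicles inside the
    band contribute 0. lower_threshold=5.0 mirrors the current setting
    discussed above; dropping it to 1.0 would stop counting
    slow-but-moving vehicles against the policy.
    """
    speeds = np.asarray(speeds, dtype=float)
    too_slow = speeds < lower_threshold
    too_fast = speeds > upper_threshold
    return -float(np.sum(too_slow | too_fast))

# Example: one crawling car and two speeding cars -> penalty of -3
print(g1_step_penalty([2.0, 16.5, 17.0, 10.0]))  # -3.0
```

Plotting this penalty for a single rollout with the lower threshold at 5.0 vs. 1.0 would also help separate a reward-shape problem from a plotting problem.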

Next Steps

  • We will probably make the paper "RL focused"
    • This is probably easier than making it a "traffic control focused" paper because that would require running experiments with real data
      • Which would require computing resources beyond a personal laptop
    • To do this, we need to compare against a baseline and include another environment (or two)
      • Joe will look into new environments (probably PettingZoo)
      • Baseline could be "Regularized LSPI"
    • We also need to show how our algorithm does when approximating the true constraint (see the sketch at the end of these notes)
      • Refer to C_hat(pi) in the original Batch Policy Learning Under Constraints paper
      • The true value will be determined from online rollouts
      • We also want to show that parameter sharing helps the approximation
  • Need to read more papers
    • Constrained RL
    • Parameter sharing
  • Joe is traveling next week, so there will be no meeting
    • We may meet at the end of next week or over the weekend
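As a reference for the constraint-approximation comparison above, here is a minimal sketch of estimating the true constraint value from online rollouts (the Gymnasium-style interface, the "g1" info key, and the parameter values are assumptions for illustration):

```python
import numpy as np

def true_constraint_value(env, policy, num_episodes=100, gamma=0.99, max_steps=1000):
    """Monte Carlo estimate of the true constraint return C(pi) from online rollouts.

    Assumes a Gymnasium-style env whose step() info dict carries the
    per-step constraint cost under a (hypothetical) "g1" key. The offline
    FQE estimate C_hat(pi) can then be compared against this value.
    """
    returns = []
    for _ in range(num_episodes):
        obs, _ = env.reset()
        total, discount = 0.0, 1.0
        for _ in range(max_steps):
            action = policy(obs)
            obs, reward, terminated, truncated, info = env.step(action)
            total += discount * info["g1"]
            discount *= gamma
            if terminated or truncated:
                break
        returns.append(total)
    return float(np.mean(returns))
```

C_hat(pi) learned offline via FQE could then be compared against this Monte Carlo estimate, with and without parameter sharing.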