03‐12‐2024 Weekly Tag Up
- Chi Hui
- Joe
- Bug discovered and detailed in Experiment 17 on the Google Sheet
- The calculation of g1 in our algorithm was not updated to match the new "double bounded" reward for the Excess Speed model (a rough sketch of this kind of penalty appears after this list)
- This impacts dataset generation, using FQE to learn the G1 value function, offline rollouts, and online rollouts
- After fixing the bug, it looks like performance has flipped for the g1 metric
- The Queue policy has a lower g1 return (i.e., cars speeding and moving too slowly) than the Excess Speed policy
- We need to find what is causing this
- Potentially in CalculateMaxSpeed function
- Potentially need to change the lower bound of the speed threshold (from 5.0 to 1.0 for example)
- It could just be a plotting issue
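A minimal sketch, under assumed placeholder thresholds and names, of what a "double bounded" per-step g1 penalty could look like; this is an illustration for cross-checking the fix, not the project's CalculateMaxSpeed implementation:

```python
import numpy as np

# Placeholder bounds; the meeting discussed lowering the lower bound (e.g. 5.0 -> 1.0).
SPEED_LOWER_BOUND = 5.0    # cars below this count as "moving too slow"
SPEED_UPPER_BOUND = 13.89  # cars above this count as "speeding" (placeholder value)

def double_bounded_g1(car_speeds):
    """Per-step g1 cost under a double-bounded speed reward: penalize speeds
    above the upper bound (speeding) and below the lower bound (too slow)."""
    speeds = np.asarray(car_speeds, dtype=float)
    too_fast = np.maximum(speeds - SPEED_UPPER_BOUND, 0.0)
    too_slow = np.maximum(SPEED_LOWER_BOUND - speeds, 0.0)
    # Returned as a negative value so that a higher g1 return means fewer violations.
    return -float(np.sum(too_fast + too_slow))
```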
- We will probably make the paper "RL focused"
- This is probably easier than making it a "traffic control focused" paper because that would require running experiments with real data
- Which would require computing resources beyond a personal laptop
- To do this, we need to compare against a baseline and include another environment (or two)
- Joe will look into new environments (probably PettingZoo)
- Baseline could be "Regularized LSPI"
- We also need to show how our algorithm does when approximating the true constraint
- Refer to C_hat(pi) in the original Batch Policy Learning Under Constraints paper
- True value will be determined from online rollouts (see the sketch after this list)
- Also want to show that parameter sharing helps the approximation
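A rough sketch of how the FQE-based estimate C_hat(pi) could be compared against the online-rollout value. It assumes a Gymnasium-style reset/step API and that the per-step g1 signal is exposed through the info dict; the function names and signatures are placeholders, not the project's actual code:

```python
import numpy as np

def fqe_constraint_estimate(q_hat, initial_states, policy):
    """Off-policy estimate C_hat(pi): average the learned G1 value function
    (e.g. fit with FQE) over initial states and the policy's actions."""
    return float(np.mean([q_hat(s, policy(s)) for s in initial_states]))

def online_constraint_estimate(env, policy, gamma=0.99, n_episodes=50):
    """'True' constraint value: discounted Monte Carlo g1 return averaged
    over online rollouts."""
    returns = []
    for _ in range(n_episodes):
        state, _ = env.reset()
        done, discount, total = False, 1.0, 0.0
        while not done:
            state, _, terminated, truncated, info = env.step(policy(state))
            total += discount * info["g1"]  # assumes the env reports g1 in info
            discount *= gamma
            done = terminated or truncated
        returns.append(total)
    return float(np.mean(returns))

# Approximation quality to report: |fqe_constraint_estimate - online_constraint_estimate|
```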
- Need to read more papers
- Constrained RL
- Parameter sharing
- Joe traveling next week, no meeting
- May meet at end of next week/weekend