03‐29‐2024 Weekly Tag Up
Joe Miceli edited this page Mar 29, 2024
- Joe
- Chi Hui
- Bug identified and squashed
- Our updated "double threshold" single-objective policy performs very similarly to the "queue" policy (according to g1) in some cases
- In other cases, the "queue" policy performs BETTER than the single-objective policy (according to g1)
- This could mean that our new reward definition induces speed behavior similar to that of the queue policy
- However, the queue policy always performs MUCH BETTER than the double threshold policy according to g2
- We may have just needed to train the double threshold policies longer (though this seems unlikely, since training appeared to converge)
- Reran batch offline learning experiment with "queue" policy and "threshold 1.0/13.89" policy
- See Experiment 21.1
- Remove the square root part of the reward definition and regenerate the table from Experiment 20
- We need the queue policy to perform much worse than the excess speed policy
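The first action item could look something like the sketch below. Only the square-root removal itself comes from the notes; the function names, the penalty form, and the use of the 13.89 m/s threshold are illustrative assumptions, not the project's actual reward code.

```python
# Hypothetical sketch of the planned change: dropping a square-root term
# from a speed-based reward. All names and the functional form here are
# assumptions for illustration; only the sqrt removal is from the notes.
import math

SPEED_THRESHOLD = 13.89  # m/s; one of the thresholds mentioned above

def reward_with_sqrt(avg_speed: float) -> float:
    """Penalize excess speed above the threshold, dampened by a square root."""
    excess = max(0.0, avg_speed - SPEED_THRESHOLD)
    return -math.sqrt(excess)

def reward_linear(avg_speed: float) -> float:
    """Same penalty without the square root, so large excesses cost more."""
    excess = max(0.0, avg_speed - SPEED_THRESHOLD)
    return -excess
```

If the square root was compressing large speed violations, the linear variant should penalize them more sharply and so widen the gap between the queue policy and the excess-speed policy in the regenerated table.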