03‐29‐2024 Weekly Tag Up

Joe Miceli edited this page Mar 29, 2024 · 2 revisions

Attendees

  • Joe
  • Chi Hui

Status

  • Bug identified and squashed
    • According to g1, our updated "double threshold" single objective policy performs very similarly to the "queue" policy in some cases
    • In other cases, the "queue" policy performs BETTER than the single objective policy (according to g1)
      • This could mean that our new reward definition induces speed behavior similar to that of the queue policy
        • Though the queue policy always performs MUCH BETTER than the double threshold policy according to g2
      • We may simply have needed to train the double threshold policies longer (though I doubt it, because training appeared to have converged)
  • Reran batch offline learning experiment with "queue" policy and "threshold 1.0/13.89" policy
    • See Experiment 21.1

Next Steps

  • Remove the square root term from the reward definition and regenerate the table from Experiment 20
    • We need the queue policy to perform much worse than the excess speed policy