04‐26‐2024 Weekly Tag Up

Joe Miceli edited this page Apr 26, 2024 · 1 revision

Attendees

  • Joe
  • Chi Hui

Updates

  • Performed first ablation study (exp 24) using the following pseudo-code image
  • Offline rollouts show reward per step, online rollouts show reward per 1000 steps
    • Surprising that they are so far off
      • This is likely because the value functions were learned using "ensemble policy"
      • Probably need to update to use homogeneous policies
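
Because the two rollout types report reward on different scales, it helps to put them in the same unit before judging how far apart they are. The following is a minimal sketch, assuming the online figure is a reward summed over a 1000-step window (the notes do not say whether it is a sum or an average). The function names `per_step_reward` and `relative_gap` are hypothetical and are not taken from the project code.

```python
def per_step_reward(reward_per_window: float, window_steps: int = 1000) -> float:
    """Convert a reward summed over `window_steps` steps into a per-step average.

    Assumption: the online rollout reports a sum over the window.
    """
    if window_steps <= 0:
        raise ValueError("window_steps must be positive")
    return reward_per_window / window_steps


def relative_gap(offline_per_step: float,
                 online_per_window: float,
                 window_steps: int = 1000) -> float:
    """Relative difference between offline and online estimates, per step."""
    online_per_step = per_step_reward(online_per_window, window_steps)
    # Guard against division by zero when the online reward is exactly 0.
    denom = max(abs(online_per_step), 1e-12)
    return abs(offline_per_step - online_per_step) / denom
```

If the gap is still large after this conversion, the unit mismatch is ruled out and the "ensemble policy" explanation above becomes the more likely cause.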

Next Steps

  • Re-run experiment 24 using avg speed limit 7 policy
    • Start using this policy now
  • If our goal is to show that FQE is working well, we should compare offline rollouts to online rollouts
  • But offline rollouts need to be performed with a homogeneous dataset
    • Learn G1_threshold using D_threshold, pi_threshold
      • Evaluate G1_threshold using pi_threshold in offline rollout
      • Compare to online rollout using pi_threshold
    • Learn G2_threshold using D_threshold, pi_threshold
      • Evaluate G2_threshold using pi_threshold in offline rollout
      • Compare to online rollout using pi_threshold
    • Learn G1_queue using D_queue, pi_queue
      • ...
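
The learn / evaluate / compare pattern in the list above can be sketched as a small loop. This is only an illustration under stated assumptions: `build_plan` and `run_plan` are hypothetical helpers, and the `learn`, `offline_eval`, and `online_eval` callables stand in for the project's actual FQE training and rollout code, whose signatures are not given in these notes.

```python
from typing import Callable, Dict, List


def build_plan(policy_names: List[str], value_fn_names: List[str]) -> List[Dict[str, str]]:
    """Pair each value function with the dataset and policy of the same name suffix."""
    plan = []
    for p in policy_names:
        for g in value_fn_names:
            plan.append({"value_fn": f"{g}_{p}", "dataset": f"D_{p}", "policy": f"pi_{p}"})
    return plan


def run_plan(plan: List[Dict[str, str]],
             learn: Callable,         # hypothetical: (value_fn, dataset, policy) -> model
             offline_eval: Callable,  # hypothetical: (model, policy) -> float
             online_eval: Callable    # hypothetical: (value_fn, policy) -> float
             ) -> List[Dict]:
    """Learn each value function, then compare its offline estimate to the online rollout."""
    results = []
    for item in plan:
        model = learn(item["value_fn"], item["dataset"], item["policy"])
        offline = offline_eval(model, item["policy"])
        online = online_eval(item["value_fn"], item["policy"])
        results.append({**item, "offline": offline, "online": online,
                        "abs_error": abs(offline - online)})
    return results
```

For the threshold case listed above, `build_plan(["threshold"], ["G1", "G2"])` yields the two items `G1_threshold` and `G2_threshold`, each paired with `D_threshold` and `pi_threshold`. Both estimates should be in the same reward unit before `abs_error` is interpreted.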