04‐26‐2024 Weekly Tag Up

Joe Miceli edited this page Apr 26, 2024 · 1 revision

Attendees

Joe
Chi Hui

Updates

Performed the first ablation study (exp 24) using the following pseudocode:
Offline rollouts report reward per step; online rollouts report reward per 1000 steps
- Surprisingly, the two are far apart
  - This is likely because the value functions were learned using the "ensemble policy"
  - We probably need to switch to homogeneous policies
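Before judging how far apart the two are, both numbers have to be on the same scale. The following is a minimal sketch, assuming the online figure is a total over 1000 steps and the offline figure is an average per step. `to_per_step` and `relative_gap` are hypothetical helpers, and the inputs in the usage line are made-up numbers, not exp 24 results.

```python
def to_per_step(reward, steps_per_report):
    """Convert a reward reported over `steps_per_report` steps into reward per step."""
    return reward / steps_per_report


def relative_gap(offline_per_step, online_per_1000_steps):
    """Relative difference between offline and online reward on a common per-step scale."""
    online_per_step = to_per_step(online_per_1000_steps, 1000)
    return abs(offline_per_step - online_per_step) / abs(online_per_step)


# Made-up inputs for illustration only (not experiment results).
print(relative_gap(-0.5, -250.0))  # → 1.0
```

If the gap stays large after this conversion, that would point more toward the value functions themselves (e.g. the ensemble-policy training) than toward a units mismatch.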

Next Steps

Re-run experiment 24 using the avg speed limit 7 policy
- Start using this policy now
If our goal is to show that FQE is working well, we should compare offline rollouts to online rollouts
But offline rollouts need to be performed with a homogeneous dataset
- Learn G1_threshold using D_threshold, pi_threshold
  - Evaluate G1_threshold using pi_threshold in offline rollout
  - Compare to online rollout using pi_threshold
- Learn G2_threshold using D_threshold, pi_threshold
  - Evaluate G2_threshold using pi_threshold in offline rollout
  - Compare to online rollout using pi_threshold
- Learn G1_queue using D_queue, pi_queue
  - ...
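Fitted Q Evaluation (FQE) learns the Q-function of a fixed target policy from logged transitions by repeatedly regressing Q(s, a) onto r + γ Q(s', π(s')). The learn/evaluate/compare loop above can be sketched on a toy problem. Everything here is a hypothetical stand-in: a 5-state deterministic chain replaces the traffic simulator, `pi_threshold` is a placeholder "always advance" policy, a lookup table replaces the function approximator behind G1/G2, and the "offline rollout" is assumed to be the learned Q-value read at the start state. Because the dataset is generated by the same policy that is being evaluated (the homogeneous case), the offline estimate should match the online discounted return.

```python
from collections import defaultdict

GAMMA = 0.9
N_STATES = 5  # hypothetical 5-state chain standing in for the real simulator


def step(state, action):
    """Toy deterministic dynamics: action 1 advances, action 0 stays.
    Reward 1.0 whenever the move lands in the last state."""
    next_state = min(state + action, N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def pi_threshold(state):
    """Hypothetical placeholder for pi_threshold: always advance."""
    return 1


def collect_dataset(policy, horizon=10):
    """Homogeneous dataset (D_threshold analogue): every transition comes from `policy`.
    One episode starts from each state so every (s, policy(s)) pair is covered."""
    data = []
    for start in range(N_STATES):
        s = start
        for _ in range(horizon):
            a = policy(s)
            s_next, r = step(s, a)
            data.append((s, a, r, s_next))
            s = s_next
    return data


def fitted_q_evaluation(data, policy, iterations=200):
    """Tabular FQE: each iteration sets Q(s, a) to the mean of r + GAMMA * Q(s', policy(s'))
    over the dataset, which is the least-squares fit for a lookup table."""
    q = {}
    for _ in range(iterations):
        sums, counts = defaultdict(float), defaultdict(int)
        for s, a, r, s_next in data:
            target = r + GAMMA * q.get((s_next, policy(s_next)), 0.0)
            sums[(s, a)] += target
            counts[(s, a)] += 1
        q = {key: sums[key] / counts[key] for key in sums}
    return q


def online_rollout(policy, start, horizon=200):
    """Ground truth: run `policy` in the environment and sum discounted rewards."""
    s, ret, discount = start, 0.0, 1.0
    for _ in range(horizon):
        s, r = step(s, policy(s))
        ret += discount * r
        discount *= GAMMA
    return ret


def compare(policy, start=0):
    """Return (offline FQE estimate, online rollout return) for one (D, pi) pair."""
    q = fitted_q_evaluation(collect_dataset(policy), policy)
    offline = q[(start, policy(start))]
    online = online_rollout(policy, start)
    return offline, online


offline, online = compare(pi_threshold)
print(f"offline={offline:.4f} online={online:.4f}")
```

Both values should come out at about 0.9³ / (1 − 0.9) = 7.29. For the other pairs in the list (e.g. D_queue with pi_queue), the same `compare` call would be repeated with that policy. This sketch does not model whatever distinguishes G1 from G2.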