04‐26‐2024 Weekly Tag Up
Joe Miceli edited this page Apr 26, 2024
# Attendees
- Joe
- Chi Hui
# Notes
- Performed the first ablation study (exp 24) using the following pseudo-code
  - Offline rollouts show reward per step; online rollouts show reward per 1000 steps
  - Surprising that they are so far off
    - This is likely because the value functions were learned using the "ensemble policy"
    - Probably need to update to use homogeneous policies
- Re-run experiment 24 using the avg speed limit 7 policy
  - Start using this policy now
- If our goal is to show that FQE (fitted Q evaluation) is working well, we should compare offline rollouts to online rollouts
  - But offline rollouts need to be performed with a homogeneous dataset, i.e., one collected by the same policy being evaluated
  - Learn G1_threshold using D_threshold, pi_threshold
    - Evaluate G1_threshold using pi_threshold in an offline rollout
    - Compare to an online rollout using pi_threshold
  - Learn G2_threshold using D_threshold, pi_threshold
    - Evaluate G2_threshold using pi_threshold in an offline rollout
    - Compare to an online rollout using pi_threshold
  - Learn G1_queue using D_queue, pi_queue
  - ...
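
Because the offline curve is logged per step and the online curve per 1000 steps, the two need a common scale before they can be compared directly. A minimal sketch, assuming the online numbers are reward totals summed over each 1000-step window (if they are already per-step averages, only the x-axis differs). The function names `to_windows` and `to_per_step` and the sample values are hypothetical.

```python
def to_windows(per_step_rewards, window=1000):
    """Sum per-step rewards into consecutive `window`-step totals; a trailing partial window is dropped."""
    n_full = len(per_step_rewards) // window
    return [sum(per_step_rewards[i * window:(i + 1) * window]) for i in range(n_full)]


def to_per_step(window_totals, window=1000):
    """Convert `window`-step reward totals into mean reward per step."""
    return [total / window for total in window_totals]


# Hypothetical illustrative inputs, not experiment results.
offline_per_step = [0.5] * 2500      # offline curve: reward per step
online_per_1000 = [480.0, 510.0]     # online curve: total reward per 1000 steps

print(to_windows(offline_per_step))  # [500.0, 500.0]
print(to_per_step(online_per_1000))  # [0.48, 0.51]
```

Either direction works; converting both curves to reward per step keeps the offline rollouts at their native resolution.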
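
The learn / evaluate offline / compare online loop above can be sketched end to end on a toy problem. This is a minimal sketch under loud assumptions: the 5-state chain environment, the always-move-right stand-in for `pi_threshold`, and the use of tabular FQE with a Monte Carlo return as the "online rollout" are all hypothetical choices for illustration, not the project's actual environment or implementation. `D_threshold` is homogeneous: every transition is generated by the policy being evaluated.

```python
import random
from collections import defaultdict

# Hypothetical toy chain MDP standing in for the real environment.
N_STATES = 5    # states 0..4; reaching state 4 ends the episode
GAMMA = 0.9     # discount factor
SLIP = 0.2      # chance that "move right" fails and the agent stays put
MAX_STEPS = 50  # episode time limit


def step(state, action, rng):
    """Action 1 tries to move right, action 0 stays; reward 1 on reaching the last state."""
    if action == 1 and rng.random() >= SLIP:
        state += 1
    done = state == N_STATES - 1
    return state, (1.0 if done else 0.0), done


def pi_threshold(state):
    """Hypothetical stand-in for pi_threshold: always try to move right."""
    return 1


def collect_dataset(policy, episodes, rng):
    """Homogeneous dataset of (s, a, r, s', done), generated only by `policy`."""
    data = []
    for _ in range(episodes):
        s = 0
        for _ in range(MAX_STEPS):
            a = policy(s)
            s2, r, done = step(s, a, rng)
            data.append((s, a, r, s2, done))
            s = s2
            if done:
                break
    return data


def fqe(data, policy, iterations=100):
    """Tabular fitted Q evaluation: repeatedly regress Q(s, a) onto r + GAMMA * Q(s', policy(s'))."""
    q = {}
    for _ in range(iterations):
        sums, counts = defaultdict(float), defaultdict(int)
        for s, a, r, s2, done in data:
            # Pairs never seen in the data default to 0.0.
            target = r if done else r + GAMMA * q.get((s2, policy(s2)), 0.0)
            sums[(s, a)] += target
            counts[(s, a)] += 1
        # With a lookup table, the least-squares fit is the mean target per (s, a).
        q = {key: sums[key] / counts[key] for key in sums}
    return q


def online_value(policy, episodes, rng):
    """Monte Carlo estimate of the discounted return from state 0 (the "online rollout")."""
    total = 0.0
    for _ in range(episodes):
        s, ret, discount = 0, 0.0, 1.0
        for _ in range(MAX_STEPS):
            s, r, done = step(s, policy(s), rng)
            ret += discount * r
            discount *= GAMMA
            if done:
                break
        total += ret
    return total / episodes


def compare(seed=0, episodes=2000):
    """Learn G1_threshold from D_threshold, then return (offline FQE value, online MC value)."""
    rng = random.Random(seed)
    D_threshold = collect_dataset(pi_threshold, episodes, rng)
    G1_threshold = fqe(D_threshold, pi_threshold)
    offline = G1_threshold[(0, pi_threshold(0))]
    online = online_value(pi_threshold, episodes, rng)
    return offline, online


offline, online = compare()
print(f"offline (FQE): {offline:.3f}   online (Monte Carlo): {online:.3f}")
```

On this toy chain both numbers should land near the exact value of about 0.66, which is the kind of agreement the comparison is meant to check. The same three steps would then repeat for G2_threshold and for the queue pair (D_queue, pi_queue). If D were collected by a different policy instead, FQE still targets pi, but any (s, pi(s)) pair missing from D falls back to 0.0 in this sketch, so the offline estimate can drift away from the online one.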