02‐27‐2024 Weekly Tag Up
Joe Miceli edited this page Feb 27, 2024
- Chi-Hui
- Joe
- New rate-based lambda updater implemented
- Reran experiments with constraint ratio of 0.25 and 0.75
- In both cases the constraint was obeyed, but it would be better if we could get the mean policy closer to the constraint (i.e. behave more like the queue policy)
- Rerun experiments with a different learning rate
- Rate of 0.1
- May also need to try 0.01
- Then run experiments with a new dataset
- So that a larger percentage of actions comes from the queue length model
- We want the mean policy to provide returns closer to the constraint
- Still need to consider other methods
- For a submission, we will need to provide some comparison to other methods
- Need to think about how lambda impacts the return of the mean/current policies
- It's challenging to connect them conceptually
- If we can't come up with a connection, we may have to come up with a different method of updating lambda
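
For reference when discussing the learning rate and alternative update rules, here is a minimal sketch of a generic Lagrange multiplier update (projected gradient ascent on the constraint violation). This is not the rate-based updater from these notes, whose exact form is not recorded here. The names `update_lambda`, `constraint_return`, and `threshold` are hypothetical, and the sketch assumes the constraint has the form "constraint return must stay at or below a threshold".

```python
def update_lambda(lam, constraint_return, threshold, learning_rate=0.1):
    """One projected gradient-ascent step on a Lagrange multiplier.

    Assumes the constraint is `constraint_return <= threshold` (an assumption,
    not taken from the notes). Lambda grows while the constraint is violated,
    shrinks while there is slack, and is clipped at zero.
    """
    violation = constraint_return - threshold
    return max(0.0, lam + learning_rate * violation)


# Illustrative values only (not experiment results): compare two learning rates.
observed_returns = [1.5, 1.2, 0.9, 0.8]
for lr in (0.1, 0.01):
    lam = 0.0
    for observed in observed_returns:
        lam = update_lambda(lam, observed, threshold=1.0, learning_rate=lr)
    print(f"learning_rate={lr}: final lambda={lam:.4f}")
```

With an update of this form, a smaller learning rate moves lambda more slowly toward the value at which the constraint is just met. That may be one reason to compare 0.1 against 0.01, though whether it brings the mean policy closer to the constraint still has to be checked experimentally.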