02‐27‐2024 Weekly Tag Up

Attendees

New rate-based lambda updater implemented
Reran experiments with constraint ratio of 0.25 and 0.75
In both cases, the constraint was obeyed but would be better if we were able to get mean policy closer to the constraint (i.e. behave more like queue policy)
Rerun experiments with different learning rate
- Rate of 0.1
- May also need to try 0.01
THEN run experiments with a new dataset
- So more % actions come from queue length model
- We want the mean policy to provide returns closer to the constraint
Still need to consider other methods
- For a submission, we will need to provide some comparison to other methods
Need to think about how lambda impacts the return of the mean/current policies
- It's challenging to connect them conceptually
- If we can't come up with a connection, we may have to come up with a different method of updating lambda