02‐16‐2024 Constraint Setting with New Single Objective Models

Attendees

Updating Normalize Dataset function
Change the calculation of c1 to reflect rollout averages from excessive speed and queue length policies
Upper bound U1 is the average reward per step from the excessive speed policy
- Obtained by rolling out the excessive speed policy and scoring it with the excessive speed reward
Lower bound L1 is the average reward per step from the queue length policy
- Obtained by rolling out the queue length policy and scoring it with the excessive speed reward
Constraint c1 is now (U1 - L1) * ratio + L1
- Linear interpolation between the bounds: ratio = 0 gives L1, ratio = 1 gives U1
Leave the rest of the algorithm the same
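
A rough sketch of the c1 update described above, for reference only. The function names (avg_reward_per_step, constraint_c1) and the rollout layout (a list of episodes, each a list of per-step rewards) are hypothetical and not taken from the Normalize Dataset function itself. It also assumes the per-step average is pooled over all steps of all rollouts; averaging per episode first would give a different number when episode lengths differ.

```python
from typing import Sequence


def avg_reward_per_step(rollouts: Sequence[Sequence[float]]) -> float:
    """Average reward per step, pooled over every step of every rollout.

    Assumption: `rollouts` is a list of episodes, each a list of per-step
    rewards already computed under the excessive speed reward.
    """
    total_reward = sum(sum(episode) for episode in rollouts)
    total_steps = sum(len(episode) for episode in rollouts)
    if total_steps == 0:
        raise ValueError("rollouts contain no steps")
    return total_reward / total_steps


def constraint_c1(u1: float, l1: float, ratio: float) -> float:
    """c1 = (U1 - L1) * ratio + L1, as stated in the notes."""
    return (u1 - l1) * ratio + l1


# Made-up rewards, both scored with the excessive speed reward.
speed_policy_rollouts = [[-1.0, -2.0], [-3.0]]   # excessive speed policy
queue_policy_rollouts = [[-6.0, -6.0], [-6.0]]   # queue length policy

u1 = avg_reward_per_step(speed_policy_rollouts)  # upper bound U1
l1 = avg_reward_per_step(queue_policy_rollouts)  # lower bound L1
c1 = constraint_c1(u1, l1, ratio=0.5)
print(u1, l1, c1)
```

With these made-up numbers U1 = -2.0 and L1 = -6.0, so ratio = 0.5 places c1 halfway between them at -4.0.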