01‐25‐2024 Constraint Ratio Discussion

Joe Miceli edited this page Jan 26, 2024 · 1 revision

Attendees

Chihui
Joe

Objective

Discuss recent experiments varying the constraint ratio and determine desired behavior and path forward

Notes

Physical meaning of Cg1:
- Cg1 is the average of the total g1 return multiplied by a ratio factor
  - It serves as the baseline for datapoints before they are normalized again (think of it as the new "0")
- If a datapoint is already below the constraint, it does not need attention
  - Only the datapoints above the threshold matter
- We already apply a threshold to the reward for the single-objective excessive speed model, but this constraint is for multi-objective learning
  - In multi-objective learning, it is sometimes possible to find the maximum reward for multiple objectives at once, but this is not always the case
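
A minimal sketch of one possible reading of the Cg1 description above, assuming Cg1 is simply the ratio times the mean total g1 return and that datapoints below it contribute nothing. The function names (`g1_constraint`, `shift_by_constraint`) and the sample returns are hypothetical and are not taken from the project code.

```python
from statistics import fmean


def g1_constraint(g1_returns, ratio):
    """Assumed definition: Cg1 = ratio * mean(total g1 return)."""
    return ratio * fmean(g1_returns)


def shift_by_constraint(g1_returns, ratio):
    """Re-baseline g1 returns so that Cg1 becomes the new zero.

    Returns (c_g1, shifted, violation), where `violation` keeps only the
    amount by which a datapoint exceeds the constraint.
    """
    c_g1 = g1_constraint(g1_returns, ratio)
    shifted = [g - c_g1 for g in g1_returns]
    # Datapoints already below the constraint are clipped to 0 (ignored).
    violation = [max(s, 0.0) for s in shifted]
    return c_g1, shifted, violation


# Placeholder data, not experiment results.
returns = [1000.0, 2000.0, 3000.0, 6000.0]
c_g1, shifted, violation = shift_by_constraint(returns, ratio=0.5)
print(c_g1)       # 1500.0
print(shifted)    # [-500.0, 500.0, 1500.0, 4500.0]
print(violation)  # [0.0, 500.0, 1500.0, 4500.0]
```

Under this assumption, only the last three datapoints would be affected by the constraint; the first sits below Cg1 and is left alone.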

Constraint of 0.25

Constraint of 0.5

Constraint of 0.0

In all cases, the mean policy appears to be slightly below 3000, so the constraint is not affecting performance the way we intend.
We probably need to bring the constraint (red line) down to around 2500.

Next steps

Test various constraint ratios to try to get constraint around 2500
Run an experiment with that new ratio
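
As a rough aid for choosing the ratio, the sketch below assumes the constraint scales linearly with the ratio (constraint = ratio * mean total g1 return). If the actual computation differs, the inversion does not apply. The helper names and the sample returns are hypothetical placeholders, not values from the experiments.

```python
from statistics import fmean


def sweep_constraints(g1_returns, ratios):
    """Map each candidate ratio to the constraint value it would produce."""
    mean_return = fmean(g1_returns)
    return {ratio: ratio * mean_return for ratio in ratios}


def ratio_for_target(g1_returns, target_constraint):
    """Invert the assumed linear relation to hit a target constraint."""
    mean_return = fmean(g1_returns)
    if mean_return == 0:
        raise ValueError("mean g1 return is zero; ratio is undefined")
    return target_constraint / mean_return


# Placeholder returns whose mean is 5000.
returns = [4000.0, 5000.0, 6000.0]
print(sweep_constraints(returns, [0.0, 0.25, 0.5]))
# {0.0: 0.0, 0.25: 1250.0, 0.5: 2500.0}
print(ratio_for_target(returns, 2500.0))  # 0.5
```

The sweep shows what each candidate ratio would yield before committing to a full experiment run.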