01‐25‐2024 Constraint Ratio Discussion

Joe Miceli edited this page Jan 26, 2024 · 1 revision

Attendees

  • Chihui
  • Joe

Objective

  • Discuss recent experiments varying the constraint ratio and determine desired behavior and path forward

Notes

  • Physical meaning of Cg1:
    • It is the average of the total g1 return multiplied by a ratio factor
      • This value is used as the baseline for datapoints before they are normalized again (think of it as the new "0")
    • If a datapoint is already below the constraint, we don't need to worry about it
      • Only the datapoints above the threshold really matter
    • We already apply a threshold to the reward for the single-objective excessive speed model, but this constraint is for multi-objective learning
      • In multi-objective learning, it is sometimes possible to reach the maximum reward for multiple objectives at once, but this is not always the case
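As a rough illustration of the Cg1 description above, here is a minimal Python sketch. It assumes Cg1 is simply the mean of the per-datapoint total g1 returns times the constraint ratio, and that "the new 0" means subtracting Cg1 from each return. The function names and the sample returns are hypothetical and are not taken from the project code or the experiments.

```python
from statistics import mean


def constraint_baseline(g1_returns, ratio):
    """Cg1 under the assumption above: mean total g1 return times the ratio factor."""
    return ratio * mean(g1_returns)


def shift_by_baseline(g1_returns, ratio):
    """Re-center each g1 return so that Cg1 becomes the new zero."""
    c_g1 = constraint_baseline(g1_returns, ratio)
    # Values <= 0 are already at or below the constraint; values > 0 exceed it.
    return [g - c_g1 for g in g1_returns]


# Made-up returns, for illustration only.
example_returns = [1000.0, 3000.0, 5000.0]
example_shifted = shift_by_baseline(example_returns, ratio=0.5)

# Only the datapoints above the threshold need attention.
example_violations = [s for s in example_shifted if s > 0]
```

With this re-centering, datapoints that are already under the constraint end up at or below zero and can be ignored, which matches the note that only datapoints above the threshold matter.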

Constraint of 0.25

image

Constraint of 0.5

image

Constraint of 0.0

image

  • In all cases, the mean policy appears to sit slightly below 3000, so the constraint is not affecting performance the way we intend
  • We probably need the constraint (red line) to be around 2500

Next steps

  • Test various constraint ratios to find one that puts the constraint around 2500
  • Run an experiment with that new ratio
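
The ratio search in the next steps could be sketched as below. This is only a sketch under a strong assumption: that the plotted constraint (red line) equals the ratio times the mean total g1 return, which these notes do not confirm. The function name, the candidate ratios, and the sample returns are hypothetical.

```python
from statistics import mean


def closest_ratio(g1_returns, candidate_ratios, target):
    """Pick the candidate ratio whose implied constraint is nearest the target.

    Assumes constraint = ratio * mean(g1_returns); this is an assumption,
    not something confirmed by the experiments.
    """
    avg = mean(g1_returns)
    return min(candidate_ratios, key=lambda r: abs(r * avg - target))


# Made-up returns with a mean of 3000, for illustration only.
sample_returns = [2000.0, 3000.0, 4000.0]
best = closest_ratio(sample_returns, [0.0, 0.25, 0.5, 0.75, 1.0], target=2500.0)
```

If the relationship between the ratio and the plotted constraint turns out not to be linear, the same loop still works as long as the `r * avg` expression is replaced by whatever computes the constraint for a given ratio.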