01‐25‐2024 Constraint Ratio Discussion
Joe Miceli edited this page Jan 26, 2024
- Chihui
- Joe
- Discuss recent experiments that varied the constraint ratio, and determine the desired behavior and the path forward
- Physical meaning of Cg1:
- Average of the total g1 return multiplied by a ratio factor
- This is used as the baseline for datapoints before they are normalized again (think of this as the new "0")
- If a datapoint is already below the constraint, we don't need to worry about it
- We only really need to worry about the datapoints that are above the threshold
- We already apply a threshold to the reward for the single-objective excessive speed model, but this constraint is for multi-objective learning
- In multi-objective learning, it is sometimes possible to find the maximum reward for multiple objectives, but this is not always the case
- In all cases, the mean policy appears to be slightly below 3000, so the constraint is not affecting performance as intended
- We probably need to bring the constraint (red line) to around 2500
- Test various constraint ratios to bring the constraint to around 2500
- Run an experiment with the new ratio
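A minimal Python sketch of the Cg1 idea from these notes. It assumes Cg1 is the mean total g1 return times the constraint ratio, that "don't worry about" datapoints below the constraint means clipping their shifted value to 0, and that the ratio for a target constraint is simply the target divided by the mean g1 return. The second normalization step is left out because the notes do not say how it is done. The function names (`constraint_threshold`, `shift_and_clip`, `ratio_for_target`) and the toy returns are hypothetical and are not experiment data.

```python
from statistics import mean


def constraint_threshold(g1_returns, ratio):
    """Hypothetical helper: Cg1 = ratio * average total g1 return."""
    return ratio * mean(g1_returns)


def shift_and_clip(g1_returns, c_g1):
    """Hypothetical helper: treat Cg1 as the new 0 by shifting each return,
    then clip points already below the constraint to 0 (assumed meaning of
    "don't worry about it"). Normalization is not included here."""
    return [max(0.0, g - c_g1) for g in g1_returns]


def ratio_for_target(g1_returns, target):
    """Hypothetical helper: ratio that places Cg1 at a desired value."""
    return target / mean(g1_returns)


# Made-up g1 returns for illustration only (not results from the experiments).
returns = [2800.0, 3200.0, 3000.0, 2600.0, 3400.0]

# Sweep a few candidate ratios and see where Cg1 lands.
for r in (0.75, 0.8, 0.85, 0.9):
    print(r, round(constraint_threshold(returns, r), 1))

# Solve directly for the ratio that puts Cg1 near 2500.
ratio = ratio_for_target(returns, 2500.0)
c_g1 = constraint_threshold(returns, ratio)
print(round(ratio, 4), round(c_g1, 1))
print(shift_and_clip(returns, c_g1))
```

Because Cg1 scales linearly with the ratio, a sweep like this only shows where the threshold lands; whether it actually changes the learned policy still has to be checked by running the experiment with the chosen ratio.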