03‐05‐2024 Weekly Tag Up
- Chi-Hui
- Joe
- Probably need to increase the lambda learning rate
  - 0.1 instead of 0.01
  - This should be the first change we try (a sketch of the lambda update appears after this item)
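
A minimal sketch of the lambda update we were discussing, assuming the standard dual gradient ascent step on the Lagrange multiplier; the function and variable names here are hypothetical, not our actual training code:

```python
LAMBDA_LR = 0.1  # proposed value, up from 0.01

def update_lambda(lmbda, avg_constraint_cost, constraint_threshold, lr=LAMBDA_LR):
    """One dual gradient ascent step: increase lambda while the policy's
    average constraint cost exceeds the threshold, decrease it otherwise."""
    violation = avg_constraint_cost - constraint_threshold
    return max(0.0, lmbda + lr * violation)  # lambda stays non-negative
```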
- Still would like to see the mean policy get closer to the constraint value (experiment 16.4)
  - We could control this by penalizing the policy based on its distance from the constraint
  - Still, being below the constraint is better than being above it, so any change to the reward should account for this asymmetry (see the sketch after this item)
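
A hypothetical reward-shaping sketch for the penalty idea above: it penalizes the policy's distance from the constraint value, but weights overshoot (above the constraint) more heavily than undershoot (below it). The weights and names are illustrative placeholders, not values we agreed on:

```python
def shaped_reward(base_reward, measured_value, constraint_value,
                  over_weight=2.0, under_weight=0.5):
    """Penalize distance from the constraint asymmetrically: being above
    the constraint is worse than being below it, so overshoot gets the
    larger weight. Both weights are placeholders to be tuned."""
    gap = measured_value - constraint_value
    penalty = over_weight * gap if gap > 0 else under_weight * (-gap)
    return base_reward - penalty
```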
- What are the important features that we missed from our last submission?
  - To make our paper more persuasive
  - Probably need to run a real-world experiment
  - One direction: extend the work performed by others with ours
    - Add multi-objective support to their work
    - Implement their code and add our features/enhancements
    - This fits if we want it to be a "traffic control" paper
  - Otherwise, if we want it to be an RL paper, then we need to compare against some other baseline
    - In this case, we will have to implement the baselines to compare to our work
    - Get a baseline from the "Batch Policy Learning Under Constraints" paper
  - Either direction will be a lot of work, so we should choose based on what we are most confident in
- Still need to add an ablation study
  - Instead of learning lambda, what happens if we just randomly sample lambda? (see the sketch after this item)
  - The sampling would still need some kind of consideration for the constraint
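
A rough sketch of that ablation, assuming we would draw lambda uniformly at random per run instead of learning it; the sampling range is a placeholder, and any constraint-aware filtering of the samples is still an open question:

```python
import random

def sample_lambda(low=0.0, high=10.0):
    """Ablation: draw lambda at random rather than learning it via dual
    ascent. Comparing the resulting policies against those trained with a
    learned lambda would show whether learning lambda actually matters."""
    return random.uniform(low, high)
```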
- After we find the best learning rate/reward combination, we will have to run a few experiments
- Survey RL papers
- Survey traffic control papers
- Will compare against the RL papers and decide whether we want to focus on an RL paper or a traffic control paper