
03‐05‐2024 Weekly Tag Up

Joe Miceli edited this page Mar 5, 2024 · 1 revision

Attendees

  • Chi-Hui
  • Joe

Updates

  • Probably need to increase the lambda learning rate
    • 0.1 instead of 0.01
    • This should be the first change we try
  • Still would like to see the mean policy get closer to the constraint value (experiment 16.4)
    • We could control this by penalizing the policy based on its distance from the constraint
    • Still, being below the constraint is better than being above it, so any changes to the reward should account for this asymmetry
  • What are the important features we missed in our last submission?
    • Goal: make the paper more persuasive
    • Probably need to run a real-world experiment
    • Extend existing work with ours
      • Add multi-objective capability to their approach
      • Implement their code and add our features/enhancements
      • Appropriate if we want this to be a "traffic control" paper
    • Otherwise, if we want this to be an RL paper, we need to compare against some other baseline
      • In this case, we will have to implement baselines to compare against our work
        • Get baseline from "Batch Policy Learning Under Constraints" paper
    • Either direction will be a lot of work so we should choose based on what we are most confident in
    • Still need to add an ablation study
      • Instead of learning lambda, what happens if we just randomly sample it?
      • Any ablation should still include some consideration of the constraint
  • After we find the best learning rate/reward combo, we will have to run a few experiments
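The lambda update, the constraint-distance reward penalty, and the random-lambda ablation discussed above could be sketched roughly as below. This is a minimal illustration, not our implementation: the function names, the dual-ascent form of the lambda update, and the specific penalty weights are all assumptions made for the sketch.

```python
import random


def lambda_update(lmbda, constraint_value, threshold, lr=0.1):
    """One dual-ascent step on the Lagrange multiplier.

    Lambda increases when the measured constraint value exceeds the
    threshold (constraint violated) and decreases otherwise. The
    proposed larger learning rate (0.1 instead of 0.01) is the default
    here. Lambda is clipped at zero since a multiplier must stay
    non-negative.
    """
    lmbda += lr * (constraint_value - threshold)
    return max(0.0, lmbda)


def shaped_reward(reward, constraint_value, threshold, penalty_weight=1.0):
    """Penalize the policy by its distance from the constraint.

    The penalty is asymmetric: being above the threshold (violating the
    constraint) is penalized at full weight, while being below it gets
    only a mild pull toward the threshold. The 0.1 factor is an
    arbitrary choice for illustration.
    """
    distance = constraint_value - threshold
    if distance > 0:
        # Above the constraint: strongly penalized.
        return reward - penalty_weight * distance
    # Below the constraint: mild penalty nudges the mean upward.
    return reward - 0.1 * penalty_weight * (-distance)


def sample_lambda(low=0.0, high=10.0, rng=None):
    """Ablation: draw lambda uniformly at random instead of learning it."""
    rng = rng or random.Random()
    return rng.uniform(low, high)
```

For example, with a threshold of 1.5 and a measured constraint value of 2.0, `lambda_update(1.0, 2.0, 1.5)` raises lambda to 1.05, while a constraint value below the threshold would drive it toward zero.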

For this week

  • Survey RL papers

For next week

  • Survey traffic control papers
  • Compare with the RL papers and decide whether to focus on an RL paper or a traffic control paper