03‐05‐2024 Weekly Tag Up
- Chi-Hui
- Joe
- Probably need to increase the lambda learning rate
  - 0.1 instead of 0.01
  - This should be the first change we try (a sketch of the lambda update appears after this item)
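
A minimal sketch of the lambda update we were discussing, assuming the standard dual gradient ascent step on the Lagrange multiplier; the function and variable names here are hypothetical, not our actual training code:

```python
LAMBDA_LR = 0.1  # proposed value, up from 0.01

def update_lambda(lmbda, avg_constraint_cost, constraint_threshold, lr=LAMBDA_LR):
    """One dual gradient ascent step: increase lambda while the policy's
    average constraint cost exceeds the threshold, decrease it otherwise."""
    violation = avg_constraint_cost - constraint_threshold
    return max(0.0, lmbda + lr * violation)  # lambda stays non-negative
```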
- Still would like to see the mean policy get closer to the constraint value (experiment 16.4)
  - We could control this by penalizing the policy based on its distance from the constraint
  - Still, being below the constraint is better than being above it, so any change to the reward should account for this asymmetry (see the sketch after this item)
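
A hypothetical reward-shaping sketch for the penalty idea above: it penalizes the policy's distance from the constraint value, but weights overshoot (above the constraint) more heavily than undershoot (below it). The weights and names are illustrative placeholders, not values we agreed on:

```python
def shaped_reward(base_reward, measured_value, constraint_value,
                  over_weight=2.0, under_weight=0.5):
    """Penalize distance from the constraint asymmetrically: being above
    the constraint is worse than being below it, so overshoot gets the
    larger weight. Both weights are placeholders to be tuned."""
    gap = measured_value - constraint_value
    penalty = over_weight * gap if gap > 0 else under_weight * (-gap)
    return base_reward - penalty
```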
- What are the important features that we missed from our last submission?
  - To make our paper more persuasive
  - Probably need to run a real-world experiment
  - One direction: extend the work performed by others with ours
    - Add multi-objective support to their work
    - Implement their code and add our features/enhancements
    - This fits if we want it to be a "traffic control" paper
  - Otherwise, if we want it to be an RL paper, then we need to compare against some other baseline
    - In this case, we will have to implement the baselines to compare to our work
    - Get a baseline from the "Batch Policy Learning Under Constraints" paper
  - Either direction will be a lot of work, so we should choose based on what we are most confident in
- Still need to add an ablation study
  - Instead of learning lambda, what happens if we just randomly sample lambda? (see the sketch after this item)
  - The sampling would still need some kind of consideration for the constraint
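
A rough sketch of that ablation, assuming we would draw lambda uniformly at random per run instead of learning it; the sampling range is a placeholder, and any constraint-aware filtering of the samples is still an open question:

```python
import random

def sample_lambda(low=0.0, high=10.0):
    """Ablation: draw lambda at random rather than learning it via dual
    ascent. Comparing the resulting policies against those trained with a
    learned lambda would show whether learning lambda actually matters."""
    return random.uniform(low, high)
```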
- After we find the best learning rate/reward combination, we will have to run a few experiments
- Survey RL papers
- Survey traffic control papers
- Will compare against the RL papers and decide whether we want to focus on an RL paper or a traffic control paper