For documentation regarding the tasks used in these experiments refer to the following repository: RL-Continuing-Tasks
```
python search.py [options]
```
For documentation on the algorithm parameters, refer to the Thesis.
Option | Description | Default |
---|---|---|
--num_processes | Number of asynchronous agents | 16 |
--steps | Total number of steps distributed across all asynchronous agents, in millions | 16 |
--algorithm | Algorithm: Q (for Q-Learning) or SARSA (for SARSA) | Q |
--network | Network specification: linear or deep (the architecture may depend on the task; see Networks.py for details) | linear |
--reward | Type of reward: discounted for discounted returns or average for average rewards | discounted |
--task | The task ID: 1, 2, or 3 | 1 |
--lr | Learning rate | 0.0001 |
--beta | Beta: used to update the average-reward estimate when the reward option is average | 0.001 |
--df | Discount factor: used to weight future rewards when the reward option is discounted | 0.99 |
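For example, the following invocation (using only the flags documented above) trains SARSA with average rewards on task 2 using a deep network:

```
python search.py --algorithm SARSA --reward average --network deep --task 2 --beta 0.001
```

A common convention for the beta parameter (an assumption here, not verified against this implementation; see the Thesis for the exact update) is an exponential moving average of observed rewards, i.e. `avg_reward += beta * (reward - avg_reward)`.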
The command above will generate logs in the following directory: `./Code/logs/{algorithm}/{reward}/{network}`
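With the defaults above, for example, this resolves to `./Code/logs/Q/discounted/linear`.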
The directory will contain:
- A log file for each asynchronous agent containing the reward and the total average reward at each step
- The network parameters, saved every million steps (accumulated over all agents)
- A `hyper_params` file specifying:
  - The parameters chosen for the run
  - The average reward for each saved network (over 5 sample runs of 50,000 steps)
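As a sketch of the resulting layout with the default options (the file names below are illustrative; the exact names are determined by the code):

```
./Code/logs/Q/discounted/linear/
├── agent_0.log      # per-agent log: reward and total average reward at each step
├── ...
├── agent_15.log
├── <network parameters saved every million steps>
└── hyper_params     # run parameters and average reward per saved network
```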
After training the agent, it may be useful to observe its behavior in the environment.
To do so, run:

```
python visualize.py [options]
```
Option | Description | Default |
---|---|---|
--task | The task ID: 1, 2, or 3 | 1 |
--network | Network specification: linear or deep | linear |
--param | Path to the saved network parameters | |
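For example, to visualize a trained deep network on task 2 (the parameter path below is a placeholder; point --param at one of the parameter files saved during training):

```
python visualize.py --task 2 --network deep --param ./Code/logs/Q/discounted/deep/<saved-params>
```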
Note: visualization uses Matplotlib with the Qt5Agg backend; rendering issues have been identified on some platforms.