Policy-based Deep Reinforcement Learning

This repository contains the solution to the third assignment of the course Reinforcement Learning at Leiden University. It provides a framework for experimenting with different policy-based reinforcement learning techniques on OpenAI Gym environments. In our case, we successfully applied the REINFORCE and Actor-Critic algorithms (in various configurations) to the CartPole-v1 environment. For a detailed description of the methodologies used and the experiments carried out, please refer to the full report.

Authors

Dimitrios Ieronymakis, Jonathan Collu and Riccardo Majellaro

Requirements

To run the available scripts, a Python 3 environment is required, together with the packages listed in the requirements.txt file in the main directory. To install them, run the following command in your Python 3 environment:

pip install -r requirements.txt

How to train all the configurations

All the experiments presented in the report are fully reproducible by running the following command from the main folder of the repository:

./experiment/basic_exps.sh

Remember to change the script's permissions so that it can be executed as a program.
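On Linux or macOS, for example, this can be done with:

chmod +x experiment/basic_exps.sh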

Note: Windows users can convert the above file to a .bat by removing the shebang (first line) and the comments, converting "\" to "^", and deleting ";".

How to run a configuration (training)

Run the command below from the main directory

python experiment.py

along with the following possible arguments (an example invocation follows the list):

  • -run_name: name of your choice for the configuration.
  • -device: where to execute the computations (e.g. "cpu" or "cuda").
  • -optimizer: optimizer for the policy net; choose among "adam", "sgd" and "rms".
  • -optim_lr: learning rate of the policy net optimizer.
  • -optimizer_v: optimizer for the value net; choose among "adam", "sgd" and "rms".
  • -optim_lr_v: learning rate of the value net optimizer.
  • -alg: choose between "reinforce" and "AC_bootstrap".
  • -epochs: number of epochs (i.e. updates).
  • -traces: number of traces per epoch (averaged in a single update).
  • -trace_len: length of a trace.
  • -n: number of steps for bootstrapping.
  • -gamma: discount factor.
  • -baseline: enable baseline subtraction.
  • -entropy: enable entropy regularization.
  • -entropy_factor: entropy regularization factor.
  • -use_es: set to 0 or 1 to use evolutionary strategies as described in the report.
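
For example, a hypothetical REINFORCE run might look like the command below. The run name and values are purely illustrative, and the exact argument parsing (e.g. defaults, or whether flags such as -baseline take a value) should be checked in experiment.py:

python experiment.py -run_name reinforce_test -device cpu -alg reinforce -optimizer adam -optim_lr 0.001 -epochs 500 -traces 5 -trace_len 500 -gamma 0.99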

How to evaluate a configuration

Run the command below from the main directory

python evaluate.py

along with the following arguments (an example invocation follows the list):

  • -run_name: name of your choice for the configuration.
  • -render: to visualize the environment.
  • -device: where to execute the computations (e.g. "cpu" or "cuda").
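
For instance, to evaluate a previously trained configuration (the run name is illustrative, and whether -render takes a value should be checked in evaluate.py):

python evaluate.py -run_name reinforce_test -device cpu -render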