Wide and Deep Reinforcement Learning (WDRL) implementation for Pac-man using our new algorithm, Wide Deep Q-Networks (WDQN). You can download the paper here.
Wide and Deep Reinforcement Learning seeks to combine linear function approximation (the wide component) with non-linear function approximation (the deep component). For our research, we extended the popular Deep Q-Network (DQN) algorithm by mixing a linear combination of features with convolutional neural networks, as illustrated below, yielding the WDQN algorithm.
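As a rough illustration of the idea only (not the repository's actual TensorFlow code), the sketch below combines a wide linear term over hand-crafted features with a deep network's output into a single set of Q-values; all names, shapes, and values are assumptions made for the example.

```python
import numpy as np

def wdqn_q_values(features, conv_output, w_wide, w_deep, b):
    """Hypothetical sketch: combine a wide (linear) term over hand-crafted
    features with a deep (CNN) term into one Q-value per action.

    features    : (num_features,)              hand-crafted state features
    conv_output : (hidden_dim,)                flattened output of the CNN
    w_wide      : (num_actions, num_features)  wide weights
    w_deep      : (num_actions, hidden_dim)    deep weights
    b           : (num_actions,)               bias
    """
    wide_term = w_wide @ features      # linear function approximation
    deep_term = w_deep @ conv_output   # non-linear (deep) approximation
    return wide_term + deep_term + b   # joint Q-value estimate

# Toy usage with random numbers, purely illustrative.
rng = np.random.default_rng(0)
q = wdqn_q_values(rng.normal(size=3), rng.normal(size=64),
                  rng.normal(size=(5, 3)), rng.normal(size=(5, 64)),
                  np.zeros(5))
print(q.shape)  # (5,) -> one Q-value per action
```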
To replicate the best results of the study, run the models by calling the runModels.sh file. For example:
$ python3 pacman.py -p PacmanWDQN -n 101 -x 1 -g DirectionalGhost -l mediumClassic --path WDQN-mediumC-3feat --fixRandomSeed 6311
This runs the program on the medium map, evaluating the model for 100 games, using the directional ghosts (which actively chase Pac-man) and the WDQN algorithm with 3 features. In addition, the fixed random seed 6311 is used to guarantee replication. To see how this random seed was selected, open the file createRandomSeed.py.
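How the seed was chosen is defined in createRandomSeed.py; purely as a general illustration (the agents may seed additional libraries), fixing a seed for Python's and NumPy's random number generators looks like this:

```python
import random
import numpy as np

SEED = 6311  # the seed passed via --fixRandomSeed in the example above

random.seed(SEED)     # Python's built-in RNG
np.random.seed(SEED)  # NumPy's RNG, used by array-based code
```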
To replicate the training results of the study, call the runExperiments.sh file. Please follow the instructions in that file.
The command below runs a model on the smallClassic layout for 6000 episodes, of which 5000 are used for training. In addition, the directional ghosts are used instead of the random ghosts:
$ python3 pacman.py -p PacmanWDQN -n 6000 -x 5000 -g DirectionalGhost -l smallClassic --path WDQN-smallC-feat3
The hyperparameters of PacmanWDQN_agents.py are in the json file WDQN-smallC-feat3 in the directory dict/smallC.
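The exact parameter names are documented in the dict folder's README. Purely as an illustration (the file path and keys below are assumptions, not the repository's actual schema), such a configuration can be loaded in Python like this:

```python
import json

# Hypothetical path and keys for illustration; see dict/README for the real schema.
with open("dict/smallC/WDQN-smallC-feat3.json") as f:
    params = json.load(f)

learning_rate = params.get("learning_rate", 0.00025)  # assumed key
discount      = params.get("discount", 0.95)          # assumed key
```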
Different layouts can be found and created in the layouts directory.
The hyperparameters used for training the agents can be found in the dict folder as json files. A complete description of these parameters can be found in the README file of that folder.
In the parameters folder you can find additional values saved as npy files, which are needed to properly restore the Python-side state of the model (e.g. to resume training from a certain training step).
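As an illustration only (the file names and stored values are assumptions, not the repository's actual layout), such npy files can be written and read with NumPy:

```python
import numpy as np

# Hypothetical example: persist the current training step and epsilon so a
# later run can resume from them; the repository's actual file names may differ.
np.save("parameters/training_step.npy", np.array(5000))
np.save("parameters/epsilon.npy", np.array(0.1))

step    = int(np.load("parameters/training_step.npy"))
epsilon = float(np.load("parameters/epsilon.npy"))
```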
Models are saved as "checkpoint" files in the model directory. The load and save filenames can be set in the hyperparameters.
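Checkpoint files of this kind are typically written with TensorFlow's Saver. The snippet below is a generic sketch of the old TF 0.x/1.x API with made-up variable and file names, not the repository's exact save/restore code:

```python
import tensorflow as tf

# Generic checkpointing sketch with tf.train.Saver (TF 0.x/1.x API);
# the variable and checkpoint names are illustrative only.
w = tf.Variable(tf.zeros([3, 5]), name="wide_weights")
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())  # TF 0.x name of the initializer
    saver.save(sess, "model/WDQN-smallC-feat3.ckpt")     # write a checkpoint
    saver.restore(sess, "model/WDQN-smallC-feat3.ckpt")  # load it back
```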
In addition, the replay memory (experience replay) of each agent is saved in the data directory, which guarantees that training can be continued from the exact same state.
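A replay memory of this kind is commonly just a bounded buffer of transitions serialized to disk. The sketch below is a generic illustration with assumed file names and transition layout, not the repository's actual format:

```python
import pickle
from collections import deque

# Illustrative replay memory: a bounded buffer of (s, a, r, s', done) tuples.
replay_memory = deque(maxlen=100000)
replay_memory.append((None, 0, 1.0, None, False))  # dummy transition

# Persist it so a later run can resume from the exact same state.
with open("data/WDQN-smallC-feat3.pkl", "wb") as f:
    pickle.dump(replay_memory, f)

with open("data/WDQN-smallC-feat3.pkl", "rb") as f:
    replay_memory = pickle.load(f)
```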
Please cite this repository if it was useful for your research:
@inproceedings{icaart19,
author={Juan M. Montoya and Christian Borgelt},
title={Wide and Deep Reinforcement Learning for Grid-based Action Games},
booktitle={Proceedings of the 11th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2019},
pages={50-59},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007313200500059},
isbn={978-989-758-350-6},
}
Requirements:
python==3.5.1
tensorflow==0.8rc
DQN implementation of Pac-man by Tycho van der Ouderaa:
Pac-man implementation by UC Berkeley:
Wide & Deep Learning by Heng-Tze Cheng et al.: