This project attempts, and fails, to find an optimal strategy for Hanabi using a neural network and a Deep Q-Learning approach.
The notebooks in this repo document the problems encountered with this implementation:
- LearningRate shows how a high learning rate prevents convergence
- Inizialization shows how overly large initialization values (i.e. greater than 1e-3 in absolute value) prevent convergence
- QUnlimited shows how, without compensation in the code, Q grows exponentially for high gamma (i.e. > 0.5); see the maximization-bias sketch below the list
- QFeedback shows how introducing a cut-off for very high predicted values makes the Q values saturate
- GradientDeath, despite its name, shows that this algorithm is unable to learn. In a previous version the output layer had dimension 1 and, if the last hidden layer was small and used ReLU activation, the gradient saturated very quickly; see the dying-ReLU sketch below
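
The following is a minimal sketch, not code from this repo, of one standard mechanism consistent with what QUnlimited observes: the max operator over noisy Q estimates is biased upward, and the bootstrapped target `y = r + gamma * max_a Q(s', a)` feeds that bias back into Q itself. In this linear toy the estimate drifts toward `gamma * bias / (1 - gamma)`, which blows up as gamma grows; with a neural function approximator the feedback can amplify further and diverge outright. All names and constants here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def max_overestimation(n_actions: int, noise_std: float = 1.0,
                       trials: int = 10_000) -> float:
    """Average of max over zero-mean noisy Q estimates: strictly
    positive, so the max operator systematically overestimates."""
    noisy_q = rng.normal(0.0, noise_std, size=(trials, n_actions))
    return float(noisy_q.max(axis=1).mean())

bias = max_overestimation(n_actions=10)
print(f"per-update overestimation bias ≈ {bias:.3f}")

# Bootstrapping feeds the bias back into Q: even with zero true
# rewards, the estimate drifts upward, much faster for large gamma.
for gamma in (0.3, 0.5, 0.9, 0.99):
    q = 0.0
    for _ in range(100):        # 100 bootstrapped updates
        q = gamma * (q + bias)  # target built on the biased max
    print(f"gamma={gamma:.2f}  Q after 100 updates ≈ {q:.2f}")
```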
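
And here is a hypothetical sketch of the dying-ReLU effect behind the old GradientDeath failure, not the repo's actual network: once every unit in a ReLU layer has a non-positive pre-activation, the gradient through the layer is exactly zero and learning stops. The negative shift of -1 is an arbitrary assumption standing in for unlucky initialization or weight drift; the point is that the smaller the layer, the more likely the whole layer dies at once.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_layer_dead(hidden: int, shift: float = -1.0,
                 trials: int = 10_000) -> float:
    """Probability that every ReLU unit in a layer is inactive
    (all pre-activations <= 0), in which case the gradient flowing
    through the layer is exactly zero."""
    pre_act = rng.normal(size=(trials, hidden)) + shift
    return float((pre_act <= 0).all(axis=1).mean())

for h in (2, 4, 16, 64):
    print(f"hidden={h:3d}  P(layer fully dead) ≈ {p_layer_dead(h):.3f}")
```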
The logic of the implementation is in the Game*.py files.
- Game.py is the latest version, used in all the notebooks except QUnlimited
- GameUnlimited.py does not cut the values returned by the Q function and is used by QUnlimited; a hypothetical sketch of such a cut follows the list
- GameDummy.py is a lower-dimensional version of the game
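
As an illustration of the kind of cut Game.py applies and GameUnlimited.py omits, here is a hypothetical sketch; `Q_MAX` and `clip_q` are assumed names, not identifiers from the repo, and the bound of 25 (Hanabi's maximum score) is a guess at a natural choice.

```python
Q_MAX = 25.0  # assumed bound, e.g. Hanabi's maximum score

def clip_q(q_raw: float) -> float:
    """Clamp a predicted Q value into [0, Q_MAX] before it is used as
    a bootstrap target, trading unbounded growth for saturation at
    Q_MAX (the effect QFeedback demonstrates)."""
    return min(max(q_raw, 0.0), Q_MAX)

print(clip_q(1e6))   # 25.0 -- a runaway prediction is cut to the bound
print(clip_q(-3.0))  # 0.0
print(clip_q(12.5))  # 12.5 -- in-range values pass through unchanged
```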