Deep Q-network for learning to play 2048

Intro

The RL agent's state consists of 16 variables, one for each position of the 4x4 grid, each holding the value of the tile at that position.
The agent can choose from 4 moves: swipe up, swipe down, swipe left, swipe right.
The reward system is based on the count of each tile value after a move.
Epsilon is reduced manually based on performance.
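
As an illustration of the state and action setup described above, here is a minimal Python sketch. It assumes that epsilon is the exploration rate of an epsilon-greedy policy, which this readme does not state explicitly. The names `ACTIONS`, `encode_state` and `choose_action` are hypothetical and are not taken from the project's code.

```python
import random

# The four moves available to the agent (order is an arbitrary assumption).
ACTIONS = ("up", "down", "left", "right")

def encode_state(grid):
    """Flatten a 4x4 grid into 16 state variables, in row-major order."""
    assert len(grid) == 4 and all(len(row) == 4 for row in grid)
    return [tile for row in grid for tile in row]

def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy selection (assumed): with probability epsilon pick a
    random move, otherwise pick the move with the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: q_values[a])

grid = [[2, 0, 0, 0],
        [0, 4, 0, 0],
        [0, 0, 8, 0],
        [0, 0, 0, 16]]
state = encode_state(grid)
move = ACTIONS[choose_action([0.1, 0.9, 0.3, 0.2], epsilon=0.0)]
```

With `epsilon=0.0` the agent always exploits, so `move` here is the action with the largest Q-value. Lowering epsilon by hand as performance improves, as described above, shifts the agent from exploring to exploiting.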

Positional Independence - A tile's position in no way alters the state. The reward system only considers the total count of each value (0, 2, 4, 8, etc.) in the grid and not their positions.
Exponential Rewarding - The reward is assumed to be exponential. For example, creating a 2048 tile is exponentially better than creating a 256 tile.
Penalizing Idling - The game adds no tile if a move does not alter the position of the tiles, so the agent might get stuck idling. This barrier to progress is avoided by penalizing the agent when it reaches such states.
Reporting - The performance of the agent is visualized by counting the maximum tile value reached and the number of idle moves made in each game. A graph of these metrics is presented.
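
The reward ideas above can be sketched in Python as follows. This is only one possible reading, since the exact formula is not given here. It assumes that each newly created tile is rewarded by its face value (which grows exponentially with the tile's level, because values are powers of two) and that an idle move receives a fixed penalty. `IDLE_PENALTY`, `tile_counts` and `reward` are hypothetical names and values.

```python
from collections import Counter

IDLE_PENALTY = -10.0  # hypothetical value; the real penalty is not stated

def tile_counts(grid):
    """Count how many tiles of each value the grid holds, ignoring position."""
    return Counter(tile for row in grid for tile in row)

def reward(before, after):
    """Reward for a move that turns grid `before` into grid `after`."""
    if before == after:
        # The move changed nothing, so the game adds no tile: penalize idling.
        return IDLE_PENALTY
    old, new = tile_counts(before), tile_counts(after)
    total = 0.0
    for value, count in new.items():
        if value == 0:
            continue  # empty cells earn nothing
        gained = count - old.get(value, 0)
        if gained > 0:
            # Face value is 2**level, so this term is exponential in the level.
            total += gained * value
    return total

before = [[2, 2, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
merged_left = [[4, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
merged_right = [[0, 0, 0, 4], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

Because only the counts are compared, `reward(before, merged_left)` and `reward(before, merged_right)` are equal, which is the positional independence described above. `reward(before, before)` returns the idle penalty.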
