Commit

{README} Rewrite + add photos
starcolon committed Mar 30, 2016
1 parent cfcbb9a commit c35b351
Showing 5 changed files with 63 additions and 27 deletions.
84 changes: 60 additions & 24 deletions README.md
@@ -1,42 +1,44 @@
# Q-EXP

[Reinforcement Learning](http://www.cs.indiana.edu/~gasser/Salsa/rl.html)
library with the Q-learning technique for Node.js apps,
including an exploration-exploitation mechanic.
It also provides built-in policy generalisation.

---

## Installation

```bash
$ npm install q-exp
```

---

## Usage

>To include the `q-exp` library in your **Node.js** app:
```javascript
var qexp = require('q-exp');
```

Read on through the instructions below to learn how to use it.

---

## Implementation

This library is written purely in [ECMAScript 6](https://github.com/lukehoban/es6features) and runs
on [Node.js](https://nodejs.org/en/). The implementation paradigm
of the project leans towards functional programming in order to
keep operations as simple and separable as possible.

### Entire pipeline inspired by Promise

Promises play a central role in pipelining all sequential
and parallel processing in this library. Every single
canonical operation of Q-EXP takes its arguments and returns
a Promise. This makes sequential processes easier to pipeline,
more readable and manageable and, above all, free of side effects.
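
As an illustration of that convention, here is a minimal sketch in which every operation returns a Promise; the function bodies are made-up stand-ins, not the library's actual implementations:

```javascript
// Minimal sketch of the Promise-chaining convention described above.
// The bodies below are illustrative stand-ins, not q-exp's
// actual operations.
function loadAgent(name){
  return Promise.resolve({ name: name, state: null });
}

function setState(state){
  return function(agent){          // each operation takes its arguments...
    agent.state = state;
    return Promise.resolve(agent); // ...and returns a Promise
  };
}

function act(agent){
  console.log(agent.name, 'acts on state', agent.state);
  return Promise.resolve(agent);
}

loadAgent('demo')
  .then(setState([0, 0, 0]))
  .then(act)
  .catch(function(err){ console.error(err); });
```
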
## Features included

>To make reinforcement learning work end-to-end,
we implement and include the following features.
A minimal sketch of the first two appears after the list.

- Q-learning
- Exploration-exploitation
- Generalisation with gradient descent
- Sample usages (tic-tac-toe and falling-stones)
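
Below is a generic illustration of tabular Q-learning with epsilon-greedy action selection; the constants and data structures are assumptions for illustration, not q-exp's internal code:

```javascript
// Generic tabular Q-learning with epsilon-greedy selection.
// Constants and structure are illustrative assumptions,
// not q-exp's internal code.
var ALPHA = 0.1, GAMMA = 0.9, EPSILON = 0.2;
var Q = {}; // Q[state][action] -> estimated action value

function qValue(state, action){
  return (Q[state] && Q[state][action]) || 0;
}

// Exploration-exploitation: explore randomly with probability EPSILON,
// otherwise exploit the greedy action.
function chooseAction(state, actions){
  if (Math.random() < EPSILON)
    return actions[Math.floor(Math.random() * actions.length)];
  return actions.reduce(function(best, a){
    return qValue(state, a) > qValue(state, best) ? a : best;
  });
}

// Q-learning update rule:
// Q(s,a) += ALPHA * (r + GAMMA * max_a' Q(s',a') - Q(s,a))
function update(state, action, reward, nextState, actions){
  var maxNext = Math.max.apply(null, actions.map(function(a){
    return qValue(nextState, a);
  }));
  Q[state] = Q[state] || {};
  Q[state][action] = qValue(state, action) +
    ALPHA * (reward + GAMMA * maxNext - qValue(state, action));
}

update('s0', 'left', 1, 's1', ['left', 'right']);
console.log(chooseAction('s0', ['left', 'right']));
```
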
---

## Sample pipeline of operations

To create an agent, load its learned policy from a physical file,
then let it choose an action which it *believes* will
yield the best outcome:

@@ -59,23 +61,25 @@

```javascript
agent.then(ql.setState(initialState)) // Let the agent know the state
```

---

## Sample #1 - Tic-tac-toe

![Tictactoe](/media/ss-tictactoe.png)

A quick sample implementation is the classic [tic-tac-toe game](https://en.wikipedia.org/wiki/Tic-tac-toe), with source code available at
[/sample/tictactoe.js](https://github.com/starcolon/q-exp/blob/master/sample/tictactoe.js). This sample does not make use of generalisation,
just plain exploration-exploitation.

#### To play against the trained tic-tac-toe bot:

```bash
$ cd sample
$ node tictactoe.js play
```

By the definition of Q-learning, the bot doesn't know the rules of
the game. Yet, it learns which moves are likely to lead to victory
and which are likely to bring defeat. The bot may sometimes
fail to end the game with a winning move because it doesn't know
the rules, and such a pattern may never be learned on its own.

>After having your agent intensively trained for thousands of games,
you'll eventually find out how strong your bot has become.


#### To train the bot
@@ -85,6 +89,38 @@

```bash
$ ./train-tictactoe
```

---

## Sample #2 - Falling stones

![Falling stones](/media/ss-falling-stones.png)

Another classic game, in which two stones fall from the
top edge of the screen at random positions. The player is forced
to move left or right to escape the falling stones.
If a stone falls onto the player, the game is over.

This sample makes use of `generalisation`, so the agent can
survive longer even if you train it for just ten or twenty games.

To run it:

```bash
$ cd sample
$ node falling-stones.js
```
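
For intuition, here is a toy sketch of the game dynamics described above; `BOARD_SIZE` matches the constant in the sample's diff further down, while everything else is an assumption rather than the sample's actual code:

```javascript
// Toy sketch of the falling-stones dynamics described above.
// BOARD_SIZE matches the constant in the sample's diff below;
// the rest is illustrative, not the sample's actual code.
var BOARD_SIZE = 7;   // board width
var BOARD_HEIGHT = 7; // assumed height

// action: -1 = move left, +1 = move right, 0 = stay
function step(state, action){
  var player = Math.min(BOARD_SIZE - 1, Math.max(0, state.player + action));
  var stones = state.stones.map(function(s){
    return { x: s.x, y: s.y + 1 }; // stones fall one row per tick
  });
  var gameOver = stones.some(function(s){
    return s.y === BOARD_HEIGHT - 1 && s.x === player; // stone hits player
  });
  return { player: player, stones: stones, gameOver: gameOver };
}

// one tick of a toy episode
var s0 = { player: 3, stones: [{ x: 2, y: 0 }, { x: 5, y: 0 }], gameOver: false };
console.log(step(s0, -1));
```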

#### Benchmark

After generalisation, the agent can survive slightly longer.
However, we fit the reward space with a linear plane,
which may not fit critical cases well, and convergence
is not guaranteed.

The Y axis represents the number of moves the agent survives in a game.

![Benchmark](/media/falling-stones-benchmark.png)
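
For a sense of what fitting the reward space with a linear plane involves, here is a minimal gradient-descent sketch; it illustrates the idea only and is not q-exp's actual generalisation code:

```javascript
// Minimal sketch of fitting rewards with a linear plane by
// stochastic gradient descent; an illustration of the idea,
// not q-exp's actual generalisation code.
function fitLinear(samples, lr, epochs){
  // samples: [{ x: [feature...], y: reward }, ...]
  var n = samples[0].x.length;
  var w = new Array(n).fill(0);
  var b = 0;
  for (var e = 0; e < epochs; e++){
    samples.forEach(function(s){
      var pred = b + s.x.reduce(function(acc, xi, i){
        return acc + w[i] * xi;
      }, 0);
      var err = pred - s.y; // gradient of the squared error
      w = w.map(function(wi, i){ return wi - lr * err * s.x[i]; });
      b = b - lr * err;
    });
  }
  return { weights: w, bias: b };
}

// e.g. reward as a rough linear function of two state features
var model = fitLinear(
  [{ x: [0, 1], y: 1 }, { x: [1, 0], y: -1 }, { x: [1, 1], y: 0 }],
  0.05, 200);
console.log(model.weights, model.bias);
```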


## Licence

Binary file added media/falling-stones-benchmark.png
Binary file added media/ss-falling-stones.png
Binary file added media/ss-tictactoe.png
6 changes: 3 additions & 3 deletions sample/falling-stones.js
@@ -20,7 +20,7 @@ const actionSet = [ // Character movement
];

const BOARD_SIZE = 7;
-const MAX_LESSONS = 5;
+const MAX_LESSONS = 40;

/* 5x5 0
┏━━━━━┓
@@ -187,10 +187,10 @@ function repeatMove(agent,nLessons,history){

console.log('=============================='.cyan)
console.log(' Before generalisation:'.cyan);
-console.log(historyBeforeGenl);
+console.log(historyBeforeGenl.join(','));
console.log('');
console.log(' After generalisation:'.cyan);
-console.log(historyAfterGenl);
+console.log(historyAfterGenl.join(','));
return _agent;
}
else return generalize(_agent,history);
Expand Down
