diff --git a/README.md b/README.md
index fa10749..5c91a7a 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,10 @@
 # Q-EXP
 
 [Reinforcement Learning](http://www.cs.indiana.edu/~gasser/Salsa/rl.html)
-library for Node.js app. This implements Q-learning
-technique for exploration-exploitation mechanic.
+library for Node.js apps, built on the Q-learning technique.
+It also provides built-in policy generalisation.
+
+---
 
 ## Installation
 
@@ -10,9 +12,11 @@ technique for exploration-exploitation mechanic.
 $ npm install q-exp
 ```
 
+---
+
 ## Usage
 
-To include `q-exp` library to your **Node.js** app:
+>To include the `q-exp` library in your **Node.js** app:
 
 ```javascript
 var qexp = require('q-exp');
@@ -20,23 +24,21 @@ var qexp = require('q-exp');
 
 Read the instructions all the way down to learn how to use.
 
-## Implementation
+---
 
-This library is purely written in [ECMAScript 6](https://github.com/lukehoban/es6features) and runs
-with [Node.js](https://nodejs.org/en/). The implementation paradigm
-of the project tends towards functional programming in order to
-keep operations as simple and highly separable as possible.
+## Features included
 
-### Entire pipeline inspired by Promise
+>To make reinforcement learning work end-to-end,
+the library implements the following features.
 
-Promise plays a great role in pipelining all sequential
-and also parallel processings in this library. Every single
-canonical operation of Q-EXP takes its arguments and returns
-the Promise. This makes life easier for pipelining the sequential
-processes or operations for readability and manageability and,
-foremostly, side-effect-free paradigm.
+- Q-learning
+- Exploration-exploitation
+- Generalisation with gradient descent
+- Sample usages (tic-tac-toe and falling stones)
 
-## Sample pipeline of operations
+---
+
+## Sample pipeline
 
 To create an agent, load its learned policy from a physical file,
 then let it choose an action which it *believes* it would
@@ -59,23 +61,25 @@ agent.then(ql.setState(initialState)) // Let the agent know the state
 ```
 
-## Sample
+---
+
+## Sample #1 - Tic tac toe
+
+![Tictactoe](/media/ss-tictactoe.png)
 
 A quick sample implementation is a classic
 [tic-tac-toe game](https://en.wikipedia.org/wiki/Tic-tac-toe),
 source code available at
-[/sample/tictactoe.js](https://github.com/starcolon/q-exp/blob/master/sample/tictactoe.js).
+[/sample/tictactoe.js](https://github.com/starcolon/q-exp/blob/master/sample/tictactoe.js). This sample does not use generalisation,
+just plain exploration-exploitation.
 
-#### To play with the trained bot:
+#### To play against the trained tic-tac-toe bot:
 
 ```
 $ cd sample
 $ node tictactoe.js play
 ```
 
-By Q-learning definition, the bot doesn't know the rule of
-the game. Yet, it knows which moves may probably lead to victory
-and which moves may likely introduce defeat. The bot may sometimes
-fail to end the game by an ultimate move because it doesn't know
-the rule. And such ultimate pattern may not be learned by itself.
+>After training your agent intensively for thousands of games,
+you'll see how strong your bot has become.
 
 #### To train the bot
 
@@ -85,6 +89,38 @@
 $ ./train-tictactoe
 ```
 
+---
+
+## Sample #2 - Falling stones
+
+![Falling stones](/media/ss-falling-stones.png)
+
+Another classic game in which two stones fall from the
+top edge of the screen at random positions. The player is forced
+to move left or right to dodge the falling stones.
+If a stone falls onto the player, the game is over.
+
+This sample makes use of `generalisation`, so the agent can
+survive longer even if you train it for just ten or twenty games.
+
+To run it:
+
+```bash
+ $ cd sample
+ $ node falling-stones.js
+```
+
+#### Benchmark
+
+After generalisation, the agent survives slightly longer.
+However, we only fit the reward space with a linear plane,
+which may not fit critical cases well and does not
+guarantee convergence.
+
+The Y axis represents the number of moves the agent survives in a game.
+
+![Benchmark](/media/falling-stones-benchmark.png)
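+
+For intuition, here is a minimal sketch of what fitting the reward
+space with a linear plane by gradient descent can look like. The
+function names here are hypothetical, not part of the `q-exp` API:
+
+```javascript
+// Sketch only: illustrative, not q-exp's actual implementation.
+// Approximate Q(s,a) by a linear plane: Q(s,a) ~ w . phi(s,a),
+// where phi(s,a) is a numeric feature vector of the state-action pair.
+function qValue(weights, features) {
+  return weights.reduce((q, w, i) => q + w * features[i], 0);
+}
+
+// One gradient-descent step on the squared error between an observed
+// target (e.g. the reward received) and the current estimate:
+// w <- w + alpha * (target - Q(s,a)) * phi(s,a)
+function gradientStep(weights, features, target, alpha) {
+  const error = target - qValue(weights, features);
+  return weights.map((w, i) => w + alpha * error * features[i]);
+}
+```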
+
 ## Licence
diff --git a/media/falling-stones-benchmark.png b/media/falling-stones-benchmark.png
new file mode 100644
index 0000000..2730fea
Binary files /dev/null and b/media/falling-stones-benchmark.png differ
diff --git a/media/ss-falling-stones.png b/media/ss-falling-stones.png
new file mode 100644
index 0000000..685ac04
Binary files /dev/null and b/media/ss-falling-stones.png differ
diff --git a/media/ss-tictactoe.png b/media/ss-tictactoe.png
new file mode 100644
index 0000000..11bcf95
Binary files /dev/null and b/media/ss-tictactoe.png differ
diff --git a/sample/falling-stones.js b/sample/falling-stones.js
index c354a26..b520add 100644
--- a/sample/falling-stones.js
+++ b/sample/falling-stones.js
@@ -20,7 +20,7 @@ const actionSet = [ // Character movement
 ];
 const BOARD_SIZE = 7;
 
-const MAX_LESSONS = 5;
+const MAX_LESSONS = 40;
 
 /* 5x5
 0 ┏━━━━━┓
@@ -187,10 +187,10 @@ function repeatMove(agent,nLessons,history){
 
     console.log('=============================='.cyan)
     console.log(' Before generalisation:'.cyan);
-    console.log(historyBeforeGenl);
+    console.log(historyBeforeGenl.join(','));
     console.log('');
     console.log(' After generalisation:'.cyan);
-    console.log(historyAfterGenl);
+    console.log(historyAfterGenl.join(','));
 
     return _agent;
   }
   else return generalize(_agent,history);