Commit

{README} Rewrite + add photos
starcolon committed Mar 30, 2016
1 parent cfcbb9a commit c35b351
Showing 5 changed files with 63 additions and 27 deletions.
84 changes: 60 additions & 24 deletions README.md
@@ -1,42 +1,44 @@
# Q-EXP

[Reinforcement Learning](http://www.cs.indiana.edu/~gasser/Salsa/rl.html)
library with the Q-learning technique for Node.js apps,
including an exploration-exploitation mechanic.
It also provides built-in policy generalisation.

---

## Installation

```bash
$ npm install q-exp
```

---

## Usage

>To include the `q-exp` library in your **Node.js** app:
```javascript
var qexp = require('q-exp');
```

Read on through the instructions below to learn how to use it.

---

## Implementation

This library is written purely in [ECMAScript 6](https://github.com/lukehoban/es6features) and runs
on [Node.js](https://nodejs.org/en/). The implementation paradigm
of the project leans towards functional programming in order to
keep operations as simple and separable as possible.

### Entire pipeline inspired by Promise

Promises play a central role in pipelining all sequential
and parallel processing in this library. Every single
canonical operation of Q-EXP takes its arguments and returns
a Promise. This makes sequential processes easier to pipeline,
more readable and manageable and, above all, free of side effects.
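
As an illustration of that convention, here is a minimal sketch in which every operation returns a Promise; the function bodies are made-up stand-ins, not the library's actual implementations:

```javascript
// Minimal sketch of the Promise-chaining convention described above.
// The bodies below are illustrative stand-ins, not q-exp's
// actual operations.
function loadAgent(name){
  return Promise.resolve({ name: name, state: null });
}

function setState(state){
  return function(agent){          // each operation takes its arguments...
    agent.state = state;
    return Promise.resolve(agent); // ...and returns a Promise
  };
}

function act(agent){
  console.log(agent.name, 'acts on state', agent.state);
  return Promise.resolve(agent);
}

loadAgent('demo')
  .then(setState([0, 0, 0]))
  .then(act)
  .catch(function(err){ console.error(err); });
```
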
## Features included

>To make reinforcement learning work end-to-end,
we implement and include the following features.
A minimal sketch of the first two appears after the list.

- Q-learning
- Exploration-exploitation
- Generalisation with gradient descent
- Sample usages (tic-tac-toe and falling-stones)
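
Below is a generic illustration of tabular Q-learning with epsilon-greedy action selection; the constants and data structures are assumptions for illustration, not q-exp's internal code:

```javascript
// Generic tabular Q-learning with epsilon-greedy selection.
// Constants and structure are illustrative assumptions,
// not q-exp's internal code.
var ALPHA = 0.1, GAMMA = 0.9, EPSILON = 0.2;
var Q = {}; // Q[state][action] -> estimated action value

function qValue(state, action){
  return (Q[state] && Q[state][action]) || 0;
}

// Exploration-exploitation: explore randomly with probability EPSILON,
// otherwise exploit the greedy action.
function chooseAction(state, actions){
  if (Math.random() < EPSILON)
    return actions[Math.floor(Math.random() * actions.length)];
  return actions.reduce(function(best, a){
    return qValue(state, a) > qValue(state, best) ? a : best;
  });
}

// Q-learning update rule:
// Q(s,a) += ALPHA * (r + GAMMA * max_a' Q(s',a') - Q(s,a))
function update(state, action, reward, nextState, actions){
  var maxNext = Math.max.apply(null, actions.map(function(a){
    return qValue(nextState, a);
  }));
  Q[state] = Q[state] || {};
  Q[state][action] = qValue(state, action) +
    ALPHA * (reward + GAMMA * maxNext - qValue(state, action));
}

update('s0', 'left', 1, 's1', ['left', 'right']);
console.log(chooseAction('s0', ['left', 'right']));
```
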
---

## Sample pipeline of operations

To create an agent, load its learned policy from a physical file,
then let it choose an action which it *believes* will
yield the best outcome:

@@ -59,23 +61,25 @@

```javascript
agent.then(ql.setState(initialState)) // Let the agent know the state
```

---

## Sample #1 - Tic-tac-toe

![Tictactoe](/media/ss-tictactoe.png)

A quick sample implementation is the classic [tic-tac-toe game](https://en.wikipedia.org/wiki/Tic-tac-toe), with source code available at
[/sample/tictactoe.js](https://github.com/starcolon/q-exp/blob/master/sample/tictactoe.js). This sample does not make use of generalisation,
just plain exploration-exploitation.

#### To play against the trained tic-tac-toe bot:

```bash
$ cd sample
$ node tictactoe.js play
```

By the definition of Q-learning, the bot doesn't know the rules of
the game. Yet, it learns which moves are likely to lead to victory
and which are likely to bring defeat. The bot may sometimes
fail to end the game with a winning move because it doesn't know
the rules, and such a pattern may never be learned on its own.

>After having your agent intensively trained for thousands of games,
you'll eventually find out how strong your bot has become.


#### To train the bot
@@ -85,6 +89,38 @@

```bash
$ ./train-tictactoe
```

---

## Sample #2 - Falling stones

![Falling stones](/media/ss-falling-stones.png)

Another classic game, in which two stones fall from the
top edge of the screen at random positions. The player is forced
to move left or right to escape the falling stones.
If a stone falls onto the player, the game is over.

This sample makes use of `generalisation`, so the agent can
survive longer even if you train it for just ten or twenty games.

To run it:

```bash
$ cd sample
$ node falling-stones.js
```
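
For intuition, here is a toy sketch of the game dynamics described above; `BOARD_SIZE` matches the constant in the sample's diff further down, while everything else is an assumption rather than the sample's actual code:

```javascript
// Toy sketch of the falling-stones dynamics described above.
// BOARD_SIZE matches the constant in the sample's diff below;
// the rest is illustrative, not the sample's actual code.
var BOARD_SIZE = 7;   // board width
var BOARD_HEIGHT = 7; // assumed height

// action: -1 = move left, +1 = move right, 0 = stay
function step(state, action){
  var player = Math.min(BOARD_SIZE - 1, Math.max(0, state.player + action));
  var stones = state.stones.map(function(s){
    return { x: s.x, y: s.y + 1 }; // stones fall one row per tick
  });
  var gameOver = stones.some(function(s){
    return s.y === BOARD_HEIGHT - 1 && s.x === player; // stone hits player
  });
  return { player: player, stones: stones, gameOver: gameOver };
}

// one tick of a toy episode
var s0 = { player: 3, stones: [{ x: 2, y: 0 }, { x: 5, y: 0 }], gameOver: false };
console.log(step(s0, -1));
```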

#### Benchmark

After generalisation, the agent can survive slightly longer.
However, we fit the reward space with a linear plane,
which may not fit critical cases well, and convergence
is not guaranteed.

The Y axis represents the number of moves the agent survives in a game.

![Benchmark](/media/falling-stones-benchmark.png)
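
For a sense of what fitting the reward space with a linear plane involves, here is a minimal gradient-descent sketch; it illustrates the idea only and is not q-exp's actual generalisation code:

```javascript
// Minimal sketch of fitting rewards with a linear plane by
// stochastic gradient descent; an illustration of the idea,
// not q-exp's actual generalisation code.
function fitLinear(samples, lr, epochs){
  // samples: [{ x: [feature...], y: reward }, ...]
  var n = samples[0].x.length;
  var w = new Array(n).fill(0);
  var b = 0;
  for (var e = 0; e < epochs; e++){
    samples.forEach(function(s){
      var pred = b + s.x.reduce(function(acc, xi, i){
        return acc + w[i] * xi;
      }, 0);
      var err = pred - s.y; // gradient of the squared error
      w = w.map(function(wi, i){ return wi - lr * err * s.x[i]; });
      b = b - lr * err;
    });
  }
  return { weights: w, bias: b };
}

// e.g. reward as a rough linear function of two state features
var model = fitLinear(
  [{ x: [0, 1], y: 1 }, { x: [1, 0], y: -1 }, { x: [1, 1], y: 0 }],
  0.05, 200);
console.log(model.weights, model.bias);
```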


## Licence

Binary file added media/falling-stones-benchmark.png
Binary file added media/ss-falling-stones.png
Binary file added media/ss-tictactoe.png
6 changes: 3 additions & 3 deletions sample/falling-stones.js
@@ -20,7 +20,7 @@ const actionSet = [ // Character movement
];

const BOARD_SIZE = 7;
-const MAX_LESSONS = 5;
+const MAX_LESSONS = 40;

/* 5x5 0
┏━━━━━┓
@@ -187,10 +187,10 @@ function repeatMove(agent,nLessons,history){

console.log('=============================='.cyan)
console.log(' Before generalisation:'.cyan);
-console.log(historyBeforeGenl);
+console.log(historyBeforeGenl.join(','));
console.log('');
console.log(' After generalisation:'.cyan);
-console.log(historyAfterGenl);
+console.log(historyAfterGenl.join(','));
return _agent;
}
else return generalize(_agent,history);
Expand Down
