This repository contains a TicTacToe environment based on the OpenAI Gym module.
An example of how to use this environment with a Q-learning algorithm that learns to play TicTacToe through self-play can be found here.
TicTacToe is a board game in which two players compete to be the first to place three stones of their color in a row, either horizontally, vertically, or diagonally.
Environment Id | Observation Space | Action Space | Reward Range | Max Episode Steps |
---|---|---|---|---|
TTT-v0 | Box(3,3) | Discrete(9) | (-inf, inf) | 9 |
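A minimal sketch of how these spaces can be inspected once the package is installed (installation steps follow below); the `small` and `large` reward parameters match the usage example further down:

```python
import gym
import gym_TicTacToe  # importing the package registers the TTT-v0 environment

env = gym.envs.make("TTT-v0", small=-1, large=10)

print(env.observation_space)  # the 3x3 board, Box(3, 3)
print(env.action_space)       # one action per board field, Discrete(9)
```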
```bash
git clone git@github.com:MauroLuzzatto/OpenAI-Gym-TicTacToe-Environment.git
cd OpenAI-Gym-TicTacToe-Environment
pip install -r requirements.txt
```

Run the following command in the command line:

```bash
pip install -e gym-TicTacToe/.
```

Now you should see the following message:

"Successfully installed gym-TicTacToe"
Further information on how to register a gym environment can be found here.
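For illustration, such a registration typically looks like the sketch below; the `entry_point` path used here is an assumption and may differ from the actual module path of this package:

```python
from gym.envs.registration import register

register(
    id="TTT-v0",                                    # the id passed to gym.envs.make
    entry_point="gym_TicTacToe.envs:TicTacToeEnv",  # "module:ClassName" (assumed path)
    max_episode_steps=9,                            # a TicTacToe game lasts at most 9 moves
)
```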
The environment can then be used as follows:

```python
import gym_TicTacToe
import gym

# initialize the tictactoe environment
env = gym.envs.make("TTT-v0", small=-1, large=10)

# start playing: reset the board, then place the first stone
state = env.reset()
color = 1      # player 1 plays color 1
action = 0     # top-left field of the board
new_state, reward, done, _ = env.step((action, color))
```
The environment contains the following four main methods:
- reset: reset the board to an empty state
- step: place a stone of the given player's color on the board
- decode_action: convert an action (0 to 8) into row and column values
- render: render the current stones on the board
The action space contains the integers from 0 to 8, each representing a board field. The table below shows the action number and its corresponding board position.
| 0 | 1 | 2 |
|---|---|---|
| 3 | 4 | 5 |
| 6 | 7 | 8 |
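The mapping from an action index to a board position is plain integer arithmetic; `decode_action` presumably does something equivalent to the sketch below (whether it returns the row or the column first is an assumption):

```python
def decode_action(action: int) -> tuple:
    """Convert an action index (0-8) into a (row, column) position on the 3x3 board."""
    row = action // 3
    col = action % 3
    return row, col

# action 5 lies in the middle row, rightmost column
assert decode_action(5) == (1, 2)
```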
State space:
- On a 3x3 board there are theoretically 3^(n^2) = 3^9 = 19,683 possible stone combinations of the two colors and the empty field (n = the side length of the square board)
- However, not all of these combinations are legal (e.g. a board filled entirely with stones of one color can never occur in a game)
- The state space can therefore be reduced to 8,953 states
State representation:
- The state is a 3x3 numpy array representing the 9 fields of the board
- Depending on the moves played, each field takes one of the following values:
  - 0: no stone
  - 1: stone of player 1
  - 2: stone of player 2
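As an illustration of this encoding, a hand-written example state (not output from the environment) could look like this:

```python
import numpy as np

# 0 = no stone, 1 = stone of player 1, 2 = stone of player 2
state = np.array(
    [
        [1, 1, 1],  # player 1 has completed the top row
        [2, 2, 0],  # player 2 holds two fields of the middle row
        [0, 0, 0],  # the bottom row is still empty
    ]
)
print(state.shape)  # (3, 3)
```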
There are different types of rewards in this environment:
- A large reward (env.large = 10) when a player wins (= +10)
- A small negative reward (env.small = -1) for every move played (= -1)
The game finishes:
- When one of the players has placed three stones in a row, either horizontally, vertically, or diagonally
- When the board is full of stones but there is no winner (a draw)
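Putting the rewards and the termination conditions together, the sketch below plays one random game and accumulates each player's reward; it assumes the `(action, color)` step interface shown above and simply avoids occupied fields by tracking the free ones:

```python
import random

import gym
import gym_TicTacToe

env = gym.envs.make("TTT-v0", small=-1, large=10)

state = env.reset()
done = False
color = 1                                # player 1 starts
free_fields = list(range(9))             # actions that have not been played yet
cumulative_reward = {1: 0, 2: 0}

while not done and free_fields:
    action = random.choice(free_fields)  # pick a random free field
    free_fields.remove(action)
    state, reward, done, _ = env.step((action, color))
    cumulative_reward[color] += reward   # -1 per move, +10 for the winning move
    env.render()                         # print the current board
    color = 2 if color == 1 else 1       # switch player

print(cumulative_reward)
```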
- OpenAI Gym: a toolkit from OpenAI for developing and comparing reinforcement learning algorithms
- Tic Tac Toe