This repository contains a TicTacToe environment based on the OpenAI Gym module.
An example of how to use this environment with a Q-learning algorithm that learns to play TicTacToe through self-play can be found here.
TicTacToe is a board game in which two players compete to be the first to place three stones of their color in a row, either horizontally, vertically, or diagonally.
Environment Id | Observation Space | Action Space | Reward Range | Max Episode Steps |
---|---|---|---|---|
TTT-v0 | Box(3,3) | Discrete(9) | (-inf, inf) | 9 |
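A minimal sketch of how these spaces can be inspected once the package is installed (installation steps follow below); the `small` and `large` reward parameters match the usage example further down:

```python
import gym
import gym_TicTacToe  # importing the package registers the TTT-v0 environment

env = gym.envs.make("TTT-v0", small=-1, large=10)

print(env.observation_space)  # the 3x3 board, Box(3, 3)
print(env.action_space)       # one action per board field, Discrete(9)
```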
```bash
git clone git@github.com:MauroLuzzatto/OpenAI-Gym-TicTacToe-Environment.git
cd OpenAI-Gym-TicTacToe-Environment
pip install -r requirements.txt
```

Run the following command in the command line:

```bash
pip install -e gym-TicTacToe/.
```

Now you should see the following message:

"Successfully installed gym-TicTacToe"
Further information on how to register a gym environment can be found here.
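For illustration, such a registration typically looks like the sketch below; the `entry_point` path used here is an assumption and may differ from the actual module path of this package:

```python
from gym.envs.registration import register

register(
    id="TTT-v0",                                    # the id passed to gym.envs.make
    entry_point="gym_TicTacToe.envs:TicTacToeEnv",  # "module:ClassName" (assumed path)
    max_episode_steps=9,                            # a TicTacToe game lasts at most 9 moves
)
```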
The environment can then be used as follows:

```python
import gym_TicTacToe
import gym

# initialize the tictactoe environment
env = gym.envs.make("TTT-v0", small=-1, large=10)

# start playing: reset the board, then place the first stone
state = env.reset()
color = 1      # player 1 plays color 1
action = 0     # top-left field of the board
new_state, reward, done, _ = env.step((action, color))
```
The environment contains the following four main methods:
- reset: reset the board to an empty state
- step: place a stone of the given player's color on the board
- decode_action: convert an action (0 to 8) into row and column values
- render: render the current stones on the board
The action space contains the integers from 0 to 8, each representing a board field. The table below shows the action number and its corresponding board position.
| 0 | 1 | 2 |
|---|---|---|
| 3 | 4 | 5 |
| 6 | 7 | 8 |
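The mapping from an action index to a board position is plain integer arithmetic; `decode_action` presumably does something equivalent to the sketch below (whether it returns the row or the column first is an assumption):

```python
def decode_action(action: int) -> tuple:
    """Convert an action index (0-8) into a (row, column) position on the 3x3 board."""
    row = action // 3
    col = action % 3
    return row, col

# action 5 lies in the middle row, rightmost column
assert decode_action(5) == (1, 2)
```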
State space:
- On a 3x3 board there are theoretically 3^(n^2) = 3^9 = 19,683 possible stone combinations of the two colors and the empty field (n = the side length of the square board)
- However, not all of these combinations are legal (e.g. a board filled entirely with stones of one color can never occur in a game)
- The state space can therefore be reduced to 8,953 states
State representation:
- The state is a 3x3 numpy array representing the 9 fields of the board
- Depending on the moves played, each field takes one of the following values:
  - 0: no stone
  - 1: stone of player 1
  - 2: stone of player 2
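As an illustration of this encoding, a hand-written example state (not output from the environment) could look like this:

```python
import numpy as np

# 0 = no stone, 1 = stone of player 1, 2 = stone of player 2
state = np.array(
    [
        [1, 1, 1],  # player 1 has completed the top row
        [2, 2, 0],  # player 2 holds two fields of the middle row
        [0, 0, 0],  # the bottom row is still empty
    ]
)
print(state.shape)  # (3, 3)
```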
There are different types of rewards in this environment:
- A large reward (env.large = 10) when a player wins (= +10)
- A small negative reward (env.small = -1) for every move played (= -1)
The game finishes:
- When one of the players has placed three stones in a row, either horizontally, vertically, or diagonally
- When the board is full of stones but there is no winner (a draw)
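Putting the rewards and the termination conditions together, the sketch below plays one random game and accumulates each player's reward; it assumes the `(action, color)` step interface shown above and simply avoids occupied fields by tracking the free ones:

```python
import random

import gym
import gym_TicTacToe

env = gym.envs.make("TTT-v0", small=-1, large=10)

state = env.reset()
done = False
color = 1                                # player 1 starts
free_fields = list(range(9))             # actions that have not been played yet
cumulative_reward = {1: 0, 2: 0}

while not done and free_fields:
    action = random.choice(free_fields)  # pick a random free field
    free_fields.remove(action)
    state, reward, done, _ = env.step((action, color))
    cumulative_reward[color] += reward   # -1 per move, +10 for the winning move
    env.render()                         # print the current board
    color = 2 if color == 1 else 1       # switch player

print(cumulative_reward)
```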
- OpenAI Gym: a toolkit from OpenAI for developing and comparing reinforcement learning algorithms
- Tic Tac Toe