# AlphaZero in Connect 4

An asynchronous implementation of the AlphaZero algorithm based on the AlphaZero paper.

AlphaZero is an algorithm that trains a reinforcement learning agent through self-play. The training examples are game states, while the 'ground truth' labels are the value of each state and its policy (a probability distribution over actions).
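As a rough illustration, the network is trained to minimize a value error plus a policy cross-entropy. Below is a minimal sketch in TensorFlow 2; the tensor names are assumptions for illustration, not this repo's actual code.

```python
import tensorflow as tf

# Minimal sketch of an AlphaZero-style loss (illustrative names, not
# this repo's actual code): squared error on the value head plus
# cross-entropy between the MCTS search policy and the policy head.
def alphazero_loss(outcome, value_pred, search_policy, policy_logits):
    # Value target: the final game outcome z from the current player's view.
    value_loss = tf.reduce_mean(tf.square(outcome - value_pred))
    # Policy target: the visit-count distribution pi produced by MCTS.
    policy_loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(
            labels=search_policy, logits=policy_logits))
    return value_loss + policy_loss
```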

AlphaZero uses a modified version of Monte Carlo Tree Search (MCTS) in which the trained network predicts the value of a leaf node instead of estimating it with rollouts.
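For concreteness, here is a sketch of the PUCT child-selection rule typical of AlphaZero-style MCTS; the node attributes (`visit_count`, `value_sum`, `prior`) are assumed names, not necessarily those used in this repo.

```python
import math

# Illustrative PUCT selection: choose the child maximizing Q + U,
# where U weights the network's prior by the parent's visit count.
# Attribute names here are assumptions for the sketch.
def select_child(node, c_puct=1.0):
    total_visits = sum(child.visit_count for child in node.children)
    best_score, best_child = -float("inf"), None
    for child in node.children:
        q = child.value_sum / child.visit_count if child.visit_count else 0.0
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        if q + u > best_score:
            best_score, best_child = q + u, child
    return best_child

# At a leaf, the network's value head replaces a random rollout:
# value, priors = network.predict(leaf_state)  # then backpropagate `value`
```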

## Training

Training was done with an asynchronous, multiprocessing approach, demonstrated here.
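In outline, the approach looks something like the sketch below: several self-play workers push finished games onto a shared queue while the trainer consumes them. `play_one_game` is a stub standing in for the repo's actual self-play code.

```python
import multiprocessing as mp
import random

def play_one_game():
    # Stub: real workers run MCTS-guided self-play and return
    # (state, search policy, outcome) tuples for every move.
    return [("state", [1 / 7] * 7, random.choice([-1, 0, 1]))]

def self_play_worker(queue):
    while True:  # each worker generates games independently
        queue.put(play_one_game())

def main(num_workers=4):
    queue = mp.Queue(maxsize=1000)
    workers = [mp.Process(target=self_play_worker, args=(queue,), daemon=True)
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    while True:
        examples = queue.get()  # blocks until a worker finishes a game
        # ...add `examples` to the replay memory and run a train step...

if __name__ == "__main__":
    main()
```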

The agent was trained for one week and quickly learned to defeat the one-step-look-ahead agent consistently (at around 3,000 epochs).

I then played against the agent myself. While it was difficult to beat, it is not unbeatable; since Connect 4 is a solved game, the agent should in theory be able to converge to an optimal policy. I have since increased the memory buffer size and resumed training. Future updates will be reported.

## Codebase

The AlphaZero folder contains all of the backend code for this implementation.

The training configuration, the ResNet (built with TensorFlow 2), the memory object, and the game object can be found here.
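For reference, a typical residual block in TensorFlow 2 looks like the sketch below; the filter count and layer choices are assumptions, not necessarily this repo's configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative residual block (assumed hyperparameters): two 3x3
# convolutions with batch norm, plus a skip connection.
def residual_block(x, filters=64):
    shortcut = x
    x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Add()([x, shortcut])
    return layers.ReLU()(x)
```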

MCTS-related functions can be found here.

The Pit object for evaluating the agent against a one-step-look-ahead agent can be found here.
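As an illustration of what that baseline does, here is a sketch of a one-step-look-ahead player; the `game` methods (`legal_moves`, `apply`, `is_win`) are assumed names, not this repo's API.

```python
import random

# Hypothetical one-step-look-ahead baseline: win now if possible,
# block the opponent's immediate win otherwise, else play randomly.
def one_step_lookahead(game, player):
    moves = game.legal_moves()
    for move in moves:  # take an immediate win
        if game.apply(move, player).is_win(player):
            return move
    for move in moves:  # block the opponent's immediate win
        if game.apply(move, -player).is_win(-player):
            return move
    return random.choice(moves)
```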

## Gif