AlphaZero is a computer program developed by artificial intelligence research company DeepMind. The program is generalized to work on all two-player complete information games such as tic tac toe, four in a row, chess and go. It is only given the rules of the game, and learns to master the game solely by playing against itself.
We are a project group within the student organization Cogito at NTNU (Norwegian University of Science and Technology). The group of eight has worked on this through the fall of 2019.
This project was created to achieve a greater understanding of the workings of one of the hottest reinforcement learning algorithms, as well as to have fun in the process. Optimization and introduction of parallel training was motivated by the desire of being able to train a network in reasonable time. This is espacially important for games with large action- and state spaces, as convergence will take a long time. The visualization was fueled by the quest to see how the network is thinking.
Clone the repository
git clone https://github.com/CogitoNTNU/AlphaZero.git
Navigate into the project folder
cd AlphaZero
Install the dependencies
pip3 install -r requirements.txt
If everything went well, you can run the code
python3 Main.py
And you should then get a Tic Tac Toe game up
To visualise the tree-search, install graphviz from this external link: Graphviz
If you are using Windows, remember to add the graphviz/bin directory to PATH.
You should now see something like this:
To play against AlphaZero execute:
python3 play.py
There are command line arguments, that simplify changing between games, number of searches and opponent. Currently supported games are Tic Tac Toe and Four In A Row. Default values are TicTacToe, 500 and 10_3_3.h5 respectively.
python3 play.py --game FourInARow --numSearch 1000
To train AlphaZero run:
python3 train.py
Even though AlphaZero is an effective algorithm and has achieved impressive feats, it is infamous for its computation costs required for training. Speeding up training is essential for several reasons. First of all, faster training allows us to tune hyperparameters considerably faster; we calculated that it went from taking weeks to only taking a few days. Secondly, AlphaZero are now able to complete training in a couple of days instead of weeks.
There were two concepts that significantly sped up the training; parallelization and batching. We also used caching, which caused a minor speed improvement.
We had eight processes that generated games in parallel. Each of which played 400 games simultaneously, bathing up the Resnet predictions. Thus a total of 4000 games were played in parallel each epoch. All of this resulted in a speedup of about 16 times.
To play a version of AlphaZero against all others run
python3 play.py --game FourInARow --numSearch 1000
Having played only 3000 games of Tic Tac Toe againts itself, AZ was able to master the game. By only jusing raw predictrions from the resnet, it plays perfectly. After 100 000 self-played games on Four In A Row, the network is able to play at a decent level with 500 searches per move. more results coming soon
Create a folder for your game and implement the functions in Config (Contains information about board dimensions, and can convert between game actions and action numbers) and Gamelogic (Contains board and functions like execute a move, undo, reset, get legal moves, and see if someone has won).
- Pondering - Make the network think while it is the players turn.
- Virtual loss - Spread out threads on different lines while multithreading, to avoid duplicate work.
- Chess type clock - Add ability to play with a timer so that the AI can think as long as the human, as to simulate a real match.
- Transfer learning - Use a pre-trained resnet and only train the first and last few layers. This should speed up convergence significantly. The weights can for instance be taken from Leela Chess Zero.
- Match temperature - A temperature argument as to make the network not behave deterministicly when playing against a human (can be extended easily by the same approach used in training).
- The AlphaZero- and AphaGo Zero paper are essential to read to achieve a thorough understanding of the algorithm.
- For a brief walkthrough of the algorithm and a more "hands on approach", I recommend reading through this article on Medium about an implementation of the AlpaZero algorithm.
- To get a gentle introduction to the algorithm, this video by Two Minute Papers might be a nice place to start.
- David Silver also explains AlphaZero himself.
This project is licensed under the MIT License - see the LICENSE.md file for details.