The MarioKart 64 game environment runs on the BizHawk emulator, which provides joypad support, gameplay recording and several debugging tools. Via Lua scripting it is possible to access and manipulate the game environment: one can load and save the current state and subsequently take screenshots that are used to learn abstract features of the environment.
To enable real-time learning, we use a Python server to which the Lua scripts connect. The agent that acts on the environment is then implemented entirely in Python.
https://polybox.ethz.ch/index.php/s/XH9sq6kOYkKxPZG
Check out our youtube videos!
To see how things work in the big picture, here is an example of an agent that executes random actions sampled from the action space:
```python
from mario_env import MarioEnv

"""
Random agent. Executes random actions sampled from the action space.
"""

def main():
    # num_steering_dir divides the continuous steering direction into a discrete
    # action space of steering directions; it should be an odd number.
    # If jump=True, the bot is also able to jump.
    env = MarioEnv(num_steering_dir=11, jump=False)

    # returns all possible actions that can be executed on the environment
    action_space = env.action_space

    # sets the agent back to the start and returns the current screenshot
    state = env.reset()

    done = False
    while not done:
        # samples an action uniformly from the action space
        new_action = action_space.sample()
        # executes the action on the environment;
        # returns the new screenshot, the current reward and whether we are done (info is empty)
        new_state, reward, done, info = env.step(new_action)

    # closes the BizHawk client and the socket
    env.close()

if __name__ == '__main__':
    main()
```
To get started, you can have a look at this simple random agent (random_agent.py), which just executes random actions.
Based on the Gym environment from https://github.com/openai/gym, each Mario environment starts its own Python server. We then start the BizHawk clients via environment variables that point to the BizHawk installation, and each client establishes a connection with its Python server. This results in a one-to-one socket between the Mario environments on the Python side and the Mario clients on the BizHawk side. Over this socket we use a primitive text-based protocol with a few commands, e.g. "RESET, 0.1234:1, ..".
The whole communication is encapsulated in the class MarioConnection, and the agent only interacts with MarioEnv. MarioEnv implements the Gym methods proposed by OpenAI:
- Reset -> State: Resets the environment and returns the initial state.
- Act(action) -> State, Reward, Done: Executes an action and returns the new state, the reward, and whether we are done or not.
- Close: Closes the environment.
The remaining methods proposed by Gym are not relevant for us. In our case the state is defined as the current screenshot, and the reward is a value in the range [0,1].
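As an illustration of how these methods map onto a Gym-style class, here is a minimal, hypothetical skeleton. The class and helper names are assumptions made for this sketch; the real implementation lives in mario_env.py.

```python
import gym
from gym import spaces

class MarioEnvSketch(gym.Env):
    """Hypothetical skeleton of a Gym environment in the spirit of MarioEnv.
    The real MarioEnv in mario_env.py differs in detail."""

    def __init__(self, num_steering_dir=11, jump=False):
        # one discrete action per steering direction (doubled if jumping is allowed) -- assumption
        self.action_space = spaces.Discrete(num_steering_dir * (2 if jump else 1))
        self.connection = None  # would be a MarioConnection in the real code

    def reset(self):
        # would tell BizHawk to reload the save state and return the first screenshot
        return self._current_screenshot()

    def step(self, action):
        # would send the chosen action to BizHawk and read reward/done back over the socket
        reward, done = 0.0, False  # placeholders
        return self._current_screenshot(), reward, done, {}

    def close(self):
        # would close the socket and shut down the BizHawk client
        pass

    def _current_screenshot(self):
        # placeholder for receiving the current screenshot from BizHawk
        return None
```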
MarioConnection encapsulates the byte-level communication between BizHawk and Python.
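To give a rough feel for this communication, here is a minimal sketch of what the Python side of such a socket could look like. The port, the command strings and the "reward:done" reply format are assumptions based on the example above, not the exact wire protocol used by MarioConnection.

```python
import socket

# Minimal sketch (assumption, not the actual MarioConnection implementation):
# the Python side listens on a port and the Lua script running inside BizHawk connects to it.
HOST, PORT = "127.0.0.1", 36296  # hypothetical port

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind((HOST, PORT))
server.listen(1)
conn, _ = server.accept()  # the BizHawk Lua script connects here

# ask the emulator to reset to the save state
conn.sendall(b"RESET\n")

# a reply like "0.1234:1" could encode "reward:done" (assumed format)
reply = conn.recv(1024).decode().strip()
reward_str, done_str = reply.split(":")
reward, done = float(reward_str), bool(int(done_str))

conn.close()
server.close()
```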
Since BizHawk currently cannot be told which Python server to connect to (e.g. by passing a port number), we had to create one Lua script per port. This is the reason for the numerous new_mario_env0.lua, new_mario_env1.lua, etc., since everything has to start automatically.
When BizHawk is loaded, it loads the save state stored in slot 2 (see http://tasvideos.org/Bizhawk/SavestateFormat.html for the save-state format). You can save a state in-game to slot 2 via Shift + F2. It is advised to save the state at the beginning of the track so that the agent can directly start to cruise.
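For illustration, a launcher along the following lines could start one BizHawk instance per environment. The command-line flags and file layout shown here are assumptions; the actual startup logic lives in start_bizhawk.py and run_bizhawk.py (described below).

```python
import os
import subprocess

# Hypothetical launcher sketch: start one BizHawk instance per environment,
# each with its own Lua script (new_mario_env0.lua, new_mario_env1.lua, ...).
# The flags below are assumptions; the real logic is in start_bizhawk.py.
def start_bizhawk_instances(num_envs):
    bizhawk_dir = os.environ["BIZHAWK"]   # path to the BizHawk folder
    lua_dir = os.environ["MKDL_LUA"]      # path to the folder with the Lua scripts
    rom = os.path.join(bizhawk_dir, "isos", "mario.z64")

    processes = []
    for i in range(num_envs):
        lua_script = os.path.join(lua_dir, f"new_mario_env{i}.lua")
        cmd = [
            os.path.join(bizhawk_dir, "EmuHawk.exe"),
            f"--lua={lua_script}",  # assumed flag: run this Lua script on startup
            "--load-slot=2",        # assumed flag: load the save state in slot 2
            rom,
        ]
        processes.append(subprocess.Popen(cmd))
    return processes
```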
mkdl holds all the python code. Short summary:
- random_agent.py: Example agent
- start_bizhawk.py: responsible for starting BizHawk automatically
- utils.py: some utility functions
- run_bizhawk.py: starts BizHawk from the Python console
- policy.py: holds the neural networks we use. The one used most is OurCNN2, a CNN with 2 additional non-linear ReLU layers (a rough sketch of such an architecture is given after this list).
- mario_env.py: holds the MarioEnv which the agent acts on, as well as the MarioConnection which communicates with BizHawk
- ppo2_agent.py: runs the PPO agent (OpenAI PPO implementation)
- a2c_agent.py, a3c_agent.py: some additional agents we tried out and rejected again for our problem
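As a rough idea of what such a policy network can look like, here is a minimal, hypothetical sketch: a small convolutional stack over the screenshot followed by two fully connected ReLU layers. It is not the actual OurCNN2 definition from policy.py, just an illustration of the described architecture; the layer sizes are assumptions.

```python
import tensorflow as tf

# Hypothetical sketch of a policy network in the spirit of OurCNN2:
# a convolutional stack followed by two extra non-linear ReLU layers.
# The real definition lives in policy.py and will differ in detail.
def build_policy_sketch(num_actions, input_shape=(84, 84, 3)):
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu",
                               input_shape=input_shape),
        tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(64, 3, strides=1, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="relu"),  # first extra ReLU layer
        tf.keras.layers.Dense(256, activation="relu"),  # second extra ReLU layer
        tf.keras.layers.Dense(num_actions),             # logits over the discrete actions
    ])
```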
- windows only (thanks to bizhawk)
- python 3.6
- Install python 3.6.4 (https://www.python.org/downloads/release/python-364/)
- install some relevant modules:
- pip install gym --no-dependencies
- pip install tensorflow (cpu version) / pip install tensorflow-gpu (gpu version)
- pip install bench
- pip install logger
- pip install cloudpickle
- Install OpenAI baselines, but our extended version: pip install --no-dependencies git+https://github.com/SimiPro/baselines.git
- Install Microsoft MPI https://www.microsoft.com/en-us/download/details.aspx?id=55494
- Install Bizhawk https://github.com/TASVideos/BizHawk/releases
- Set the system environment variables:
- BIZHAWK = "Path/To/BizHawkEmulator"
- MKDL_LUA = "Path/To/MKDL_LUA/Folder"
- Create a folder "isos" in the BizHawk emulator folder and place the ROM file ('mario.z64') in it
OPTIONAL:
- Install Pycharm https://www.jetbrains.com/pycharm/download/#section=windows
- Open PyCharm and select the Python interpreter we installed in the first step
- Install package gym under File -> Project:mkdl -> Project Interpreter -> Green Plus symbol -> Add gym
Before running the environment, we have to create a save state that is loaded when BizHawk is started. This is normally done at the start line of a track.
- Run the BizHawk emulator and load the ROM mario.z64
- Choose Time Trial mode, choose a player, a kart and a track (e.g. Luigi Raceway)
- Press Shift + F2 to create a save state in slot 2, then close the application
- Run python random_agent.py
How to use ppo2_agent.py
To run the agent with precomputed weights:
- In ppo2_agent.py, line 103, change the call to run(..)
- Go to site-packages -> baselines -> PPO2 -> ppo2.py, line 153, and make sure load_path = "Path\To\Model\Model_no"
The weights can be downloaded here and are meant to be used with the Luigi Raceway track: https://polybox.ethz.ch/index.php/s/FUgZ7cWQ7BVd2Kv
To train the agent from scratch:
- In ppo2_agent.py, line 103, change the call to train(..)
- Go to site-packages -> baselines -> PPO2 -> ppo2.py, line 153, and make sure load_path = None
- Simon Huber
- Huy Cao Tri Do
- Samuel Andermatt
- Quentin Auzepy
This project is licensed under the MIT License
- Big thanks to https://github.com/rameshvarun/NeuralKart, which was basically our inspiration for this project.