Recognize Magic: The Gathering (MTG) cards in images by detecting and recognizing their names.
Visions is a Python 3 and C++ program for recognizing Magic: The Gathering cards in images. It currently recognizes cards whose names are printed in black text (modern and M15 frames).
The recognition is done by first detecting and then reading the card names in the image. The process can be split into five phases:
- FASText keypoints are used to detect connected components that could be parts of a card's name.
- A convolutional neural network classifies the detected connected components as parts of card names or as noise.
- DBSCAN clustering joins the accepted components into text lines (see the clustering sketch after this list).
- A recurrent neural network containing an LSTM unit reads the detected text lines.
- A language model based on SymSpell matches the raw readings to actual card names to improve the recognition results (see the matching sketch after this list).
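To make the clustering phase concrete, here is a minimal sketch of the idea using scikit-learn's DBSCAN on hypothetical component centroids. The coordinates and the `eps`/`min_samples` values are invented for the example and are not the parameters Visions actually uses.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical centroids (x, y) of connected components kept by the classifier.
centroids = np.array([
    [102, 40], [118, 41], [131, 39], [149, 40],  # components of one card name
    [300, 210], [315, 211], [331, 209],          # components of another card name
    [50, 400],                                   # an isolated noise component
])

# eps and min_samples are illustrative values only.
labels = DBSCAN(eps=25, min_samples=2).fit_predict(centroids)

for label in sorted(set(labels)):
    members = centroids[labels == label]
    name = "noise" if label == -1 else f"text line {label}"
    print(name, members.tolist())
```

Components that end up in the same cluster are treated as one candidate text line; the label -1 marks components that could not be joined to any line.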
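Similarly, the matching phase can be sketched with the symspellpy package (used here purely for illustration), storing each full card name as a single dictionary term and looking up a noisy reading against it. The card list and edit-distance settings below are assumptions, not the project's actual configuration.

```python
from symspellpy import SymSpell, Verbosity

# Tiny illustrative "card database"; the real program would load the full card list.
sym_spell = SymSpell(max_dictionary_edit_distance=2, prefix_length=7)
for card_name in ["Lightning Bolt", "Llanowar Elves", "Counterspell"]:
    sym_spell.create_dictionary_entry(card_name, 1)  # count is just a placeholder

# A raw reading from the text recognizer with a couple of character errors.
raw_reading = "Lightnlng Bo1t"

suggestions = sym_spell.lookup(raw_reading, Verbosity.CLOSEST, max_edit_distance=2)
print(suggestions[0].term if suggestions else "no match")  # -> Lightning Bolt
```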
This program is a constructive part of my master's thesis. The thesis contains full details of the implementation in Finnish. A more detailed explanation in English might be added later as a GitHub page or in some other format.
Datasets used for training the neural networks are published in visions-datasets.
Since Python 3 can be too slow for computation-intensive operations, FASText and DBSCAN are implemented in C++. The following installation instructions have been tested on a fresh installation of Ubuntu 18.04 LTS.
- CMake for build automation. I installed CMake 3.17.3 from source, but any version from 3.12 onwards should work.
- A C++ compiler supporting the C++11 standard. I used g++ 7.5.0, which can be installed with `sudo apt install g++`.
- Python 3.6 for running the Python code. It is already included in Ubuntu 18.04.
- Install dependencies:

```
sudo apt install python3-dev
sudo apt install python3-numpy
```
- Compile the code:

```
mkdir build
cd build
cmake ../
cmake --build ./
```

If the compilation was successful, there should now be a file called `libftpy.so` in the `py` folder of the project (a quick way to verify this from Python is sketched after the installation steps).
- Create a virtual environment and activate it:

```
sudo apt install python3-venv
python3 -m venv env
source env/bin/activate
```

- Upgrade pip and setuptools:

```
pip install --upgrade pip
pip install --upgrade setuptools
```
- Install the dependencies listed in the `requirements.txt` file:

```
pip install -r requirements.txt
```
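If you want to double-check that the compiled extension is reachable from Python, a quick probe along these lines should work. The module name `libftpy` and its location in the `py` folder are taken from the build output described above; this check is only a convenience sketch, not part of the project.

```python
# Sanity check: can Python locate the compiled FASText extension?
# Assumes the shared library is importable as "libftpy" from the py folder.
import importlib.util
import sys

sys.path.insert(0, "py")
spec = importlib.util.find_spec("libftpy")
print("libftpy found" if spec is not None else "libftpy not found")
```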
Now that the installation is complete, you can use `py/main.py` to process images from the `inputs` folder. Run

```
python py/main.py
```

to process the sample inputs.
You can generate visual outputs similar to the images at the top of this page by adding the `--visualize 1` command line argument:

```
python py/main.py --visualize 1
```

The generated images can be found in a folder called `outputs`.
| Argument | Purpose | Default | Example |
| --- | --- | --- | --- |
| `input` | Specify the input image folder | `"inputs"` | `python py/main.py --input my_input_folder` |
| `visualize` | Toggle visualization output on | `0` | `python py/main.py --visualize 1` |
| `output` | Specify the visualization output folder | `"outputs"` | `python py/main.py --visualize 1 --output my_output_folder` |
- The most glaring limitation of the implementation is that it only recognizes cards whose names are printed in black text. Support for white-text names is the first improvement to be made; it should be possible by expanding the training data and tweaking the FASText options.
- There are no unit tests. Unit tests would be useful, at least for the FASText code.
- The detection pipeline doesn't handle text lines that are heavily rotated, and it is also somewhat complex. Alternative detection methods should be tested.