This repository contains the implementation of the paper MANGO: A Mask Attention Guided One-Stage Scene Text Spotter (AAAI 2021).
Original images can be downloaded from: Total-Text, ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT.
The formatted training datalist can be found in demo/text_spotting/datalist.
1. First, download the pre-trained model, which was trained on SynthText and SynthText_Curve.
2. Modify the paths (ann_file, img_prefix, work_dir, etc.) in the config file demo/text_spotting/mango/configs/mango_r50_ete_finetune_ic13.py.
3. Run the following commands:
>>> cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/mango/
>>> bash dist_train.sh
Notice: We provide an implementation of online validation. To disable it and save training time, add the --no-validate option to the startup script.
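As a rough illustration of step 2, the fields to edit in an mmdetection-style Python config might look like the sketch below. All paths, the annotation file name, and the exact nesting of the data dictionary are assumptions for illustration; check them against the actual mango_r50_ete_finetune_ic13.py before use.

```python
# Hypothetical locations; replace them with your own paths.
data_root = "/path/to/datasets/"

# Directory where checkpoints and logs are written.
work_dir = "/path/to/work_dirs/mango_finetune_ic13/"

# Assumed layout: the real config may nest these keys differently.
data = dict(
    train=dict(
        # Annotation list (hypothetical file name).
        ann_file=data_root + "datalist/train_datalist.json",
        # Prefix that is joined to each image path in the annotation list.
        img_prefix=data_root + "images/",
    ),
)
```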
To reproduce the model's performance from scratch, follow these steps:
1. First, pre-train the attention module on SynthText with character-level annotations. See demo/text_spotting/mango/configs/mango_r50_att_pretrain.py for more details.
2. Second, train end-to-end on SynthText and SynthCurve, using only word-level annotations. See demo/text_spotting/mango/configs/mango_r50_ete_pretrain.py for more details.
Notice: At the beginning of training, the attention module and the recognition module are trained together to prevent the attention module from collapsing. The pre-trained model is provided as mentioned above.
3. Third, fine-tune the model on the mixed real datasets (ICDAR2013~2019 and Total-Text). See demo/text_spotting/mango/configs/mango_r50_ete_finetune_ic13.py for more details.
4. Finally, fine-tune on ICDAR2013, ICDAR2015, and Total-Text separately for testing and evaluation.
Notice: Fine-tune on ICDAR2015 with num_grid=60, and on ICDAR2013 and Total-Text with num_grid=40.
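The num_grid setting controls how finely the image is divided into cells. The sketch below assumes, as in grid-based one-stage detectors, that each text instance is assigned to the cell containing its center. The helper center_to_grid and the example coordinates are hypothetical and not part of this repository. It illustrates that a finer grid is more likely to place nearby text instances in different cells.

```python
def center_to_grid(cx, cy, img_w, img_h, num_grid):
    """Return (row, col) of the grid cell containing the point (cx, cy)."""
    # Clamp so that a point on the right or bottom border stays inside the grid.
    col = min(int(cx / img_w * num_grid), num_grid - 1)
    row = min(int(cy / img_h * num_grid), num_grid - 1)
    return row, col


# Two hypothetical text centers, 20 pixels apart, in a 1000x1000 image.
a = (500.0, 500.0)
b = (520.0, 500.0)

for num_grid in (40, 60):
    cell_a = center_to_grid(*a, 1000, 1000, num_grid)
    cell_b = center_to_grid(*b, 1000, 1000, num_grid)
    status = "same cell" if cell_a == cell_b else "different cells"
    print(num_grid, cell_a, cell_b, status)
```

With these example values, the two centers share a cell at num_grid=40 and fall into neighboring cells at num_grid=60.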
We provide a demo of forward inference and evaluation. You can modify the parameters (iou_constraint, lexicon_type, etc.) in the testing script, then start testing:
>>> cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/mango/tools/
>>> bash test_ic13.sh
The offline evaluation tool can be found in davarocr/demo/text_spotting/evaluation/.
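As an illustration of what a threshold such as iou_constraint controls, the sketch below counts a prediction as correct only when its box overlaps a ground-truth box by more than the threshold and the transcriptions agree. This is a simplified assumption using axis-aligned boxes and case-insensitive comparison. The function names are hypothetical, and the repository's evaluation tool may use polygons and different matching rules.

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap are zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct(pred_box, pred_text, gt_box, gt_text, iou_constraint=0.5):
    """A prediction counts only if it is both localized and read correctly."""
    localized = box_iou(pred_box, gt_box) > iou_constraint
    return localized and pred_text.lower() == gt_text.lower()


# Well-localized and correctly read prediction.
print(is_correct((0, 0, 10, 10), "Hello", (0, 0, 10, 9), "hello"))
# Correct text, but the boxes overlap too little.
print(is_correct((0, 0, 2, 2), "a", (1, 1, 3, 3), "a"))
```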
We provide a script to visualize the model's intermediate outputs, including the segmentation results, activated grid map, text predictions, and attention maps. You can modify the paths (test_dataset, config_file, etc.) in the script, then generate the visualization results:
>>> cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/mango/tools/
>>> python vis.py
Some visualization results are shown:
All of the models are re-implemented and trained on top of the open-source framework mmdetection, so the results may differ slightly from the reported ones.
Results on various datasets and download links for the trained models:
Pipeline | Pretrained-Dataset | Links |
resnet50+fpn+CMA+lstm | SynthText, SynthCurve | |
resnet101+fpn+CMA+lstm | SynthText, SynthCurve | |
Dataset | Backbone | Pretrained | Mix-Finetune | Specific-Finetune | Test Scale | End-to-End General | End-to-End Weak | End-to-End Strong | Word Spotting General | Word Spotting Weak | Word Spotting Strong | Links |
ICDAR2013 (Reported) | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, COCO-Text, Total-Text | None | L-1440 | 86.9 | 90.0 | 90.5 | 90.1 | 94.1 | 94.8 | - |
ICDAR2013 | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | None | L-1440 | 84.9 | 88.6 | 89.5 | 88.4 | 92.7 | 93.7 | |
ICDAR2013 | ResNet-101 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | None | L-1440 | 88 | 90.3 | 90.4 | 90.7 | 93.8 | 94.0 | |
ICDAR2015 (Reported) | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, COCO-Text, Total-Text | ICDAR2015 | L-1800 | 67.3 | 78.9 | 81.8 | 70.3 | 83.1 | 86.4 | - |
ICDAR2015 | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | ICDAR2015 | L-1800 | 70.8 | 77.4 | 80.7 | 73.8 | 81.1 | 85 | |
ICDAR2015 | ResNet-101 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | ICDAR2015 | L-1800 | 72.8 | 79.8 | 82.4 | 75.7 | 83.4 | 86.6 | |
Dataset | Backbone | Pretrained | Mix-Finetune | Specific-Finetune | Test Scale | End-to-End None | End-to-End Full | Word Spotting None | Word Spotting Full | Links |
Total-Text (Reported) | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, COCO-Text, Total-Text | Total-Text | L-1600 | - | - | 72.9 | 83.6 | - |
Total-Text | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | Total-Text | L-1600 | 68.9 | 78.9 | 71.7 | 82.7 | |
Total-Text | ResNet-101 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | Total-Text | L-1600 | 70.2 | 79.9 | 73 | 83.9 | |
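To quantify how far the re-implemented ResNet-50 models are from the reported numbers, the differences can be computed directly from the ICDAR2013 and ICDAR2015 rows above. The snippet is only a convenience sketch. The variable names are illustrative, and the values are copied from the tables in the order End-to-End General, Weak, Strong, then Word Spotting General, Weak, Strong.

```python
reported = {
    "ICDAR2013": [86.9, 90.0, 90.5, 90.1, 94.1, 94.8],
    "ICDAR2015": [67.3, 78.9, 81.8, 70.3, 83.1, 86.4],
}
reimplemented = {
    "ICDAR2013": [84.9, 88.6, 89.5, 88.4, 92.7, 93.7],
    "ICDAR2015": [70.8, 77.4, 80.7, 73.8, 81.1, 85.0],
}

# Positive values mean the re-implemented model scores higher than reported.
deltas = {
    name: [
        round(ours - ref, 1)
        for ours, ref in zip(reimplemented[name], reported[name])
    ]
    for name in reported
}

for name, diff in deltas.items():
    print(name, diff)
```

When reading the differences, note that the re-implemented models use ICDAR2019_MLT rather than COCO-Text in the mix-finetune stage, as listed in the tables.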
If you find this repository helpful to your research, please feel free to cite us:
@inproceedings{qiao2021mango,
title={MANGO: A Mask Attention Guided One-Stage Scene Text Spotter},
author={Qiao, Liang and Chen, Ying and Cheng, Zhanzhan and Xu, Yunlu and Niu, Yi and Pu, Shiliang and Wu, Fei},
booktitle={Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI)},
pages={2467-2476},
year={2021}
}
This project is released under the Apache 2.0 license.
If you have any suggestions or problems, please feel free to contact the author at qiaoliang6@hikvision.com.