Skip to content

Benchmark for VG-based Detection and Chart Understanding (IEEE T-PAMI 2024)

License

Notifications You must be signed in to change notification settings

Vill-Lab/2024-TPAMI-VGDCU

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 

Repository files navigation

🔥 [TPAMI 2024] Benchmark for VG-based Detection and Chart Understanding (VG-DCU)

📜 Introduction

Differece

Rendering vector graphics into pixel arrays can result in significant memory costs or loss of information, as shown in above Figure 1. We propose the first large-scale chart-based vector graphics dataset focusing on VG-based Detection and Chart Understanding.

Task Dataset Type Source # Chart Type # Nums
ChartQA DVQA RG Synthetic 1 300,000
FigureQA RG Synthetic 5 100,000
PlotQA RG Synthetic 3 224,377
LEAF-QA RG Synthetic 4 250,000
Chart-to-Table ICPR 2020 RG Synthetic & Real 15 40,322
ICPR 2022 RG Real 15 36,183
VG Detection (YOLaT used) SESYD-Floorplans VG Synthetic - 1,000
SESYD-Diagrams VG Synthetic - 1,000
VG Detection & Chart-to-Table VG-DCU(Ours) VG Synthetic & Real 16 15,197

The currently available public vector graphics datasets are limited to the two small datasets indicated in the Table and lack the complexity necessary for the advancement of vector image detection. In contrast, our proposed dataset comprises over 10,000 vector charts utilizing diverse primitives with rich attributes.

Dataset Construction

Differece

The proposed VG-based chart dataset contains two subsets:
  • Vega-Lite: a synthetic subset generated with scripts and fictional data

  • Plotly:a real-world subset drawn by users.

Dataset Statistic

Dataset Split: We collect 10,682 synthetic and 4,515 real charts in the VG-DCU dataset, by default using 80% as the training set and 20% as the test set. We divide the training and test set so that objects from the same category are included in both the training and testing set.

Vega-Lite Plotly
Chart Type Train Test Train Test
Area 0 0 87 22
Bar (Vert. & Hor.) 2,400 601 1,704 426
Box (Vert. & Hor.) 0 0 482 121
Donut&Pie 2,395 599 200 50
Line 3,749 938 146 37
Scatter 0 0 640 161
Heatmap 0 0 154 39
Counter 0 0 180 45
Violin 0 0 82 21
Sankey 0 0 21 6
Total 8,544 2,138 3,609 906

Dataset Analysis

(a) The distribution map of categories and number of bbox instances. (b) The width-to-height ratio distribution of class and box instances

Differece

Download

We will provide both Baidu Drive and Google Drive for downloading.

Citation

BibTex:

@inproceedings{yolat24,
title={{Hierarchical Recognizing Vector Graphics and A New Chart-based Vector Graphics Dataset}},
author={Shuguang Dou, Xinyang Jiang, Lu Liu, Lu Ying, Caihua Shan, Yifei Shen, Xuanyi Dong, Yun Wang, Dongsheng Li, Cairong Zhao},
booktitle={IEEE Transactions on Pattern Analysis and Machine Intelligence},
volume={},
number={},
pages={},
year={2024}}

Please do consider 🌟 star our project to share with your community if you find this repository helpful!

Related Project

YOLaT-VectorGraphicsRecognition

About

Benchmark for VG-based Detection and Chart Understanding (IEEE T-PAMI 2024)

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published