
Synthetic handwritten Groningen Meaning Bank (GMB) dataset for research on full page text and entity recognition


omni-us/research-dataset-sGMB


Synthetic handwritten Groningen Meaning Bank (GMB) dataset

A dataset of synthetically generated handwritten pages, intended for research on full page text and entity recognition. The pages were generated with the tool at https://github.com/manucarbonell/handwritten-document-synthesizer, using text taken from https://gmb.let.rug.nl/.

This dataset was developed for the paper below. If you use it in your research, please cite both the source of the data, the Groningen Meaning Bank, and the paper:

Manuel Carbonell, Alicia Fornés, Mauricio Villegas, and Josep Lladós. "A
neural model for text localization, transcription and named entity
recognition in full pages." Pattern Recognition Letters 136 (2020): 219-227.

Use nw-page-editor to visualize the XML files. For a nicer visualization of the annotated entities, load the CSS included in this repository as follows: `nw-page-editor --css code/nw-page-editor-entities.css data`
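To inspect the transcriptions without the editor, the XML files can also be read programmatically. The following is a minimal sketch, not part of this repository. It assumes the files follow the Page XML layout that nw-page-editor works with, where each `TextLine` element holds its transcription in a `TextEquiv/Unicode` child. The helper names and the inline `SAMPLE` document are hypothetical and only illustrate the structure. How entity annotations are stored is not covered here.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace, e.g. '{http://...}TextLine' becomes 'TextLine'."""
    return tag.rsplit("}", 1)[-1]


def iter_line_texts(root):
    """Yield (line_id, text) for each TextLine with a direct TextEquiv/Unicode child.

    Only direct children are inspected, so word-level transcriptions nested
    inside the line are not reported as line text.
    """
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        for child in elem:
            if local_name(child.tag) != "TextEquiv":
                continue
            for uni in child:
                if local_name(uni.tag) == "Unicode":
                    yield elem.get("id"), (uni.text or "")


# Hypothetical stand-in for one of the dataset's XML files (assumed structure).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page.png" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <TextLine id="r1_l1">
        <Word id="w1"><TextEquiv><Unicode>Hello</Unicode></TextEquiv></Word>
        <TextEquiv><Unicode>Hello world</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for line_id, text in iter_line_texts(root):
        print(line_id, text)
```

To read a real file instead of the sample, replace `ET.fromstring(SAMPLE)` with `ET.parse(path).getroot()`. Matching on local tag names keeps the sketch independent of the exact schema version declared in the namespace.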
