subimg_augmentation

This repository contains code for extracting polygonal segmentation data from ALTO XML files for use in subimage augmentation, as presented in "Evaluating Augmented Training Data for Complex Document Layouts: the Case of Arabic Scientific Manuscripts" (DH2024). The code is available both as a Python script (extract-regions.py) and as a Jupyter notebook.
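
To give a rough idea of what extracting polygonal regions from ALTO XML involves, here is a minimal standard-library sketch. It is not the code in extract-regions.py. It assumes that each region is a TextBlock element holding a Shape/Polygon child, and that the POINTS attribute lists coordinates separated by spaces and/or commas; both assumptions should be checked against your own ALTO export. The function names and the sample document are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only to illustrate the sketch below.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace>
    <TextBlock ID="block_1" TAGREFS="BT1">
      <Shape><Polygon POINTS="10 20 110 20 110 80 10 80"/></Shape>
    </TextBlock>
    <TextBlock ID="block_2" TAGREFS="BT2">
      <Shape><Polygon POINTS="200,300 260,300 230,360"/></Shape>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""


def local_name(tag):
    """Strip a '{namespace}' prefix so the same code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points_attr):
    """Turn 'x1,y1 x2,y2' or 'x1 y1 x2 y2' into [(x1, y1), (x2, y2), ...]."""
    numbers = [float(n) for n in re.split(r"[,\s]+", points_attr.strip()) if n]
    if len(numbers) % 2:
        raise ValueError("odd number of coordinates in POINTS")
    return list(zip(numbers[0::2], numbers[1::2]))


def extract_regions(alto_xml):
    """Return one dict per TextBlock that carries a usable polygon outline."""
    root = ET.fromstring(alto_xml)
    regions = []
    for block in root.iter():
        if local_name(block.tag) != "TextBlock":
            continue
        for shape in block:
            if local_name(shape.tag) != "Shape":
                continue
            for poly in shape:
                if local_name(poly.tag) != "Polygon":
                    continue
                points = parse_points(poly.get("POINTS", ""))
                if len(points) < 3:
                    continue  # fewer than three vertices is not a polygon
                xs = [x for x, _ in points]
                ys = [y for _, y in points]
                regions.append({
                    "id": block.get("ID"),
                    "tagrefs": block.get("TAGREFS"),  # reference to the region type tag
                    "polygon": points,
                    "bbox": (min(xs), min(ys), max(xs), max(ys)),
                })
    return regions


if __name__ == "__main__":
    for region in extract_regions(SAMPLE_ALTO):
        print(region["id"], region["tagrefs"], region["bbox"])
```

The bounding box is included because a polygon is usually cropped by first cutting out its bounding rectangle from the page image and then masking everything outside the outline; that image step is left out here to keep the sketch dependency-free.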

How the extracted regions are turned into artificial images is left to the user. A sample workflow that combines selected regions using a SegmOnto ontology will be uploaded to this repository soon.

Repository contents:

imgs_extracted
input
notebooks
LICENSE
README.md
extract-regions.py

cmroughan/subimg_augmentation
