nonRoman-OCR

An introduction to the basics of optical character recognition (OCR), which enables full-text searching and other kinds of text manipulation of a digitized document, with a particular focus on OCR for materials in languages other than English and in scripts other than Roman/Latin. OCR is commonplace for English and other Roman-script languages such as French or Spanish, but it works less reliably for languages such as Arabic, Hindi, or Chinese. This workshop is an opportunity to explore Tesseract, an open source OCR tool that has demonstrated success with some non-Roman scripts.
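As a rough illustration of what running Tesseract on a non-Roman script can look like, the sketch below assembles a Tesseract command line for a chosen language. It is not taken from the workshop materials: the helper names (`build_tesseract_command`, `run_ocr`) and the `LANG_CODES` table are hypothetical, and it assumes the `tesseract` executable and the relevant language data (for example `ara`, `hin`, `chi_sim`) are already installed.

```python
import shutil
import subprocess
from pathlib import Path

# Tesseract language-data codes for the languages mentioned above.
# This small table is illustrative, not a complete list.
LANG_CODES = {
    "Arabic": "ara",
    "Hindi": "hin",
    "Chinese (Simplified)": "chi_sim",
}


def build_tesseract_command(image, output_base, languages):
    """Return the argument list for: tesseract IMAGE OUTPUTBASE -l LANG[+LANG...]"""
    if not languages:
        raise ValueError("at least one language code is required")
    # Several languages can be combined with '+', e.g. "ara+eng".
    return ["tesseract", str(image), str(output_base), "-l", "+".join(languages)]


def run_ocr(image, output_base, languages):
    """Run Tesseract and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image, output_base, languages), check=True)
    # Tesseract appends ".txt" to the output base name by default.
    return Path(f"{output_base}.txt")
```

For a scanned Arabic page saved under the hypothetical name `page.png`, calling `run_ocr("page.png", "page", [LANG_CODES["Arabic"]])` would be expected to write the recognized text to `page.txt`; recognition quality will depend on the scan and the script.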

Files (4 commits):

LICENSE
OCR for non-Roman Scripts.pdf
README.md
Tesseract Useful Links.pdf

AlMaturidiyya/nonRoman-OCR
