This workshop introduces the basics of optical character recognition (OCR), which allows for full-text searching and other types of text manipulation of a digitized document, with a particular focus on OCR for materials in languages other than English and in scripts other than Roman/Latin. OCR is fairly commonplace for English and for other Roman-script languages such as French or Spanish, but it works less reliably for languages such as Arabic, Hindi, or Chinese. The workshop is an opportunity to explore an open source OCR tool, Tesseract, that has demonstrated success with some non-Roman scripts.
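As a rough illustration of what running Tesseract on a non-Roman-script page can look like, here is a minimal sketch that calls the Tesseract command-line tool from Python. It is not taken from the workshop materials. The helper names (`build_tesseract_command`, `run_ocr`) and the file names are hypothetical, and the sketch assumes that Tesseract and the trained data for the chosen language (for example `ara` for Arabic, `hin` for Hindi, or `chi_sim` for Simplified Chinese) are already installed.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="ara"):
    """Return the argument list for a Tesseract CLI call.

    `lang` is a Tesseract language code, e.g. "ara" (Arabic),
    "hin" (Hindi), or "chi_sim" (Simplified Chinese).
    These helper names are illustrative, not part of Tesseract.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="ara"):
    """Run Tesseract, if installed, and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image_path, output_base, lang)
    subprocess.run(command, check=True)
    # By default Tesseract writes plain text to <output_base>.txt.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

With a scanned page saved as `page.png` (a placeholder name), `run_ocr("page.png", "page_out", lang="ara")` would return the recognized Arabic text, provided the Arabic language data is available. Recognition quality depends heavily on the scan and the script, so the output usually needs proofreading.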
AlMaturidiyya/nonRoman-OCR