glyph-to-character

The aim is to build a parser for PDFs that have non-standard Unicode mappings, a common problem in PDFs containing text in Indian languages. If the glyphs of the unique characters can be extracted, they can be used to get the correct Unicode by querying a VLM (vision-language model) with an appropriate prompt. Gemini performs well on this task.
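
The idea can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: keying a glyph by `(font name, character code)`, the helper names, and the `recognize` callable (which stands in for a VLM query on a rendered glyph image) are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Assumption: a glyph is identified by (font name, character code).
GlyphKey = Tuple[str, int]


def unique_glyphs(chars: Iterable[GlyphKey]) -> List[GlyphKey]:
    """Return each distinct glyph once, in order of first appearance."""
    seen = set()
    ordered = []
    for key in chars:
        if key not in seen:
            seen.add(key)
            ordered.append(key)
    return ordered


def build_mapping(
    glyphs: Iterable[GlyphKey],
    recognize: Callable[[GlyphKey], str],
) -> Dict[GlyphKey, str]:
    """Call the recognizer once per unique glyph and collect the answers."""
    return {key: recognize(key) for key in glyphs}


def remap(chars: Iterable[GlyphKey], mapping: Dict[GlyphKey, str]) -> str:
    """Rebuild the text; glyphs missing from the mapping become U+FFFD."""
    return "".join(mapping.get(key, "\ufffd") for key in chars)


# Hypothetical stand-in for the VLM: a fixed lookup table.
_FAKE_ANSWERS = {("F1", 65): "क", ("F1", 66): "म"}


def fake_recognize(key: GlyphKey) -> str:
    return _FAKE_ANSWERS[key]


# Characters as they might come out of a PDF, in reading order.
page_chars = [("F1", 65), ("F1", 66), ("F1", 65)]
glyph_table = build_mapping(unique_glyphs(page_chars), fake_recognize)
corrected_text = remap(page_chars, glyph_table)
```

Querying once per unique glyph rather than once per character keeps the number of VLM calls small, since a document typically reuses the same glyphs many times.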

Requirements

The notebook (pdf_glyph_to_character.ipynb) is ready to run. A personal Gemini API key is required and must be set in config.py.
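
As a rough sketch of what such a configuration file might contain: the variable name `GEMINI_API_KEY`, the environment-variable fallback, and the `require_api_key` helper are assumptions for illustration, so check config.py in the repository for the actual names.

```python
import os

# Hypothetical variable name; the real config.py may use a different one.
# Reading from the environment avoids committing the key to version control.
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY", "")


def require_api_key(key: str) -> str:
    """Return the key with surrounding whitespace removed, or raise if it is empty."""
    key = key.strip()
    if not key:
        raise ValueError("Set your Gemini API key in config.py before running the notebook.")
    return key
```

Calling `require_api_key(GEMINI_API_KEY)` at the top of the notebook would fail early with a clear message instead of at the first API request.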

mriya98/glyph-to-character

Folders and files

Latest commit

History

Repository files navigation

glyph-to-character

Requirements

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages