Issue encountered when adding the Thai language. #18

Open
kawaiipeace opened this issue Jun 17, 2024 · 2 comments

@kawaiipeace

I ran into the following issues when adding the Thai language to Scribe OCR:

  1. I added tha.traineddata.gz to \tess\lang, but the console log shows: "Error: Tesseract (legacy) engine requested, but components are not present in ./tha.traineddata!! Failed loading language 'tha'".
  2. Sometimes I instead get: "Error opening data file tessdata/tha.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language 'tha' Tesseract couldn't load any languages! Could not initialize tesseract."

Do you have instructions, or a plan, for adding Thai language support to Scribe OCR?

@Balearica
Contributor

Thai should eventually be supported; however, there are a couple of issues that make this more difficult than for other languages.

First, the error message you pasted indicates that we are not getting the correct language data from Tesseract.js. This is a good catch and should be patched. I opened a GitHub issue in that repo and will fix it at some point.
naptha/tesseract.js#931
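
To illustrate why the first error appears, here is a rough sketch of the check behind it: a request for the legacy engine fails when the language data only contains LSTM components. This is a simplified model, not Tesseract's or Tesseract.js's actual code. The numeric engine-mode values match Tesseract's OCR engine modes, but `TrainedDataInfo` and `checkEngineData` are hypothetical names made up for this example.

```typescript
// Tesseract OCR engine modes (numeric values match Tesseract's --oem values).
const OEM = {
  TESSERACT_ONLY: 0, // legacy engine only
  LSTM_ONLY: 1, // LSTM engine only
  TESSERACT_LSTM_COMBINED: 2, // legacy and LSTM together
  DEFAULT: 3, // use whatever the data file provides
} as const;

type OemValue = (typeof OEM)[keyof typeof OEM];

// Hypothetical summary of which components a .traineddata file contains.
interface TrainedDataInfo {
  hasLegacy: boolean;
  hasLstm: boolean;
}

// Simplified model: returns null when the data can serve the requested
// engine mode, otherwise a short reason string.
function checkEngineData(oem: OemValue, data: TrainedDataInfo): string | null {
  const needsLegacy =
    oem === OEM.TESSERACT_ONLY || oem === OEM.TESSERACT_LSTM_COMBINED;
  const needsLstm =
    oem === OEM.LSTM_ONLY || oem === OEM.TESSERACT_LSTM_COMBINED;

  if (needsLegacy && !data.hasLegacy) return "legacy components missing";
  if (needsLstm && !data.hasLstm) return "LSTM components missing";
  if (!data.hasLegacy && !data.hasLstm) return "no usable components";
  return null;
}
```

Under this model, an LSTM-only tha.traineddata passes with `OEM.LSTM_ONLY` or `OEM.DEFAULT` but is rejected with `OEM.TESSERACT_ONLY`, which is the situation the first error message describes.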

Second, adding Thai involves adding a new script. Adding new languages that use Latin script is trivial, as English, Spanish, French, etc. all use the same font files. However, adding Thai will involve adding additional font resources, as well as code to load and switch between them.
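
As a rough sketch of what "code to load and switch between them" could look like, the example below maps each language to a script and each script to a font resource. This is not Scribe OCR's actual implementation. The language codes are Tesseract's, but the font paths and the names `LANG_TO_SCRIPT`, `SCRIPT_TO_FONT`, and `fontsForLangs` are hypothetical.

```typescript
type Script = "Latin" | "Thai";

// Tesseract language codes mapped to the script they are written in
// (illustrative subset only).
const LANG_TO_SCRIPT: Record<string, Script> = {
  eng: "Latin",
  spa: "Latin",
  fra: "Latin",
  tha: "Thai",
};

// Hypothetical font resources, one per script rather than one per language.
const SCRIPT_TO_FONT: Record<Script, string> = {
  Latin: "fonts/latin-regular.woff",
  Thai: "fonts/thai-regular.woff",
};

// Returns the distinct font files needed to render the given languages,
// in the order the scripts are first encountered.
function fontsForLangs(langs: string[]): string[] {
  const fonts = new Set<string>();
  for (const lang of langs) {
    const script = LANG_TO_SCRIPT[lang];
    if (script === undefined) {
      throw new Error(`No script registered for language '${lang}'`);
    }
    fonts.add(SCRIPT_TO_FONT[script]);
  }
  return Array.from(fonts);
}
```

With this layout, adding another Latin-script language is a single new entry in `LANG_TO_SCRIPT`, while adding Thai also requires a new script, a new font resource, and logic to pick the right font at render time.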

All of the above is completely doable; however, it is more involved than simply adding Thai to the list of languages in the UI.

@kawaiipeace
Author

Dear Balearica,
Thank you very much.

@Balearica Balearica transferred this issue from scribeocr/scribeocr Nov 10, 2024