Issue encountered when adding the Thai language #18

kawaiipeace · 2024-06-17T10:36:41Z

I ran into a problem when adding the Thai language to Scribe OCR:

I added tha.traineddata.gz to \tess\lang, but the console log shows "Error: Tesseract (legacy) engine requested, but components are not present in ./tha.traineddata!! Failed loading language 'tha'".
Sometimes I get a different error instead: "'Error opening data file tessdata/tha.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language 'tha' Tesseract couldn't load any languages! Could not initialize tesseract.'".

Do you have instructions or a plan for adding the Thai language to Scribe OCR?

Balearica · 2024-06-17T18:06:27Z

Thai should eventually be supported; however, a couple of issues make it more difficult than other languages.

First, the error message you pasted indicates we are not getting the correct language data from Tesseract.js. This is a good catch and should be patched. I opened a GitHub issue in that repo and will fix it at some point.
naptha/tesseract.js#931

Second, adding Thai involves adding a new script. Adding new languages that use Latin script is trivial, as English, Spanish, French, etc. all use the same font files. However, adding Thai will involve adding additional font resources, as well as code to load and switch between them.
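To illustrate what "load and switch between" fonts could look like, here is a minimal sketch that picks a font based on the script of the recognized text. Every name in it (`Script`, `FONT_BY_SCRIPT`, `detectScript`, `fontForWord`, and the font file names) is hypothetical and not taken from the Scribe OCR codebase; the only outside fact it relies on is that the Thai Unicode block spans U+0E00 to U+0E7F.

```typescript
// Hypothetical sketch: choose a font per script. None of these names are
// Scribe OCR identifiers; the font file names are placeholders.
type Script = "latin" | "thai";

// Assumed mapping from script to a font resource.
const FONT_BY_SCRIPT: Record<Script, string> = {
  latin: "latin-regular.woff",
  thai: "thai-regular.woff",
};

// Returns "thai" if any character falls in the Thai Unicode block
// (U+0E00 to U+0E7F); otherwise falls back to "latin".
function detectScript(text: string): Script {
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code >= 0x0e00 && code <= 0x0e7f) {
      return "thai";
    }
  }
  return "latin";
}

// Looks up the font resource to use when rendering a recognized word.
function fontForWord(text: string): string {
  return FONT_BY_SCRIPT[detectScript(text)];
}
```

A real implementation would also need to fetch each font file on demand and handle documents that mix scripts within a line, which this sketch does not attempt.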

All of the above is completely doable, but it is more involved than simply adding Thai to the list of languages in the UI.

kawaiipeace · 2024-06-18T02:45:50Z

Dear Balearica,
Thank you very much.

kawaiipeace closed this as completed Jun 18, 2024

kawaiipeace reopened this Jun 18, 2024

Balearica transferred this issue from scribeocr/scribeocr Nov 10, 2024
