Any changes I can make to improve recognition of screenshots #662
-
Hi :) I think I had a similar issue: my texts were not recognized correctly because the "words" in them were rather gibberish or unusual. As Tesseract "validates" the recognized words against its dictionaries (if I understood it right), it helped to disable that dictionary validation in scenarios where I expected no ordinary words to be recognized. Follow the instructions in this StackOverflow answer to set up a config: https://stackoverflow.com/a/26878952/10512697
Maybe this helps with your situation.
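(For reference, a minimal sketch of what that change can look like. The thread doesn't say which Tesseract wrapper is in use, so pytesseract is an assumption here; `load_system_dawg` and `load_freq_dawg` are the standard Tesseract variables the linked answer disables, and they can equally be set in a config file or at engine initialisation in other wrappers.)

```python
# Minimal sketch, assuming pytesseract and a local Tesseract install.
# "label.png" is a placeholder file name, not from the thread.
import pytesseract
from PIL import Image

img = Image.open("label.png")

# Turn off the system and frequent-word dictionaries so Tesseract does
# not "correct" unusual strings toward dictionary words.
config = "-c load_system_dawg=0 -c load_freq_dawg=0"
print(pytesseract.image_to_string(img, config=config))
```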
-
I'm doing something similar and having the same inconsistent results. I have attached the test image I'm using, which is literally a black A on a white background, and it's not recognised. I'm assuming that's because the library assumes I want to OCR a document and excludes anything that doesn't look like a word? I tried the suggestion from @SarahAmagno above like so:
but they didn't help. Is there anything I can do to make it just return the characters and numbers it sees?
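(Not something the thread confirms, but for isolated glyphs a page segmentation mode plus a character whitelist is a common route; the sketch below again assumes pytesseract. Note that some Tesseract 4.0 LSTM builds ignore `tessedit_char_whitelist`, while 4.1+ honours it.)

```python
# Sketch only; "a_test.png" stands in for the attached test image.
import pytesseract
from PIL import Image

img = Image.open("a_test.png")

# --psm 10 treats the whole image as a single character; the whitelist
# restricts the output to letters and digits.
config = (
    "--psm 10 "
    "-c tessedit_char_whitelist="
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
)
print(pytesseract.image_to_string(img, config=config))
```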
-
I've written a small application which takes screenshots of the screen and OCRs parts of it to make decisions on when to send keystrokes.
If you are wondering why I don't use Windows Automation for this: I do, but it isn't suitable in some cases because it fails to give access to the text on some controls.
So, Tesseract works fairly well but fails to correctly recognise some text in oddly simple situations. I'm reading label fields in Windows apps under what should be pretty much perfect conditions: clear black text in a sans-serif font on a white or light grey background.
Problems I'm seeing when I pass it a bitmap with two words, one on top of another:
Is there anything I should be reading which will help with this?
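(A rough sketch of the grab-a-region-then-OCR step described above, assuming pytesseract and Pillow on Windows; the coordinates, scale factor and page segmentation mode are illustrative, not taken from the thread.)

```python
import pytesseract
from PIL import Image, ImageGrab

# Capture just the label's area of the screen: (left, top, right, bottom).
region = ImageGrab.grab(bbox=(100, 200, 400, 260))

# UI label text is often only 10-15 px tall; converting to grayscale and
# upscaling before OCR tends to help Tesseract with small UI fonts.
region = region.convert("L")
region = region.resize((region.width * 3, region.height * 3), Image.LANCZOS)

# --psm 6 treats the image as a single uniform block of text, which
# suits two short words stacked one above the other.
print(pytesseract.image_to_string(region, config="--psm 6"))
```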