Any changes I can make to improve recognition of screenshots #662
-
Hi :) I think I had a similar issue: my texts were not recognized correctly because the "words" in them were rather gibberish or unusual. As Tesseract "validates" the recognized words against its dictionaries (if I understood it right), it helped to disable that dictionary validation in scenarios where I expected no ordinary words to be recognized. Follow the instructions in this StackOverflow answer to set up a config: https://stackoverflow.com/a/26878952/10512697
Maybe this helps with your situation.
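(For reference, a minimal sketch of what that change can look like. The thread doesn't say which Tesseract wrapper is in use, so pytesseract is an assumption here; `load_system_dawg` and `load_freq_dawg` are the standard Tesseract variables the linked answer disables, and they can equally be set in a config file or at engine initialisation in other wrappers.)

```python
# Minimal sketch, assuming pytesseract and a local Tesseract install.
# "label.png" is a placeholder file name, not from the thread.
import pytesseract
from PIL import Image

img = Image.open("label.png")

# Turn off the system and frequent-word dictionaries so Tesseract does
# not "correct" unusual strings toward dictionary words.
config = "-c load_system_dawg=0 -c load_freq_dawg=0"
print(pytesseract.image_to_string(img, config=config))
```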
-
I'm doing something similar and having the same inconsistent results. I have attached the test image I'm using, which is literally a black A on a white background, and it's not recognised. I'm assuming that's because the library assumes I want to OCR a document and excludes anything that doesn't look like a word? I tried the suggestion from @SarahAmagno above like so:
but they didn't help. Is there anything I can do to make it just return the characters and numbers it sees?
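(Not something the thread confirms, but for isolated glyphs a page segmentation mode plus a character whitelist is a common route; the sketch below again assumes pytesseract. Note that some Tesseract 4.0 LSTM builds ignore `tessedit_char_whitelist`, while 4.1+ honours it.)

```python
# Sketch only; "a_test.png" stands in for the attached test image.
import pytesseract
from PIL import Image

img = Image.open("a_test.png")

# --psm 10 treats the whole image as a single character; the whitelist
# restricts the output to letters and digits.
config = (
    "--psm 10 "
    "-c tessedit_char_whitelist="
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
)
print(pytesseract.image_to_string(img, config=config))
```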
-
I've written a small application which takes screenshots of the screen and OCRs parts of it to make decisions on when to send keystrokes.
If you are wondering why I don't use Windows Automation for this: I do, but it isn't suitable in some cases because it fails to give access to the text on some controls.
So, Tesseract works fairly well but fails to correctly recognise some text in oddly simple situations. I'm reading label fields in Windows apps under what should be pretty much perfect conditions: clear black text in a sans-serif font on a white or light grey background.
Problems I'm seeing when I pass it a bitmap with two words, one on top of another:
Is there anything I should be reading which will help with this?
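(A rough sketch of the grab-a-region-then-OCR step described above, assuming pytesseract and Pillow on Windows; the coordinates, scale factor and page segmentation mode are illustrative, not taken from the thread.)

```python
import pytesseract
from PIL import Image, ImageGrab

# Capture just the label's area of the screen: (left, top, right, bottom).
region = ImageGrab.grab(bbox=(100, 200, 400, 260))

# UI label text is often only 10-15 px tall; converting to grayscale and
# upscaling before OCR tends to help Tesseract with small UI fonts.
region = region.convert("L")
region = region.resize((region.width * 3, region.height * 3), Image.LANCZOS)

# --psm 6 treats the image as a single uniform block of text, which
# suits two short words stacked one above the other.
print(pytesseract.image_to_string(region, config="--psm 6"))
```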