Roadmap for 2024 #14
Disclaimer: I spent several years building proprietary production OCR solutions around a tweaked Tesseract, with a lot of custom image preprocessing before OCR plus some text post-processing. I want to share a few ideas that you may find worth adding to the roadmap.
I suggest adding performance benchmarks to the continuous monitoring as well. With Tesseract we had a lot of performance issues. This is especially critical on resource-limited devices like smartphones (I had exactly this case in my practice). Knowing the actual performance in different scenarios is valuable for users, and a comparison with other OCR engines (like Tesseract) helps when choosing which one to use. Another idea is extending pre-recognition image processing. I see that you already do some image processing; you may find some useful pre-recognition algorithms in my repo: https://github.com/zamazan4ik/PRLib .
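To make the pre-recognition idea concrete, here is a minimal sketch of one classic step such libraries perform: global binarization with Otsu's method. This is a generic illustration in pure Python over a flat list of grayscale values, not code from PRLib or ocrs; a real pipeline would operate on image buffers via an image library.

```python
def otsu_threshold(pixels):
    """Return the grayscale threshold (0-255) that maximizes
    between-class variance, per Otsu's method."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = 0.0
    weight_bg = 0
    best_thresh, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue          # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break             # no foreground pixels left
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_thresh = var_between, t
    return best_thresh

def binarize(pixels, thresh):
    """Map pixels above the threshold to white (255), the rest to black (0)."""
    return [255 if p > thresh else 0 for p in pixels]

# Synthetic example: dark "text" pixels near 30, light "background" near 220.
image = [30] * 40 + [220] * 60
t = otsu_threshold(image)
binary = binarize(image, t)
```

Binarization like this tends to help classical engines such as Tesseract most; as noted below, an ML-based engine may already absorb much of this noise.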
About the performance question: I am currently researching Profile-Guided Optimization (PGO) as a way to achieve better performance in different kinds of software. Maybe PGO can be useful for this project too.
Hello, thanks for the input.
I agree this would be useful. I suspect there may be problems with run-to-run variability on the current GitHub Actions runners, since the free runners don't provide any guarantees about isolation from other jobs.
Thanks for the link. One general goal of this project is to rely more on machine learning to handle noise and variability in inputs. So far I've found this to work well for things like handling low contrast or blurred input, but rotations need explicit handling.
I haven't tried PGO yet, so it would be interesting to see whether it has an effect. From a few recent profiles taken with samply, most of the time is indeed concentrated in a few hot spots in model inference. If the input image is unnecessarily large (that is, far larger than needed to read the text), a lot of time can also be spent moving memory around for the decompressed image; see #15.
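For reference, the usual rustc PGO workflow looks roughly like the command recipe below (flags as documented in the rustc book). The profile directory and the sample invocation are placeholders, not this project's actual benchmark harness:

```shell
# 1. Build with profiling instrumentation.
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo build --release

# 2. Run a representative workload to collect profile data.
#    (Placeholder invocation; substitute the real CLI and inputs.)
./target/release/ocrs sample-image.png

# 3. Merge the raw profiles (llvm-profdata ships with the llvm-tools
#    rustup component, matched to the toolchain's LLVM version).
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# 4. Rebuild, letting the compiler optimize using the collected profile.
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo build --release
```

Since the profiles cited above show time concentrated in a few inference hot spots, PGO's win here may be modest compared with algorithmic changes like downscaling oversized inputs.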
Hey, I just wanted to offer some encouragement on this project. Since you posted about it on Reddit six months ago I've used it in a few different contexts: pulling text out of digital comics, rigging up a really janky testing setup for an emulated Android app, and building a bot to, well, frankly, play an idle game for me. This library is my go-to for quick, "good enough" OCR. It's fast enough in a lot of circumstances that I can use it even for real-time feedback, like in the idle game, and it's simple enough to set up that embedding it into another application is a matter of minutes. Thanks, and I hope you continue to find fulfilling ways of improving this.
Thanks @TannerRogalsky, I appreciate the kind words!
This issue exists to document what I think are the highest priorities in the short to medium term.
Models and training:
ocrs library and CLI tool:
Beyond the short term list, here are some themes for subsequent work:
And some longer term things: