Roadmap for 2024 #14

Open
3 of 6 tasks
robertknight opened this issue Jan 7, 2024 · 4 comments

@robertknight
Owner

robertknight commented Jan 7, 2024

This issue exists to document what I think are the highest priorities in the short-medium term.

Models and training:

  • Document how to repeat the model training process (see ocrs-models#6, "Add documentation for training models from scratch"). A repeatable training process is IMO required for an ML project to really be considered open source. It is also needed to enable fine-tuning or training for new languages.
  • Add benchmarks so the accuracy can be tracked over time
  • Expand the training data sets for detection and recognition to improve accuracy
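
As an illustration of what an accuracy benchmark could track, here is a minimal sketch of character error rate (CER), a common metric for OCR output. This is an assumption about a suitable metric, not a description of what ocrs measures; the function names `edit_distance` and `char_error_rate` are hypothetical.

```rust
/// Levenshtein edit distance between two strings, counted in Unicode
/// scalar values (`char`s) rather than bytes.
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // `prev[j]` is the distance between the first `i - 1` chars of `a`
    // and the first `j` chars of `b`; `curr` is the row being filled in.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];

    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let substitution = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + substitution);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Character error rate: edits needed to turn `predicted` into `expected`,
/// divided by the length of `expected`. Lower is better; 0.0 is a perfect match.
fn char_error_rate(expected: &str, predicted: &str) -> f64 {
    let n = expected.chars().count();
    if n == 0 {
        // Hypothetical convention: any output for empty ground truth is a full error.
        return if predicted.is_empty() { 0.0 } else { 1.0 };
    }
    edit_distance(expected, predicted) as f64 / n as f64
}
```

Averaging `char_error_rate` over a fixed set of labelled test images and recording the result per commit would give one number to watch over time.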

ocrs library and CLI tool:

Beyond the short-term list, here are some themes for subsequent work:

  • Continue expanding the datasets and test cases to improve accuracy
  • Use machine learning for layout analysis
  • Quantize the models to 8-bit to make the downloads smaller and execution faster
  • Improve WebAssembly execution performance
  • Add bindings for other languages (e.g. C, Python, Node)
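
To make the 8-bit quantization item concrete, here is a minimal sketch of symmetric per-tensor quantization, which stores each weight as an `i8` plus one shared `f32` scale. This is one common scheme chosen for illustration, not necessarily the one ocrs would adopt; `quantize_i8` and `dequantize_i8` are hypothetical names.

```rust
/// Quantize f32 weights to i8 using a single scale for the whole tensor.
/// Returns the quantized values and the scale needed to recover them.
fn quantize_i8(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    // Avoid dividing by zero when every weight is zero.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let quantized = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (quantized, scale)
}

/// Recover approximate f32 weights from quantized values.
fn dequantize_i8(quantized: &[i8], scale: f32) -> Vec<f32> {
    quantized.iter().map(|&q| q as f32 * scale).collect()
}
```

Each weight shrinks from 4 bytes to 1, at the cost of a rounding error of at most about half of `scale` per value. Whether execution also gets faster depends on the inference engine having integer kernels.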

And some longer term things:

  • Support GPU inference. This will probably involve making the execution engine pluggable.
@robertknight robertknight changed the title Roadmap Roadmap for 2024 Jan 7, 2024
@robertknight robertknight pinned this issue Jan 7, 2024
@robertknight robertknight changed the title Roadmap for 2024 Roadmap for H1 2024 Jan 7, 2024
@zamazan4ik

zamazan4ik commented Jan 7, 2024

Disclaimer: I spent several years building proprietary production OCR solutions around a tweaked Tesseract, with a lot of custom image preprocessing before OCR and some text post-processing afterwards. Here are some ideas that you may find good enough to add to the roadmap.

> Add benchmarks so the accuracy can be tracked over time

I suggest adding performance benchmarks too to the continuous monitoring. With Tesseract we had a lot of issues around performance. This is especially critical on resource-limited devices like smartphones (I had exactly this case in practice). Knowing the actual performance in different cases can be valuable for users. A comparison with other OCR solutions (like Tesseract) is also useful when choosing an OCR engine.

Another idea is extending prerecognition image processing. I see that you already do some image-processing stuff. Probably, you will find some useful prerecognition image processing algorithms in my repo: https://github.com/zamazan4ik/PRLib .

> Quantize the models to 8-bit to make the downloads smaller and execution faster

On performance: I am currently researching Profile-Guided Optimization (PGO) as a way to get better performance from different kinds of software. Maybe PGO can be useful for ocrs too; this needs to be investigated, and I cannot say more without actual benchmarks. According to my tests, PGO already helps in many real-life cases. However, if the current ocrs bottleneck is somewhere on the model inference side I do not expect huge wins from PGO in this case since such code usually is already well-optimized and PGO cannot do more.

@robertknight
Owner Author

Hello, thanks for the input.

> I suggest adding performance benchmarks too to the continuous monitoring. With Tesseract we had a lot of issues around performance.

I agree this would be useful. I suspect the results would be noisy on the current GitHub Actions runners, since they are the free runners, which don't provide any guarantees about isolation from other jobs.
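
One way to reduce the effect of noisy runners is to repeat each measurement and report the median rather than a single run or the mean. The following is a minimal sketch of that idea using only the standard library; `time_runs` and `median` are hypothetical helpers, and the closure stands in for whatever OCR call is being measured.

```rust
use std::time::{Duration, Instant};

/// Run `f` `runs` times and record the wall-clock duration of each run.
fn time_runs<F: FnMut()>(runs: usize, mut f: F) -> Vec<Duration> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect()
}

/// Median of the samples, or `None` if there are no samples.
/// The median is less sensitive than the mean to occasional slow
/// runs caused by other jobs sharing the machine.
fn median(samples: &[Duration]) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        // Even count: average the two middle samples.
        (sorted[mid - 1] + sorted[mid]) / 2
    })
}
```

For example, `median(&time_runs(15, || run_ocr_on_test_image()))` would give one figure per benchmark, where `run_ocr_on_test_image` is a placeholder for the real workload. This reduces noise but does not remove it, so shared runners may still show drift between jobs.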

> Another idea is extending prerecognition image processing. I see that you already do some image-processing stuff. Probably, you will find some useful prerecognition image processing algorithms in my repo: https://github.com/zamazan4ik/PRLib .

Thanks for the link. One general goal of this project is to rely more on machine learning to handle noise and variability in inputs. So far I've found this to work well for things like handling low contrast or blurred input, but rotations need explicit handling.

> However, if the current ocrs bottleneck is somewhere on the model inference side I do not expect huge wins from PGO in this case since such code usually is already well-optimized and PGO cannot do more.

I haven't tried PGO yet, so it would be interesting to see if it has an effect. From what I've seen in a few recent profiles using samply, most of the time spent is indeed concentrated in a few hot spots in model inference. If the input image is unnecessarily large (that is, far larger than is needed to read the text) that can also lead to a lot of time spent moving memory around for the decompressed image, see #15.
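
As a sketch of how a caller might avoid feeding in an unnecessarily large image, the helper below computes target dimensions that cap the longer side while preserving the aspect ratio. The function name `fit_within` and the idea of a fixed `max_side` limit are assumptions for illustration, not part of ocrs; the actual resize would be done with an image library before handing the pixels to the OCR engine.

```rust
/// Compute dimensions whose longer side is at most `max_side`,
/// preserving the aspect ratio. Images already within the limit are
/// returned unchanged, so this never upscales.
fn fit_within(width: u32, height: u32, max_side: u32) -> (u32, u32) {
    let longest = width.max(height);
    if longest == 0 || longest <= max_side {
        return (width, height);
    }
    let scale = max_side as f64 / longest as f64;
    // Keep each side at least 1 pixel for very elongated images.
    let w = ((width as f64 * scale).round() as u32).max(1);
    let h = ((height as f64 * scale).round() as u32).max(1);
    (w, h)
}
```

A suitable `max_side` depends on the text size in the image: it needs to stay large enough that the smallest text remains legible to the detection model.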

@robertknight robertknight changed the title Roadmap for H1 2024 Roadmap for 2024 Jun 3, 2024
@TannerRogalsky

Hey, I just wanted to encourage you a little on this project. Since you posted on Reddit about it 6 months ago I've used it in a few different contexts. I've pulled text out of digital comics, helped with rigging up a really janky testing setup for an emulated Android app, and I used it to build a bot to, well, frankly to play an idle game for me.

This library is my go-to for quick and "good enough" OCR. It's fast enough in a lot of circumstances that I can use it even for real-time feedback, like in the idle game. And it's simple enough to set up that embedding it into another application is a matter of minutes.

Thanks, and I hope you continue to find fulfilling ways of improving this.

@robertknight
Owner Author

Thanks @TannerRogalsky, I appreciate the kind words!

3 participants