Roadmap for 2024 #14
Disclaimer: I spent several years building proprietary production OCR solutions around a tweaked Tesseract, with a lot of custom image preprocessing before OCR plus some text post-processing. I want to share a few ideas that you may find worth adding to the roadmap.
I suggest adding performance benchmarks to the continuous monitoring as well. With Tesseract we had a lot of performance issues. This is especially critical on resource-limited devices like smartphones (I had exactly this case in my practice). Knowing the actual performance in different scenarios is valuable for users, and a comparison with other OCR engines (like Tesseract) helps when choosing which one to use. Another idea is extending pre-recognition image processing. I see that you already do some image processing; you may find some useful pre-recognition algorithms in my repo: https://github.com/zamazan4ik/PRLib .
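To make the pre-recognition idea concrete, here is a minimal sketch of one classic step such libraries perform: global binarization with Otsu's method. This is a generic illustration in pure Python over a flat list of grayscale values, not code from PRLib or ocrs; a real pipeline would operate on image buffers via an image library.

```python
def otsu_threshold(pixels):
    """Return the grayscale threshold (0-255) that maximizes
    between-class variance, per Otsu's method."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = 0.0
    weight_bg = 0
    best_thresh, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue          # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break             # no foreground pixels left
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_thresh = var_between, t
    return best_thresh

def binarize(pixels, thresh):
    """Map pixels above the threshold to white (255), the rest to black (0)."""
    return [255 if p > thresh else 0 for p in pixels]

# Synthetic example: dark "text" pixels near 30, light "background" near 220.
image = [30] * 40 + [220] * 60
t = otsu_threshold(image)
binary = binarize(image, t)
```

Binarization like this tends to help classical engines such as Tesseract most; as noted below, an ML-based engine may already absorb much of this noise.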
About the performance question: I am currently researching Profile-Guided Optimization (PGO) as a way to achieve better performance in different kinds of software. Maybe PGO can be useful for this project too.
Hello, thanks for the input.
I agree this would be useful. I suspect there may be problems with run-to-run variability on the current GitHub Actions runners, since the free runners don't provide any guarantees about isolation from other jobs.
Thanks for the link. One general goal of this project is to rely more on machine learning to handle noise and variability in inputs. So far I've found this to work well for things like handling low contrast or blurred input, but rotations need explicit handling.
I haven't tried PGO yet, so it would be interesting to see whether it has an effect. From a few recent profiles taken with samply, most of the time is indeed concentrated in a few hot spots in model inference. If the input image is unnecessarily large (that is, far larger than needed to read the text), a lot of time can also be spent moving memory around for the decompressed image; see #15.
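For reference, the usual rustc PGO workflow looks roughly like the command recipe below (flags as documented in the rustc book). The profile directory and the sample invocation are placeholders, not this project's actual benchmark harness:

```shell
# 1. Build with profiling instrumentation.
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo build --release

# 2. Run a representative workload to collect profile data.
#    (Placeholder invocation; substitute the real CLI and inputs.)
./target/release/ocrs sample-image.png

# 3. Merge the raw profiles (llvm-profdata ships with the llvm-tools
#    rustup component, matched to the toolchain's LLVM version).
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# 4. Rebuild, letting the compiler optimize using the collected profile.
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo build --release
```

Since the profiles cited above show time concentrated in a few inference hot spots, PGO's win here may be modest compared with algorithmic changes like downscaling oversized inputs.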
Hey, I just wanted to offer some encouragement on this project. Since you posted about it on Reddit six months ago I've used it in a few different contexts: pulling text out of digital comics, rigging up a really janky testing setup for an emulated Android app, and building a bot to, well, frankly, play an idle game for me. This library is my go-to for quick, "good enough" OCR. It's fast enough in a lot of circumstances that I can use it even for real-time feedback, like in the idle game, and it's simple enough to set up that embedding it into another application is a matter of minutes. Thanks, and I hope you continue to find fulfilling ways of improving this.
Thanks @TannerRogalsky, I appreciate the kind words!
This issue exists to document what I think are the highest priorities in the short to medium term.
Models and training:
ocrs library and CLI tool:
Beyond the short term list, here are some themes for subsequent work:
And some longer term things: