Reorganize categories/steps #188

Open
kba opened this issue Dec 22, 2021 · 3 comments

Comments

@kba
Member

kba commented Dec 22, 2021

Processor developers must specify categories and steps in the ocrd-tool.json. It would be useful to rethink this classification, so that we can use them to have an additional means to find processors for certain tasks, besides https://ocr-d.de/en/workflows.
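
To make the idea concrete, here is a minimal sketch (not part of OCR-D core) of using the categories declared in an ocrd-tool.json as a lookup key. The two tools and their values are hypothetical. Only the overall shape, a "tools" object whose entries carry "categories" and "steps" lists, mirrors the ocrd-tool.json format.

```python
import json

# Illustrative ocrd-tool.json content; tool names and values are hypothetical.
OCRD_TOOL_JSON = """
{
  "version": "0.0.1",
  "tools": {
    "ocrd-example-binarize": {
      "executable": "ocrd-example-binarize",
      "categories": ["Image preprocessing"],
      "steps": ["preprocessing/optimization/binarization"]
    },
    "ocrd-example-segment": {
      "executable": "ocrd-example-segment",
      "categories": ["Layout analysis"],
      "steps": ["layout/segmentation/region"]
    }
  }
}
"""


def find_by_category(tool_json: dict, category: str) -> list[str]:
    """Return the names of all tools whose "categories" list contains `category`."""
    return sorted(
        name
        for name, tool in tool_json.get("tools", {}).items()
        if category in tool.get("categories", [])
    )


tool_json = json.loads(OCRD_TOOL_JSON)
print(find_by_category(tool_json, "Layout analysis"))  # ['ocrd-example-segment']
```

This only works across modules if every tool JSON uses the same, unambiguous category vocabulary, which is the point of the reorganization.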

@bertsky
Collaborator

bertsky commented Dec 22, 2021

...but also extend it where necessary, e.g. #159, or for language and script identification, or special region detection (only separator lines or tables or stamps or handwriting ...), or pure reading-order detection, or page classification, or import/export tasks.

@kba
Member Author

kba commented Dec 22, 2021

Yes, prune the ambiguous parts (e.g. the difference between layout/analysis and layout/segmentation) and add the missing parts. We should probably also settle on either categories or steps, and align the result with our glossary.
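
Steps are slash-separated paths, so one plausible way to query them is hierarchical prefix matching. This is a minimal sketch under that assumption; the processor names and the matching rule are illustrative, not existing OCR-D behaviour. It shows where the ambiguity bites: a query for layout returns processors from both the layout/analysis and the layout/segmentation branch.

```python
# Hypothetical processors mapped to the steps they declare.
STEPS = {
    "ocrd-example-regions": ["layout/segmentation/region"],
    "ocrd-example-lines": ["layout/segmentation/line"],
    "ocrd-example-structure": ["layout/analysis"],
    "ocrd-example-ocr": ["recognition/text-recognition"],
}


def matches(step: str, query: str) -> bool:
    """True if `step` equals `query` or is a sub-step of it.

    Appending "/" to the query prevents partial segment matches,
    e.g. "layout/seg" must not match "layout/segmentation/region".
    """
    return step == query or step.startswith(query + "/")


def find_by_step(query: str) -> list[str]:
    """Return the sorted names of processors declaring a step under `query`."""
    return sorted(
        name
        for name, steps in STEPS.items()
        if any(matches(step, query) for step in steps)
    )


print(find_by_step("layout"))
# ['ocrd-example-lines', 'ocrd-example-regions', 'ocrd-example-structure']
print(find_by_step("layout/segmentation"))
# ['ocrd-example-lines', 'ocrd-example-regions']
```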

@bertsky
Collaborator

bertsky commented Dec 22, 2021

> difference between layout/analysis and

I always understood that as logical document layout analysis, as opposed to optical page layout analysis.

> use them to have an additional means to find processors for certain tasks

So basically all processors would have to be registered centrally during installation, right? (Which is also a system-side prerequisite for the discovery parts of the Web API.)

Perhaps we could write something like ocrd ocrd-tool register (passing a tool JSON) and ocrd ocrd-tool find (passing a directory to search recursively for tool JSONs). These could be run by an additional pattern rule in ocrd_all, or during make install in the individual modules. Both could feed into a local DB, which could then be queried via some ResourceManager-like API (ProcessorManager?) or even another CLI (ocrd processor find|lookup|...?).
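
A minimal sketch of the proposed register/find/lookup flow, assuming a local SQLite database. The functions register, find and lookup, the table layout and the in-memory database are all hypothetical stand-ins for the proposed commands; none of this exists in OCR-D core. Only the tool JSON layout (tools carrying a "steps" list) follows ocrd-tool.json.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the hypothetical local processor DB and create its table if needed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS processor_step ("
        " executable TEXT NOT NULL,"
        " step TEXT NOT NULL,"
        " PRIMARY KEY (executable, step))"
    )
    return db


def register(db: sqlite3.Connection, tool_json: dict) -> None:
    """Record every (executable, step) pair declared in one tool JSON."""
    for name, tool in tool_json.get("tools", {}).items():
        for step in tool.get("steps", []):
            db.execute(
                "INSERT OR IGNORE INTO processor_step VALUES (?, ?)",
                (tool.get("executable", name), step),
            )
    db.commit()


def find(db: sqlite3.Connection, directory: Path) -> int:
    """Recursively search `directory` for ocrd-tool.json files, register each, return the count."""
    count = 0
    for path in sorted(directory.rglob("ocrd-tool.json")):
        register(db, json.loads(path.read_text(encoding="utf-8")))
        count += 1
    return count


def lookup(db: sqlite3.Connection, step: str) -> list[str]:
    """Return executables that declare `step` itself or any sub-step of it."""
    rows = db.execute(
        "SELECT DISTINCT executable FROM processor_step"
        " WHERE step = ? OR substr(step, 1, ?) = ?"
        " ORDER BY executable",
        (step, len(step) + 1, step + "/"),
    )
    return [row[0] for row in rows]


# Usage: a fake module checkout with a single ocrd-tool.json (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    module_dir = Path(tmp, "ocrd_example")
    module_dir.mkdir()
    (module_dir / "ocrd-tool.json").write_text(json.dumps({
        "tools": {"ocrd-example-segment": {
            "executable": "ocrd-example-segment",
            "steps": ["layout/segmentation/region"],
        }}
    }), encoding="utf-8")
    db = open_db()
    print(find(db, Path(tmp)))                # 1
    print(lookup(db, "layout/segmentation"))  # ['ocrd-example-segment']
```

The prefix test uses substr rather than LIKE so that characters such as "_" or "%" in a step name are not treated as wildcards. Passing a file path instead of ":memory:" to open_db would make the registry persist between installs.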
