add qurator-spk/sbb_binarization #214

Merged
merged 3 commits into from
Oct 22, 2020

Conversation

Member

@kba kba commented Oct 22, 2020

Pixelwise binarization with selectional auto-encoders in Keras

@jbarth-ubhd has an example of the result in OCR-D/ocrd-website#172 (comment)

@kba kba requested review from bertsky and stweil October 22, 2020 12:32
Co-authored-by: Stefan Weil <sw@weilnetz.de>
Collaborator

@stweil stweil left a comment

Looks good. Are there already models available for production? Then a rule `install-models-sbb-binarize` to download one or several models would be nice.

Collaborator

@bertsky bertsky left a comment

I second @stweil's comment about packaging models. Or can we do this via `package_data` and the upcoming resource discovery feature?

Member Author

kba commented Oct 22, 2020

Pre-trained models are available from https://qurator-data.de/sbb_binarization/

I'll add an install-models rule for this as well.
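
A rough sketch of what such an install step does, written in Python rather than as the Makefile rule itself. Only the base URL comes from this thread; the function name `install_model`, its parameters, and any archive name passed to it are hypothetical, and the layout of the files on the server is not assumed here.

```python
import urllib.request
from pathlib import Path

# Base URL as given in the comment above.
BASE_URL = "https://qurator-data.de/sbb_binarization/"


def install_model(archive_name, target_dir, fetch=urllib.request.urlretrieve):
    """Download one model archive into target_dir unless it is already there.

    archive_name is a placeholder supplied by the caller; the real file
    names on the server are not known from this discussion.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    destination = target / archive_name
    if not destination.exists():
        # fetch(url, filename) defaults to urllib.request.urlretrieve
        fetch(BASE_URL + archive_name, str(destination))
    return destination
```

Skipping files that already exist keeps repeated runs cheap, which is the behaviour one would usually want from an install rule.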

As for bundling with the tool, I'll have to finish that in core first, but then yes, certainly, they could be placed in one of the defined locations and the model parameter could be a relative path. For now it must be absolute.

Collaborator

bertsky commented Oct 22, 2020

> As for bundling with the tool, I'll have to finish that in core first but then yes, certainly, they could be placed in one of the defined locations

But the Python package on PyPI should not be polluted with such large models. We could use setuptools' `extras_require` feature to build packages including them from our Makefile.

> and the model parameter could be a relative path. For now it must be absolute.

Couldn't we add an environment variable (say `SBB_BINARIZATION_MODELS`) in the meantime? Absolute paths make sharing Makefile configurations across hosts impossible.

Member Author

kba commented Oct 22, 2020

> Then a rule `install-models-sbb-binarize` to download one or several models would be nice.

32285fa

> Couldn't we add an environment variable (say `SBB_BINARIZATION_MODELS`) in the meantime? Absolute paths make sharing Makefile configurations across hosts impossible.

Yes, but then I'd suggest implementing it in a processor-independent way, see OCR-D/spec#176

@kba kba merged commit 846a624 into master Oct 22, 2020
@kba kba deleted the add-sbb-binarization branch October 22, 2020 15:40
Collaborator

bertsky commented Oct 22, 2020

> Yes, but then I'd suggest implementing it in a processor-independent way, see OCR-D/spec#176

That's not the same thing, though. Yours is an environment variable for a specific parameter (which is entirely new AFAIK). My suggestion was to follow the example of `TESSDATA_PREFIX`, `OCROPUS_DATA`, `CORASVANN_DATA` to allow passing a reference point in the search paths for relative filenames. (Not standardized either, of course – but usable already.)

Member Author

kba commented Oct 22, 2020

> (Not standardized either, of course – but usable already.)

I realize this is more specific but it would not be much harder to implement in core for all the processors to make use of.

I can implement (=steal from cor-asv-ann ;-)) `SBB_BINARIZATION_DATA` in the short term, I'll send a PR.
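
To make the idea concrete, here is a minimal sketch of how an environment variable can serve as a reference point for relative model paths. It is not the code from cor-asv-ann or from the PR mentioned here; the function name `resolve_model_path`, the `environ` parameter, and the fallback to the current working directory are all assumptions for illustration.

```python
import os
from pathlib import Path


def resolve_model_path(model, env_var="SBB_BINARIZATION_DATA", environ=None):
    """Resolve a model path against an optional data directory.

    Absolute paths are returned unchanged. Relative paths are joined to
    the directory named by env_var if it is set and non-empty, otherwise
    to the current working directory (an assumed fallback).
    """
    environ = os.environ if environ is None else environ
    path = Path(model)
    if path.is_absolute():
        return path
    prefix = environ.get(env_var)
    if prefix:
        return Path(prefix) / path
    return Path.cwd() / path
```

With this scheme a shared configuration can name a model by relative path, and each host (or derived Docker image) only has to set the variable to wherever its models live.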

Collaborator

bertsky commented Oct 22, 2020

> I can implement (=steal from cor-asv-ann ;-)) `SBB_BINARIZATION_DATA` in the short term, I'll send a PR.

That would be great. In particular for derived Docker images :-)

Member Author

kba commented Oct 22, 2020

> I can implement (=steal from cor-asv-ann ;-)) `SBB_BINARIZATION_DATA` in the short term, I'll send a PR.

qurator-spk/sbb_binarization#6
