support dewarping #180

Open
bertsky opened this issue Mar 2, 2021 · 2 comments

@bertsky
Collaborator

bertsky commented Mar 2, 2021

This is already partly covered by #116, but I would like to see a discussion of the specific problem that dewarping poses for the coordinate reproducibility principle.

Now that we have promising tools that we could wrap for page-level dewarping, like blitzDrt for perspective correction and Origami's dewarper for parametric grid morphing, we should work out how to integrate them into OCR-D.

To represent the coordinate system after dewarping the page, we could rely on PAGE-XML's dewarping schema (DwGts for short). It references the original image under /DwGts/DocumentImage/@filename and describes the morphing grid under /DwGts/Grid (with Row[*]/@points against Row[*]/@index with Row[*]/@refLinePos and Column[*]/@index with Column[*]/@refLinePos). (Unfortunately, it comes with very little documentation and no examples.)

But this is a separate XML file that is not referenced by the PAGE-XML content schema (PcGts). So for dewarping steps, the output fileGrp would need to comprise three files per page:

  1. the output (dewarped) image
  2. the output PcGts annotation, referencing 1. under /PcGts/Page/@imageFilename instead of the original/input image, and transforming all existing coordinates of the input PcGts
  3. the output DwGts annotation, referencing the original/input image under /DwGts/DocumentImage/@filename
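
To make the three-file layout concrete, here is a minimal sketch of how a dewarping processor might plan its per-page outputs. The class, the function, and the file naming pattern are all hypothetical; nothing here is an existing OCR-D convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DewarpOutput:
    """The three per-page outputs of a (hypothetical) dewarping processor."""
    image: str  # 1. the dewarped image
    pcgts: str  # 2. PcGts whose Page/@imageFilename points at `image`
    dwgts: str  # 3. DwGts whose DocumentImage/@filename points at the original


def plan_outputs(file_grp: str, page_id: str, image_ext: str = ".png") -> DewarpOutput:
    """Derive output paths for one page (the naming pattern is an assumption)."""
    stem = f"{file_grp}/{file_grp}_{page_id}"
    return DewarpOutput(
        image=f"{stem}.IMG-DEWARP{image_ext}",
        pcgts=f"{stem}.xml",
        dwgts=f"{stem}.dewarp.xml",
    )
```

With this layout, files 2 and 3 are both XML in the same fileGrp, so something other than the file extension has to tell them apart.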

So any later processing step will only "see" the dewarped image and use its coordinate system. Whenever we want to transform back, we'll have to take the current PcGts, look up the earlier DwGts, and create a new PcGts by replacing the /PcGts/Page/@imageFilename with /DwGts/DocumentImage/@filename and inverse-transforming all coordinates according to /DwGts/Grid. This could happen at the final ingest or at some intermediate step.
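
The back-transformation could look roughly like the sketch below. It assumes that the original filename has already been read from /DwGts/DocumentImage/@filename and that the inverse of the /DwGts/Grid mapping is available as a plain point-to-point function; how to derive that function from the grid is left open here. The names `undo_dewarping` and `transform_points` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Tuple

Point = Tuple[float, float]


def transform_points(points: str, inverse: Callable[[Point], Point]) -> str:
    """Map a PAGE-style points string ("x1,y1 x2,y2 ...") through `inverse`."""
    out = []
    for pair in points.split():
        x, y = pair.split(",")
        nx, ny = inverse((float(x), float(y)))
        out.append(f"{round(nx)},{round(ny)}")
    return " ".join(out)


def undo_dewarping(pcgts: ET.Element, original_image: str,
                   inverse: Callable[[Point], Point]) -> ET.Element:
    """Rewrite a parsed PcGts in place so it refers to the original image again.

    `original_image` stands for /DwGts/DocumentImage/@filename and `inverse`
    for the inverse of the /DwGts/Grid mapping (both assumed to be given).
    """
    for elem in pcgts.iter():
        local_name = elem.tag.rsplit("}", 1)[-1]  # strip the XML namespace
        if local_name == "Page":
            elem.set("imageFilename", original_image)
        if "points" in elem.attrib:  # Coords, Baseline, ...
            elem.set("points", transform_points(elem.get("points"), inverse))
    return pcgts
```

A real implementation would probably also have to restore Page/@imageWidth and Page/@imageHeight, since the dewarped image need not have the same size as the original.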

Potential problems:

  • We would need a MIME type distinction between PcGts and DwGts. application/vnd.prima.page+xml does not look very discriminative. Or is there any other facility that could distinguish 2. and 3. in the dewarping fileGrp?
  • How do you look up the dewarping step (dewarping fileGrp in the METS), if any? Via existence of a single DwGts for some page in some fileGrp, or via an obligatory "dewarped" entry in PcGts/Page/AlternativeImage[*]/@comments, or via some general mechanism in METS (like mets:file/@GROUPID or mets:file/mets:transformFile or generally representing all workflow dependencies via mets:digiprovMD)?
@kba
Member

kba commented Mar 2, 2021

Thanks for summarizing the problem and opening this discussion.

I will have to think more about this, and ideally we should also discuss it with @chris1010010. But as to the potential problems you raise:

  • We should define another media type application/vnd.prima.dewarping+xml.
  • mets:transformFile is probably the most METS-compliant mechanism but since we rely on the pc:AlternativeImage/@comments mechanism extensively already, we should focus on that. We'd need a way to distinguish the reversible/coordinate-stable dewarping to be implemented from non-reversible legacy dewarping.
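
If the separate media type from the first point were adopted, locating the dewarping step in the METS could be as simple as the following sketch. Note that application/vnd.prima.dewarping+xml is only the proposal made here, not a registered type, and `find_dewarping_file_grps` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = {"mets": "http://www.loc.gov/METS/"}
DWGTS_MIME = "application/vnd.prima.dewarping+xml"  # proposed, not registered


def find_dewarping_file_grps(mets_root: ET.Element) -> List[str]:
    """Return the USE names of all fileGrps containing at least one DwGts file."""
    found = []
    for file_grp in mets_root.iterfind(".//mets:fileGrp", NS):
        files = file_grp.findall("mets:file", NS)
        if any(f.get("MIMETYPE") == DWGTS_MIME for f in files):
            found.append(file_grp.get("USE"))
    return found
```

This only inspects the METS file section, so no PAGE file has to be opened to find out where dewarping happened.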

@bertsky
Collaborator Author

bertsky commented Mar 2, 2021

> mets:transformFile is probably the most METS-compliant mechanism

I'm not so sure about that. It comes with an obligatory @TRANSFORMTYPE restricted to either decompression or decryption. We could ignore the usual semantics of that, but it's probably not so great for compliance.

On the other hand, using mets:file/@GROUPID for an arbitrary identifier shared by the original and derived page-level image would meet the intent of the METS spec and allow us to easily find any associated images. (We could even use that for AlternativeImage dependency tracking across fileGrps in general. But it has only set semantics, whereas map semantics would be better for our directed dependency graph.)
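
A minimal sketch of the GROUPID idea, assuming the original and every derived image of a page simply share one identifier (`files_by_group` is a hypothetical helper):

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Set

NS = {"mets": "http://www.loc.gov/METS/"}


def files_by_group(mets_root: ET.Element) -> Dict[str, Set[str]]:
    """Collect the IDs of all mets:file elements sharing a GROUPID."""
    groups = defaultdict(set)
    for file_elem in mets_root.iterfind(".//mets:file", NS):
        group_id = file_elem.get("GROUPID")
        if group_id is not None:
            groups[group_id].add(file_elem.get("ID"))
    return dict(groups)
```

The result shows the limitation: each group is an unordered set of file IDs, so nothing says which member is the original and which was derived from which.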

> since we rely on the pc:AlternativeImage/@comments mechanism extensively already, we should focus on that. We'd need a way to distinguish the reversible/coordinate-stable dewarping to be implemented from non-reversible legacy dewarping.

So far we rely on that mechanism only to indicate which coordinate transforms described in PcGts actually apply to an AlternativeImage, so we can track its coordinate system w.r.t. /Page/@imageFilename. But we don't need to do that (procedurally) when we allow replacing the latter, because the coordinate system will already be the same (the dewarping will already be "pre-applied").

That point was more about the workspace/METS than the processor/PAGE side: There should be a fast and reliable way of identifying any changes of the original image across the workflow chain, without the need to search through all pages and PAGEs. I'm not a METS expert, and there are many ways to represent that. We just need something that does not break any existing use-cases, is not too contrived and can be implemented efficiently. (And we should still allow for the possibility of not being able to track the coordinate system but nevertheless mark the change as such, so implementations like anybaseocr-dewarp can at least fit in.)

There's of course an alternative to replacing the original image and using DwGts: We could also facilitate PcGts-only dewarping with some representation in @custom as a descriptive means for the coordinate transform. Here the need for a strict usage of the @comments mechanism and the issue of "stable" (i.e. with @custom) vs "legacy" (without @custom) does arise. (This would also help with line-level dewarping, which we cannot represent with DwGts at all.) But then I am rather in favour of extending PcGts with some /PcGts/Page/Grid upstream.
