Compression artifacts #964

Closed
cloakedch opened this issue May 16, 2022 · 3 comments


@cloakedch

cloakedch commented May 16, 2022

I am using ocrmypdf 13.4.4.post1+g603da520 like so:
ocrmypdf --optimize 0 input.pdf output.pdf

The OCR works fine, but the image quality of the output PDF is a lot worse than the quality of the input PDF.
I tried:

An example PDF I use is 1.7 MB in size. After running ocrmypdf, it shrinks to 0.6 MB.
Below is an illustration of the problem. The upper text line is from the input PDF and the lower text line is from the output PDF. There are heavy (compression?) artifacts around the text.

[Image: the same text line from the input PDF (top) and from the output PDF (bottom)]

What am I missing? Ideally I would like the images to remain untouched in the output PDF.
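One way to check whether the images were re-encoded rather than copied untouched is to compare how they are stored in each file. The following is a minimal sketch, assuming poppler-utils' pdfimages tool is installed (it is not part of ocrmypdf); the function names summarize_encodings and compare_pdfs are hypothetical.

```shell
# Sketch: assumes poppler-utils' `pdfimages` is installed (not part of ocrmypdf).
# Reads `pdfimages -list` output on stdin and counts images per encoding.
# The listing starts with two header lines; column 9 ("enc") is the stored
# encoding, e.g. jpeg, jp2, jbig2, ccitt, or "image" (generic raster data).
summarize_encodings() {
  awk 'NR > 2 { print $9 }' | sort | uniq -c
}

# Hypothetical helper: print each file's size and its image encodings.
compare_pdfs() {
  for f in "$@"; do
    printf '== %s (%s bytes)\n' "$f" "$(wc -c < "$f")"
    pdfimages -list "$f" | summarize_encodings
  done
}

# Example (not executed here):
#   compare_pdfs input.pdf output.pdf
```

If the encodings or counts differ between input.pdf and output.pdf, the images were rewritten somewhere in the pipeline.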

@jbarlow83
Collaborator

You can try --output-type pdf to turn off PDF/A conversion in case Ghostscript is responsible.

@cloakedch
Author

That did the trick, thank you!
The following two options worked:

  • --output-type pdf
  • --pdfa-image-compression lossless
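The two working settings can be wrapped in a small shell function. This is a sketch only: ocr_keep_images and DRY_RUN are hypothetical names, not part of ocrmypdf, and it assumes ocrmypdf is on PATH. Mode "pdf" skips PDF/A conversion, the step the collaborator suspected of recompressing images via Ghostscript. Mode "pdfa" keeps PDF/A output but requests lossless image compression.

```shell
# Hypothetical helper (not part of ocrmypdf); assumes ocrmypdf is on PATH.
#   mode "pdf"  -> --output-type pdf (skip PDF/A conversion)
#   mode "pdfa" -> --pdfa-image-compression lossless (keep PDF/A output)
# Set DRY_RUN=1 to print the command instead of running it.
ocr_keep_images() {
  local in="$1" out="$2" mode="${3:-pdf}"
  case "$mode" in
    pdf)  set -- ocrmypdf --optimize 0 --output-type pdf "$in" "$out" ;;
    pdfa) set -- ocrmypdf --optimize 0 --pdfa-image-compression lossless "$in" "$out" ;;
    *)    echo "unknown mode: $mode" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"   # show the command only
  else
    "$@"        # run it, preserving each argument's quoting
  fi
}
```

For example, DRY_RUN=1 ocr_keep_images input.pdf output.pdf pdfa prints the PDF/A variant of the command without running it. Building the command with set -- keeps file names containing spaces intact as single arguments.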

@ByteSizedMarius

I had the same issue. Setting --output-type to pdf fixed it. Maybe a note about this behavior could be added here? https://ocrmypdf.readthedocs.io/en/latest/optimizer.html
