Compression artifacts #964

cloakedch · 2022-05-16T14:02:51Z

I am using ocrmypdf 13.4.4.post1+g603da520 like so:
ocrmypdf --optimize 0 input.pdf output.pdf

The OCR works fine, but the image quality of the output PDF is a lot worse than the quality of the input PDF.
I tried:

Installing the JBIG2 encoder as documented here: https://ocrmypdf.readthedocs.io/en/latest/jbig2.html
Using unpaper flags such as --clean and --clean-final
Specifying image DPI such as --image-dpi 300 (for testing purposes only)

An example PDF I use is 1.7 MB in size. After running ocrmypdf, it shrinks to 0.6 MB.
Below is an illustration of the problem. The upper text line is from the input PDF I use. The lower text line is from the output PDF. There are some heavy (compression?) artifacts around the text.

What am I missing? Ideally I would like the images to remain untouched in the output PDF.
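
A rough way to quantify the shrinkage described above is to compare the file sizes before and after the run. The helper below is only a sketch: the name `size_percent` is hypothetical and not part of ocrmypdf, and it assumes a non-empty input file. With `--optimize 0`, a large drop suggests the images were re-encoded somewhere in the pipeline.

```shell
# Hypothetical helper (not part of ocrmypdf): print the output file's size
# as a whole-number percentage of the input file's size.
# Assumes the input file is not empty (otherwise the division fails).
size_percent() {
    in_bytes=$(wc -c < "$1")    # size of the input file in bytes
    out_bytes=$(wc -c < "$2")   # size of the output file in bytes
    echo $(( out_bytes * 100 / in_bytes ))
}
```

Usage would be `size_percent input.pdf output.pdf`. For the sizes reported here (1.7 MB shrinking to 0.6 MB) the result would be about 35.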

jbarlow83 · 2022-05-16T19:40:59Z

You can try --output-type pdf to turn off PDF/A conversion in case Ghostscript is responsible.

cloakedch · 2022-05-17T05:23:26Z

That did the trick, thank you!
The following two options worked:

--output-type pdf
--pdfa-image-compression lossless
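
For reference, the two options above could be combined with the original command roughly as follows. This is a sketch, not a confirmed recipe: the thread does not say whether the options were used together or separately, so they are shown here as alternatives. The function names and the file arguments are placeholders.

```shell
# Alternative 1: skip PDF/A conversion and write a plain PDF,
# so the Ghostscript PDF/A step does not touch the images.
ocr_plain_pdf() {
    ocrmypdf --optimize 0 --output-type pdf "$1" "$2"
}

# Alternative 2: keep the default PDF/A output but ask for
# lossless image compression during the PDF/A conversion.
ocr_lossless_pdfa() {
    ocrmypdf --optimize 0 --pdfa-image-compression lossless "$1" "$2"
}
```

Usage would be, for example, `ocr_plain_pdf input.pdf output.pdf`.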

ByteSizedMarius · 2025-01-03T13:32:45Z

I had the same issue. Setting the output-type to pdf fixed it. Maybe a note about this behavior could be added here? https://ocrmypdf.readthedocs.io/en/latest/optimizer.html

cloakedch closed this as completed May 17, 2022
