Helps generate a script to combine images from pdfimages back into transparent pngs

bluedreamer/pdfimages_combine

pdfimages_combine

Creates a script that combines the images and masks extracted from a PDF by pdfimages. Images are copied with their metadata stripped so that the output is deterministic; otherwise each file contains a timestamp from when it was extracted from the PDF, which is annoying.
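
For orientation, the core step of such a script can be sketched with ImageMagick as below. This is a minimal sketch, not the command pdfimages_combine actually emits: the helper name combine_with_mask and the file names are made up, and it assumes the mask is a soft mask (white = opaque) of the same size as the image and that ImageMagick's convert command is on the PATH.

```shell
# Hypothetical helper, not taken from pdfimages_combine itself.
# Usage: combine_with_mask IMAGE MASK OUTPUT
combine_with_mask() {
    # -alpha off                       ignore any alpha channel already on the inputs
    # -compose CopyOpacity -composite  use the mask's grey levels as the image's alpha
    # -strip                           drop profiles and comments from the result
    convert "$1" "$2" -alpha off -compose CopyOpacity -composite -strip "$3"
}

# Example call (file names are made up):
# combine_with_mask mypdf-001-000.jpg mypdf-001-001.png output/mypdf-001-000.png
```

Stencil masks may need inverting first, so check a few results by eye before trusting a whole batch.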

Needed packages

dnf install poppler-utils cmake gcc-c++ ImageMagick fdupes

Build Instructions

# If you are using the tar file, skip the git submodule step; it will fail because the directory is not a git repo
git submodule update --init
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make

Running it

An example run:

mkdir images
cd images
pdfimages -list ../MyPDF.pdf > list.txt
pdfimages -p -all ../MyPDF.pdf mypdf
pdfimages_combine mypdf > script.sh
bash script.sh
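
The list.txt file written above records the type of every extracted object, so it is a quick way to see how many masks to expect. A minimal sketch, assuming the usual pdfimages -list layout (two header lines, type in the third column); the function name count_image_types is made up for illustration.

```shell
# Count the entries of each type (image, smask, ...) in `pdfimages -list` output.
# Assumes two header lines and the type in the third whitespace-separated column.
count_image_types() {
    awk 'NR > 2 { count[$3]++ } END { for (t in count) print t, count[t] }' "$1" | sort
}

# Example call:
# count_image_types list.txt
```

If no smask or mask entries show up, the PDF has no transparency to reassemble.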

Check the resulting script before you run it. NB: pdfimages_combine creates a subdirectory called output to put the results in.

Optionally get rid of duplicates

fdupes -N --delete output
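
Because fdupes with -N and --delete removes files without asking, it can be worth previewing the duplicates first. A minimal sketch using GNU coreutils rather than fdupes; the function name list_duplicates is made up, and sha256sum plus GNU uniq are assumed to be available.

```shell
# Print groups of files under a directory whose contents are identical.
# Each line is "<sha256>  <path>"; groups are separated by a blank line.
list_duplicates() {
    find "$1" -type f -exec sha256sum {} + \
        | sort \
        | uniq -w64 --all-repeated=separate   # compare only the 64-char hash
}

# Example call:
# list_duplicates output
```

Nothing is deleted here; the output is only a list to review.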

Other useful tools/commands

MuPDF is a very similar tool to pdfimages; sometimes it does a better job of image extraction.

dnf install mupdf

Example command:

mutool extract MyPDF.pdf
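
mutool extract writes its files (images and also fonts) into the current directory, so it helps to run it in a directory of its own. A minimal sketch; the function name extract_with_mutool and the default directory name mupdf_images are made up, and GNU realpath is assumed.

```shell
# Run `mutool extract` inside a separate directory so its output does not
# mix with the files produced by pdfimages.
# Usage: extract_with_mutool FILE.pdf [OUTPUT_DIR]
extract_with_mutool() {
    local pdf outdir
    pdf=$(realpath "$1")            # absolute path, still valid after the cd
    outdir=${2:-mupdf_images}
    mkdir -p "$outdir"
    ( cd "$outdir" && mutool extract "$pdf" )   # subshell: caller's cwd is untouched
}

# Example call:
# extract_with_mutool MyPDF.pdf
```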

Ghostscript is really useful, especially for map pages where you want to keep all the map labels, because you can "print" a PDF to PNGs or JPEGs. You can override the paper size so your images are not letter/A4 sized.

dnf install ghostscript

Some example commands

Convert multi page PDF to PNG files

gs -dNOPAUSE -dBATCH -sDEVICE=png16m -r600 -dGraphicsAlphaBits=4 -sOutputFile="image-%d.png" MyPDF.pdf
gs -dNOPAUSE -dBATCH -sDEVICE=png16m -dFirstPage=3 -dLastPage=4 -sOutputFile="image-%d.png" MyPDF.pdf

Other useful switches are:

  • -r300 - Save image at 300 DPI
  • -dGraphicsAlphaBits=4 - Highest-quality anti-aliasing of graphics (valid values are 1, 2 and 4)
  • -sPAPERSIZE=a4 - Change the paper size
  • -sPageList=1,3,5 - List of pages rather than range

More info at: https://www.ghostscript.com/doc/current/Use.htm#PDF_switches
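
The switches above can be wrapped in a small helper when the same page-range export is needed repeatedly. A minimal sketch; the function name pdf_pages_to_png, the default of 300 DPI and the page-%03d.png output pattern are choices made for this example.

```shell
# Hypothetical wrapper: render pages FIRST..LAST of a PDF to PNG files.
# Usage: pdf_pages_to_png FILE.pdf FIRST LAST [DPI]
pdf_pages_to_png() {
    local dpi=${4:-300}             # resolution, 300 DPI unless given
    gs -dNOPAUSE -dBATCH -sDEVICE=png16m -r"$dpi" \
       -dGraphicsAlphaBits=4 \
       -dFirstPage="$2" -dLastPage="$3" \
       -sOutputFile="page-%03d.png" "$1"
}

# Example call: pages 3 and 4 at 600 DPI
# pdf_pages_to_png MyPDF.pdf 3 4 600
```

The number in the output name should count rendered pages rather than PDF page numbers, so the example call would be expected to write page-001.png and page-002.png.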
