pdfimages_combine

Creates a script that combines the images and masks extracted from a PDF by pdfimages. Images are copied with their metadata stripped so that the output is deterministic; otherwise each image contains a timestamp from when it was extracted from the PDF, which is annoying.

Needed packages

dnf install poppler-utils cmake gcc-c++ ImageMagick fdupes

Build Instructions

# If you are using the tar file, skip the git submodule step; it will fail because the directory is not a git repo
git submodule update --init
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make

Running it

An example run:

mkdir images
cd images
pdfimages -list ../MyPDF.pdf > list.txt
pdfimages -p -all ../MyPDF.pdf mypdf
pdfimages_combine mypdf > script.sh
bash script.sh

Optionally get rid of duplicates

fdupes -N --delete output

Check the generated script (script.sh in the example above) before you run it. NB: pdfimages_combine creates a subdirectory called output to put the results in.
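The steps above can be wrapped in a small helper that stops before running the generated script, so there is a chance to review it first. This is only a sketch: the function name `extract_images`, its argument order, and the default directory and prefix are hypothetical and not part of pdfimages_combine. It assumes pdfimages and pdfimages_combine are on the PATH.

```shell
# Sketch only: extract_images and its arguments are hypothetical names.
# Usage: extract_images <pdf> [workdir] [prefix]
extract_images() {
    local pdfdir pdf workdir prefix
    # Resolve the PDF to an absolute path so it survives the cd below.
    pdfdir=$(cd "$(dirname "$1")" && pwd) || return 1
    pdf=$pdfdir/$(basename "$1")
    workdir=${2:-images}
    prefix=${3:-mypdf}

    mkdir -p "$workdir" || return 1
    (
        cd "$workdir" &&
        pdfimages -list "$pdf" > list.txt &&     # inventory of images and masks
        pdfimages -p -all "$pdf" "$prefix" &&    # extract, page numbers in file names
        pdfimages_combine "$prefix" > script.sh  # generate the script, do not run it
    ) || return 1

    echo "Review $workdir/script.sh, then run: (cd $workdir && bash script.sh)"
}
```

For example, `extract_images MyPDF.pdf images mypdf` mirrors the commands listed above and leaves script.sh in images for inspection.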

Other useful tools/commands

MuPDF is a very similar tool to pdfimages; sometimes it does better at image extraction.

dnf install mupdf

Example command is

mutool extract MyPDF.pdf
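As far as I know, mutool extract writes its files into the current directory, so it can help to run it from a directory of its own and keep its output apart from the pdfimages results. A minimal sketch, where the function name `mutool_extract_to` and the default directory name are hypothetical:

```shell
# Sketch only: mutool_extract_to and the default directory are hypothetical.
# Usage: mutool_extract_to <pdf> [outdir]
mutool_extract_to() {
    local pdfdir pdf outdir
    # Absolute path, because mutool is run from inside outdir.
    pdfdir=$(cd "$(dirname "$1")" && pwd) || return 1
    pdf=$pdfdir/$(basename "$1")
    outdir=${2:-mutool-images}

    mkdir -p "$outdir" || return 1
    ( cd "$outdir" && mutool extract "$pdf" )
}
```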

Ghostscript is really useful, especially for map pages where you want all the map labels, because you can "print" a PDF to PNGs or JPEGs. You can override the paper size so your images are not letter/A4 sized.

dnf install ghostscript

Some example commands

Convert multi page PDF to PNG files

gs -dNOPAUSE -dBATCH -sDEVICE=png16m -r600 -dGraphicsAlphaBits=4 -sOutputFile="image-%d.png" MyPDF.pdf
gs -dNOPAUSE -dBATCH -sDEVICE=png16m -dFirstPage=3 -dLastPage=4 -sOutputFile="image-%d.png" MyPDF.pdf

Other useful switches are:

  • -r300 - Save image at 300 DPI
  • -dGraphicsAlphaBits=4 - Highest quality output (maximum graphics anti-aliasing; valid values are 1, 2 and 4)
  • -sPAPERSIZE=a4 - Change the paper size
  • -sPageList=1,3,5 - List of pages rather than range

More info at: https://www.ghostscript.com/doc/current/Use.htm#PDF_switches
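The switches above can be combined in one call. The following sketch builds the gs command line from a resolution and an optional page list; the function name `pdf_to_png` and its argument order are hypothetical, and the defaults are just the values used in the examples above.

```shell
# Sketch only: pdf_to_png and its arguments are hypothetical.
# Usage: pdf_to_png <pdf> [dpi] [pagelist]
pdf_to_png() {
    local pdf dpi pages
    pdf=$1
    dpi=${2:-300}
    pages=${3:-}

    # Build the argument list step by step so the page list stays optional.
    set -- -dNOPAUSE -dBATCH -sDEVICE=png16m "-r$dpi" -dGraphicsAlphaBits=4
    if [ -n "$pages" ]; then
        set -- "$@" "-sPageList=$pages"
    fi

    gs "$@" -sOutputFile="image-%d.png" "$pdf"
}
```

For example, `pdf_to_png MyPDF.pdf 600 1,3,5` renders pages 1, 3 and 5 at 600 DPI, and `pdf_to_png MyPDF.pdf` renders every page at 300 DPI.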

Links