pdf to csv

Lots of data is held in graphs within pdf files.

Some of these graphs are represented using an image format, e.g., jpeg, while others are created using pdf operations (e.g., draw a cross at 10, 20).

If the pdf operations that create a graph are known, the coordinates of the points in that graph can be extracted; there is a proof of concept.
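As a rough illustration of the idea, the sketch below scans a pdf content stream that has already been decompressed to text. It collects the straight segments drawn by the `m` (moveto) and `l` (lineto) operators and treats two segments that share a midpoint as one cross marker. This is a minimal sketch under simplifying assumptions: markers are crosses drawn as two lines, coordinate transforms set by `cm` are ignored, curves (`c`), rectangles (`re`) and text strings are not handled, and the function names `line_segments` and `cross_centres` are hypothetical, not part of any existing tool.

```python
import re

# A pdf number token such as 10, -3.5 or .25.
_NUM = r"[-+]?(?:\d+\.?\d*|\.\d+)"
# "x y m" starts a new subpath (moveto); "x y l" appends a straight segment (lineto).
_PATH_OP = re.compile(rf"({_NUM})\s+({_NUM})\s+([ml])\b")


def line_segments(content: str):
    """Return the straight segments ((x0, y0), (x1, y1)) drawn by m/l operators.

    Assumption: coordinates are used as-is; any cm transform is ignored.
    """
    segments = []
    current = None
    for x, y, op in _PATH_OP.findall(content):
        point = (float(x), float(y))
        if op == "l" and current is not None:
            segments.append((current, point))
        current = point  # both m and l move the current point
    return segments


def cross_centres(content: str, tol: float = 1e-6):
    """Pair up segments whose midpoints coincide; each pair is taken as one cross."""
    mids = [((a[0] + b[0]) / 2, (a[1] + b[1]) / 2) for a, b in line_segments(content)]
    centres, used = [], set()
    for i, (xi, yi) in enumerate(mids):
        if i in used:
            continue
        for j in range(i + 1, len(mids)):
            xj, yj = mids[j]
            if j not in used and abs(xi - xj) <= tol and abs(yi - yj) <= tol:
                centres.append((xi, yi))
                used.update((i, j))
                break
    return centres
```

For example, `cross_centres("9 19 m 11 21 l 9 21 m 11 19 l S")` finds a single cross centred on (10, 20). The result is in pdf page units, not in the units of the graph's axes.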

This project aims to add an option to Mozilla's pdf renderer (pdf.js) to extract the x/y coordinates of all the points appearing in a graph highlighted by the user.
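Coordinates taken from pdf operations are page positions, so turning them into the values plotted in the graph requires calibrating each axis. One possible approach, sketched here, assumes the page positions of two labelled ticks per axis are known (for instance, picked by the user when highlighting the graph) and that each axis is either linear or log10. The function `axis_mapper` is hypothetical.

```python
import math


def axis_mapper(p0: float, v0: float, p1: float, v1: float, log: bool = False):
    """Return a function mapping a page coordinate to a data value.

    (p0, v0) and (p1, v1) are two calibration ticks: the page position of the
    tick and the value printed next to it. A linear axis interpolates the
    value directly; a log axis interpolates log10(value).
    """
    if p1 == p0:
        raise ValueError("calibration ticks must be at different page positions")
    if log:
        lv0, lv1 = math.log10(v0), math.log10(v1)
        return lambda p: 10 ** (lv0 + (p - p0) * (lv1 - lv0) / (p1 - p0))
    return lambda p: v0 + (p - p0) * (v1 - v0) / (p1 - p0)


# Hypothetical calibration: x tick "0" at page x=100 and "10" at x=300;
# y tick "0" at page y=50 and "100" at y=250.
to_x = axis_mapper(100, 0, 300, 10)
to_y = axis_mapper(50, 0, 250, 100)
print(to_x(200), to_y(150))  # 5.0 50.0
```

Each axis is calibrated independently, which keeps the sketch simple but assumes the axes are not rotated or skewed on the page.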

pdf disassemblers

qpdf does an excellent job of mapping the contents of a pdf to text.
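Page content streams are usually compressed, most often with the FlateDecode filter (zlib), which is one reason a tool such as qpdf is useful; for example, `qpdf --qdf --object-streams=disable in.pdf out.pdf` writes a file whose content is readable as text. To show what is involved, here is a crude standard-library sketch that pulls out and inflates FlateDecode streams. It is not a pdf parser: it relies on a regular expression rather than the object structure, ignores `/DecodeParms` predictors and chained filters, and the name `flate_streams` is hypothetical.

```python
import re
import zlib

# Crude pattern: a dictionary mentioning /FlateDecode, followed by its stream data.
_STREAM = re.compile(rb"/FlateDecode.*?>>\s*stream\r?\n(.*?)endstream", re.S)


def flate_streams(pdf_bytes: bytes):
    """Return the decompressed bytes of every FlateDecode stream that inflates cleanly."""
    out = []
    for raw in _STREAM.findall(pdf_bytes):
        inflater = zlib.decompressobj()
        try:
            # Trailing end-of-line bytes before "endstream" end up in unused_data.
            out.append(inflater.decompress(raw) + inflater.flush())
        except zlib.error:
            pass  # other filters, predictors, or a bad regex match: skip it
    return out
```

For real documents, qpdf handles encryption, object streams and filter chains that this sketch ignores.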

pdffigures extracts figures from pdfs.

Related tools

Manual conversion of the pdf to svg, followed by automatic extraction of the data from the svg.
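The automatic step of the svg route can be quite small if the plotted points survive the conversion as simple shapes. This sketch assumes, purely for illustration, that each data point was exported as an svg `<circle>` element; it ignores `transform` attributes, and the function name `circle_centres` is hypothetical. Note that the svg y axis points down, whereas pdf user space has y pointing up.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def circle_centres(svg_text: str):
    """Return (cx, cy) for every <circle> in the document, in svg user units.

    Assumption: markers are <circle> elements and no transforms apply.
    Missing cx/cy default to 0, as in the svg specification.
    """
    root = ET.fromstring(svg_text)
    return [
        (float(c.get("cx", 0)), float(c.get("cy", 0)))
        for c in root.iter(SVG_NS + "circle")
    ]
```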

chemdataextractor, as the name suggests, is oriented towards extracting chemical information from pdfs, e.g., chemical names and formulae.

utopia attempts to extract structural features of an article, including citations.

pdfgrep searches for text in pdf files, in the manner of grep.

pdftabextract extracts tables from scanned, OCR-processed pdf pages.

xpdf is used as a library by many tools.

poppler is a popular pdf rendering library.