This library implements full-text search for PDFs.
- The public APIs are in index_search.go.
There are some command-line programs that demonstrate the library's functionality.
- examples/pdf_search_demo.go demonstrates the main APIs.
- examples/index.go builds an index over a set of PDFs.
- examples/search.go searches the index built by examples/index.go.
Binary versions (executables) of these three programs are available in releases. There are 64-bit binaries for Windows, Mac and Linux. The binaries do not require a UniDoc license.
git clone https://github.com/PaperCutSoftware/pdfsearch
Replace uniDocLicenseKey and companyName in unidoc_glue.go with valid UniDoc license fields.
cd pdfsearch/examples
go build pdf_search_demo.go
go build index.go
go build search.go
Usage: ./pdf_search_demo -f <PDF path> <search term>
Example: ./pdf_search_demo -f PDF32000_2008.pdf cubic Bézier curve
The example will search PDF32000_2008.pdf for cubic Bézier curve.
pdf_search_demo.go shows how to use the APIs in index_search.go to
- create indexes over PDFs,
- search those indexes using full-text search, and
- mark up PDFs with the locations of the search matches on pages.
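To make the index-then-search flow concrete, here is a minimal, standard-library-only sketch of a page-level inverted index. It is not the API in index_search.go, and it does not use UniDoc or bleve. The names `Posting`, `Index`, `Add`, and `Search` are hypothetical, and the page text is assumed to have been extracted already.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Posting identifies one page of one document (hypothetical type).
type Posting struct {
	Doc  string
	Page int // 1-based page number
}

// Index maps each lower-cased term to the set of pages that contain it.
type Index map[string]map[Posting]bool

// tokenize lower-cases text and splits it on any rune that is
// neither a letter nor a digit.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Add indexes the already-extracted text of one page.
func (ix Index) Add(doc string, page int, text string) {
	p := Posting{Doc: doc, Page: page}
	for _, term := range tokenize(text) {
		if ix[term] == nil {
			ix[term] = map[Posting]bool{}
		}
		ix[term][p] = true
	}
}

// Search returns the pages that contain every term of the query,
// sorted by document name and then by page number.
func (ix Index) Search(query string) []Posting {
	terms := tokenize(query)
	if len(terms) == 0 {
		return nil
	}
	var hits []Posting
	for p := range ix[terms[0]] {
		all := true
		for _, t := range terms[1:] {
			if !ix[t][p] { // lookup in a missing (nil) set yields false
				all = false
				break
			}
		}
		if all {
			hits = append(hits, p)
		}
	}
	sort.Slice(hits, func(i, j int) bool {
		if hits[i].Doc != hits[j].Doc {
			return hits[i].Doc < hits[j].Doc
		}
		return hits[i].Page < hits[j].Page
	})
	return hits
}

func main() {
	ix := Index{}
	ix.Add("spec.pdf", 1, "A cubic Bézier curve is defined by four points.")
	ix.Add("spec.pdf", 2, "Straight line segments and curve fitting.")
	for _, hit := range ix.Search("cubic Bézier curve") {
		fmt.Printf("%s page %d\n", hit.Doc, hit.Page)
	}
}
```

The sketch only reports which pages match. Marking up match locations on a page would additionally require the position of each term on the page, which this example does not track.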
Usage: ./index <file pattern>
Example: ./index ~/climate/**/*.pdf
The example creates an on-disk index over the PDFs in ~/climate/ and its subdirectories.
Usage: ./search <search term>
Example: ./search integrated assessment model
The example searches the on-disk index created by examples/index.go for integrated assessment model.
index_search.go uses UniDoc for PDF parsing and bleve for search.