Pure Go Full Text Search of PDF Files

This library implements full text search for PDFs.

The public APIs are in index_search.go.

There are some command-line programs that demonstrate the library's functionality.

examples/pdf_search_demo.go demonstrates the main APIs.
examples/index.go builds an index over a set of PDFs.
examples/search.go searches the index built by examples/index.go.

Binary versions (executables) of these three programs are available in releases. There are 64-bit binaries for Windows, Mac and Linux. The binaries do not require a UniDoc license.

Installation

git clone https://github.com/PaperCutSoftware/pdfsearch

Replace uniDocLicenseKey and companyName in unidoc_glue.go with valid UniDoc license fields.

cd pdfsearch/examples
go build pdf_search_demo.go
go build index.go
go build search.go

examples/pdf_search_demo.go

Usage: ./pdf_search_demo -f <PDF path> <search term>

Example: ./pdf_search_demo -f PDF32000_2008.pdf cubic Bézier curve

The example will search PDF32000_2008.pdf for cubic Bézier curve.
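
The usage line above takes a -f flag followed by a search term that may span several words. As a minimal sketch of one way to parse such a command line with Go's standard flag package: parseArgs is a hypothetical helper written for this illustration, and joining the remaining words with spaces is an assumption, not necessarily what pdf_search_demo.go does.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
	"strings"
)

// parseArgs is a hypothetical helper: it reads the -f flag and joins the
// remaining command-line words into a single search term.
func parseArgs(args []string) (pdfPath, term string, err error) {
	fs := flag.NewFlagSet("pdf_search_demo", flag.ContinueOnError)
	fs.StringVar(&pdfPath, "f", "", "path of the PDF to search")
	if err = fs.Parse(args); err != nil {
		return "", "", err
	}
	// Everything after the flags is treated as one multi-word term (assumption).
	term = strings.Join(fs.Args(), " ")
	if pdfPath == "" || term == "" {
		return "", "", errors.New("usage: pdf_search_demo -f <PDF path> <search term>")
	}
	return pdfPath, term, nil
}

func main() {
	pdfPath, term, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("searching %s for %q\n", pdfPath, term)
}
```

With the example arguments above, this sketch would treat PDF32000_2008.pdf as the path and the three words cubic, Bézier and curve as one term.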

pdf_search_demo.go shows how to use the APIs in index_search.go to

create indexes over PDFs,
search those indexes using full-text search, and
mark up PDFs with the locations of the search matches on pages.
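
To make the index-then-search idea concrete, here is a toy inverted index that maps terms to the pages they occur on. It is a standard-library sketch for illustration only. The types pageRef and toyIndex and their methods are hypothetical and are not the API in index_search.go, and a real engine such as bleve does considerably more (text analysis, ranking, on-disk storage).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// pageRef identifies one page of one document (hypothetical type).
type pageRef struct {
	Doc  string
	Page int
}

// toyIndex maps a lowercased term to the set of pages that contain it.
type toyIndex map[string]map[pageRef]bool

// tokenize lowercases text and splits it on anything that is not a letter or digit.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// add records every term of one page's text in the index.
func (ix toyIndex) add(ref pageRef, text string) {
	for _, term := range tokenize(text) {
		if ix[term] == nil {
			ix[term] = map[pageRef]bool{}
		}
		ix[term][ref] = true
	}
}

// search returns the pages that contain every term of the query,
// sorted by document name and then page number.
func (ix toyIndex) search(query string) []pageRef {
	terms := tokenize(query)
	if len(terms) == 0 {
		return nil
	}
	var hits []pageRef
	for ref := range ix[terms[0]] {
		all := true
		for _, t := range terms[1:] {
			if !ix[t][ref] {
				all = false
				break
			}
		}
		if all {
			hits = append(hits, ref)
		}
	}
	sort.Slice(hits, func(i, j int) bool {
		if hits[i].Doc != hits[j].Doc {
			return hits[i].Doc < hits[j].Doc
		}
		return hits[i].Page < hits[j].Page
	})
	return hits
}

func main() {
	ix := toyIndex{}
	ix.add(pageRef{Doc: "a.pdf", Page: 1}, "A cubic Bézier curve is defined by four points.")
	ix.add(pageRef{Doc: "a.pdf", Page: 2}, "Line segments and rectangles.")
	ix.add(pageRef{Doc: "b.pdf", Page: 7}, "Curve fitting with cubic splines.")
	for _, hit := range ix.search("cubic curve") {
		fmt.Printf("%s page %d\n", hit.Doc, hit.Page)
	}
}
```

The sketch stops at page granularity. Marking up matches on a page, as the demo does, would also need the position of each match within the page, which this toy index does not store.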

examples/index.go

Usage: ./index <file pattern>

Example: ./index ~/climate/**/*.pdf

The example creates an on-disk index over the PDFs in ~/climate/ and its subdirectories.

examples/search.go

Usage: ./search <search term>

Example: ./search integrated assessment model

The example searches the on-disk index created by examples/index.go for integrated assessment model.

Libraries

index_search.go uses UniDoc for PDF parsing and bleve for search.

Talks about this library

GopherCon AU 2019
