pdf2htmlEX is a precise PDF to HTML converter. Since the original repository is no longer maintained, many forks have appeared. This container is based on this up-to-date fork, whose author has also made a Dockerfile.
This container provides a small web service (~54 MB).
docker build -t jojolebarjos/pdf2htmlex .
docker run -d -p 8080:8080 jojolebarjos/pdf2htmlex
HTTP POST /convert
file: <my.pdf>
<!DOCTYPE html>
<!-- Created by pdf2htmlEX (https://github.com/coolwanglu/pdf2htmlex) -->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8"/>
...
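For scripting, the request above can be wrapped in a small shell function. This is a sketch under stated assumptions, not part of the container. It assumes the service is reachable on localhost:8080, as published by the `docker run` command above, and that `/convert` expects a multipart/form-data field named `file`. The function name `pdf2html` and the `PDF2HTML_URL` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: upload a PDF to the pdf2htmlex container, save the HTML.
# Assumption: the service listens on PDF2HTML_URL (default matches -p 8080:8080)
# and reads the PDF from a multipart form field named "file".
pdf2html() {
  pdf="$1"
  out="${2:-${1%.pdf}.html}"   # default output: my.pdf -> my.html
  url="${PDF2HTML_URL:-http://localhost:8080/convert}"

  # Fail early instead of sending a request without a file.
  if [ ! -f "$pdf" ]; then
    echo "pdf2html: no such file: $pdf" >&2
    return 1
  fi

  # -F sends multipart/form-data; "@" tells curl to upload the file contents.
  # --fail makes curl exit non-zero on HTTP 4xx/5xx responses.
  curl --fail --silent --show-error \
       -F "file=@${pdf}" \
       -o "$out" \
       "$url"
}
```

Usage, with the container running: `pdf2html my.pdf my.html`. Using `--fail` matters here because the server may not report errors cleanly yet, so a non-zero exit code is the most reliable signal that the conversion did not succeed.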
- Probably need poppler data (--poppler-data-dir="")
- Properly handle errors in server
- Add version endpoint