This example shows how scribe.js can be bundled with Webpack 5 into a web application.
To run this example locally, perform the following steps.
## Clone the repo
git clone https://github.com/scribeocr/scribe.js-example-webpack5.git
cd scribe.js-example-webpack5
## Install the dependencies
npm install
## Run Webpack
npm run build
## Start a server in the root directory
npm start
After the server is running, visit the web page indicated, and you will be able to extract text from an image or PDF.
The only non-default setting in the webpack.config.js file is using DefinePlugin
to set the 2 values. First, process
is set to undefined
to force only the browser implementations of certain functions (and not Node.js implementations) to be included. Scribe.js uses the condition typeof process === 'undefined'
to check the current environment, and Webpack will not evaluate this condition as true
during the build step unless the value of process
is set explicitly.
Second, DISABLE_DOCX_XLSX
is defined, which disables the .docx
and .xlsx
output formats. These formats are rarely used and still officially experimental, and due to their use of the zip.js package (which does not play nicely with Webpack) including them currently breaks the build.
No. Scribe.js uses web workers for nearly everything, and support for Web Workers was added in Webpack 5. While Webpack 4 does have a worker-loader plugin, it appears to require using a special syntax for creating workers that only runs with Webpack (and not natively in browsers), which is out of the question. If you are a Webpack 4 user who gets a build working with Scribe.js, please submit a PR and we will add it to the list.
Running this build produces many files, which may be unexpected since some basic Webpack builds only produce a single bundle.min.js
files. There are several reasons for this, however rest assured, all files are never loaded by a single user.
Scribe.js only loads large resources when (1) explicitly requested using the scribe.init
function or (2) required to perform a requested operation. For example, the resources required to read PDFs are only loaded when setting pdf: true
in the scribe.init
options or when encountering a .pdf
file. If an application is only used to recognize .png
or .jpeg
files, that resources is never loaded. Every lazily-loaded resource ends up in its own file.
Similarly, Scribe.js loads different versions of the same file to save space. For example, when recognizing documents in English, Scribe.js loads fonts that only contain Latin script to save space.
Scribe.js performs virtually all computationally-expensive operations in web workers. In addition to improving performance on multi-core systems, this frees up the main process, which results in a more responsive UI. Webpack puts the code for each worker in one or more separate files.
All files used by Scribe.js are included in the build. This results in many font files (.woff
or .tff
) in the dist
directory. It would be possible to modify Scribe.js so these are hosted on a CDN by default; feel free to open a Git Issue if this is something that matters to you.