Releases: scribeocr/scribe.js

v0.4.1

10 Nov 19:24

Balearica

v0.4.1 Latest

What's Changed

Enabled parallel processing by default in the Node.js version.
- To restore the previous behavior (1 worker), set scribe.opt.workerN = 1 before calling any functions.
Non-default behavior for extracting text from PDF files is now configured through the properties of scribe.opt.usePDFText.
Added Nimbus Mono font (similar to Courier)
Improvements to text extraction from PDF files.
Improvements to text positioning.
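
The single-worker fallback mentioned above can be sketched as follows. This is a minimal illustration, not the library's own code: `scribe` here is a stand-in object that models only the `opt` property named in the note, and `setWorkerCount` is a hypothetical helper introduced for the example. With the real library, the same assignment would be made on the imported `scribe` object before any other call.

```javascript
// Stand-in for the object exported by the library. Only the `opt` bag from
// the release note is modeled (assumption: the real object has much more).
const scribe = { opt: {} };

// Hypothetical helper (not part of the library): pin the worker count.
function setWorkerCount(scribeLike, n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError('worker count must be a positive integer');
  }
  scribeLike.opt.workerN = n;
  return scribeLike.opt.workerN;
}

// Restore the previous behavior (a single worker) before calling anything else.
setWorkerCount(scribe, 1);
```

Setting the option once, up front, matches the note's requirement that it be set before calling any functions.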

Full Changelog: v0.3.1...v0.4.1

Note: This post combines the changes for 0.4.0 and 0.4.1 because 0.4.0 was the latest version for only a few hours.

v0.3.1

31 Oct 08:38

Balearica

What's Changed

Fixed memory leaks

Full Changelog: v0.3.0...v0.3.1

v0.3.0

31 Oct 03:59

Balearica

What's Changed

Improvements to parsing existing text from PDF files
Various improvements to OCR text and bounding box quality
Fixed memory leak
Various minor changes

Full Changelog: v0.2.8...v0.3.0

v0.2.8

30 Sep 07:30

Balearica

Improved performance of "Quality" recognition mode.
- Many documents should run about 10-15% faster in quality mode.
Updated Scribe Tesseract build to improve recognition accuracy.
- Accuracy for data tables and other complex layouts has been noticeably improved.
  - See benchmark repo for examples and accuracy metrics.
Improved image pre-processing.
Updated Vanilla Tesseract build to support debugging features and image upscaling.
Other minor changes

Full Changelog: v0.2.7...v0.2.8

v0.2.7

25 Sep 05:21

Balearica

Fixed bug preventing existing text in some PDFs from being detected (025456a)
Increased resolution at which PDFs are rendered (0dd8801)
Added calcSuppFontInfo option that calculates font metrics for the fonts in text-native PDFs (4b2b43e)
- This is useful for niche applications that require highly-accurate visual coordinates from text-native PDFs.
Various other minor updates

Full Changelog: v0.2.6...v0.2.7

v0.2.6

06 Sep 08:00

Balearica

Restored compatibility with Webpack

Full Changelog: v0.2.5...v0.2.6

v0.2.5

06 Sep 07:40

Balearica

Improved performance, especially for single-page documents.
Improved accuracy for "Quality" recognition mode (the default).
Fixed various minor bugs

Full Changelog: v0.2.4...v0.2.5

v0.2.4

29 Aug 07:56

Balearica

Improved support for build tools such as Webpack
Fixed bug where PDF resources were being loaded when not necessary (dd99124)
Fixed Tesseract bug causing incorrect metrics for single-word recognition (Recognize Word) in Scribe OCR UI (f6be561)

Full Changelog: v0.2.3...v0.2.4

v0.2.3

22 Aug 00:54

Balearica

Added extractPDFTextImage option to importFiles
- When extractPDFTextNative, extractPDFTextOCR, and extractPDFTextImage are all set to true, text will always be extracted from the input PDF and set as the "active" version, even if there is no text.
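
As a rough sketch of the combination described above, the three flags can be collected into one options object. `alwaysExtractPdfTextOptions` is a hypothetical helper, not part of the library, and how the object is passed to importFiles is an assumption, since the note does not give the call signature. Note also that from v0.4.1 onward this behavior is configured through scribe.opt.usePDFText instead.

```javascript
// Hypothetical helper (not part of the library): returns the three flags
// named in the v0.2.3 note, all enabled.
function alwaysExtractPdfTextOptions() {
  return {
    extractPDFTextNative: true,
    extractPDFTextOCR: true,
    extractPDFTextImage: true,
  };
}

// Assumption: this object is handed to importFiles together with the input
// files; the exact argument position is not stated in the release note.
const importOptions = alwaysExtractPdfTextOptions();
```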

Full Changelog: v0.2.2...v0.2.3

v0.2.2

21 Aug 05:36

Balearica

Added support for importing HOCR generated by Tesseract.js

Full Changelog: v0.2.1...v0.2.2
