memory access out of bounds & exception stack overflow #12

zakariamehbi · 2024-10-30T11:30:48Z

We’re running Scribe.js at scale and frequently receive the messages below. Scribe.js eventually becomes unusable, forcing us to restart the app. Is there any way to restart the Scribe.js engine alone without having to restart the entire Node.js application?

{ name: 'RuntimeError', message: 'memory access out of bounds' }
{ name: 'RuntimeError', message: 'memory access out of bounds' }

{ name: 'Error', message: 'exception stack overflow!' }

zakariamehbi · 2024-10-30T11:41:26Z

exception stack overflow!
limit error: exception stack overflow!
format error: object is not a stream
warning: cannot load content stream part 1/8
exception stack overflow!
limit error: exception stack overflow!
format error: object is not a stream
warning: cannot load content stream part 2/8
exception stack overflow!
limit error: exception stack overflow!
format error: object is not a stream
warning: cannot load content stream part 3/8
exception stack overflow!
limit error: exception stack overflow!
format error: object is not a stream
warning: cannot load content stream part 4/8
exception stack overflow!
limit error: exception stack overflow!
format error: object is not a stream
warning: cannot load content stream part 5/8
exception stack overflow!
limit error: exception stack overflow!
format error: object is not a stream
warning: cannot load content stream part 6/8
exception stack overflow!
limit error: exception stack overflow!
format error: object is not a stream
warning: cannot load content stream part 7/8
exception stack overflow!
limit error: exception stack overflow!
format error: object is not a stream
warning: cannot load content stream part 8/8
exception stack overflow!
limit error: exception stack overflow!
warning: read error; treating as end of file

zakariamehbi · 2024-10-30T14:42:09Z

I tried running Scribe.js in a worker that we can restart, but I’m still encountering the same error. The only solution seems to be restarting the entire Node.js application. Any suggestions?

Balearica · 2024-10-31T08:39:43Z

Please try updating to the latest version (0.3.1 as of this writing) and let me know if the issue persists. I recently fixed several issues that were causing memory usage to grow excessively.

zakariamehbi · 2024-11-01T06:42:18Z

Hey @Balearica, thanks for the advice. The issue seems to be resolved! I just have a couple of follow-up questions:

Can we tweak any config settings to make Scribe.js run faster? We’re running it on a large server, and it’s currently underutilized.
Can images be extracted from a PDF or encoded?

Thanks again for your help!

Balearica · 2024-11-10T23:40:58Z

> Can we tweak any config settings to make Scribe.js run faster? We’re running it on a large server, and it’s currently underutilized.

I just released a new version that uses workers by default in Node.js. If you update to v0.4.1, the pages of a multi-page document are run on different worker threads, which should speed things up significantly. The number of workers can be controlled manually by setting scribe.opt.workerN to the desired number of workers before calling any functions.
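As a rough illustration of the scribe.opt.workerN setting described above, the sketch below picks a worker count from the number of available cores and assigns it before anything else is called. The helper names (`pickWorkerCount`, `configureWorkers`) and the cap of 8 are hypothetical and not part of Scribe.js; the `scribe` object is passed in as a parameter so the sketch makes no assumption about how the library is imported.

```javascript
const os = require('node:os');

// Hypothetical helper: leave one core free, but never exceed `max` workers.
function pickWorkerCount(max = 8) {
  // os.availableParallelism() only exists in newer Node.js releases.
  const cores = os.availableParallelism ? os.availableParallelism() : os.cpus().length;
  return Math.max(1, Math.min(max, cores - 1));
}

// Hypothetical helper: set scribe.opt.workerN before calling any other function.
function configureWorkers(scribe, max) {
  scribe.opt.workerN = pickWorkerCount(max);
  return scribe.opt.workerN;
}
```

In an application this would be called once at startup, right after the library is imported and before any document is processed.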

Although both the browser and Node.js versions now run most computationally expensive steps in workers, which can result in 6x faster speeds, the built-in parallel processing is unlikely ever to saturate a 48-core server (for example). If you need that level of throughput, you would have to run Scribe.js in multiple Node.js processes (whether using child_process or some other mechanism) and implement a scheduler on top of that.
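For the multi-process approach, one possible shape is a small concurrency-limited pool that hands each file to a separate child process. This is only a sketch under assumptions, not something Scribe.js provides: `runPool` and `runInChild` are hypothetical names, and `scriptPath` stands for a script you would write yourself that runs Scribe.js on the path in `process.argv[2]` and reports its result with `process.send()`.

```javascript
const { fork } = require('node:child_process');

// Run `worker(item)` for every item with at most `limit` calls in flight.
// Results are returned in input order.
async function runPool(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    // Each lane keeps pulling the next unclaimed index until none are left.
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, lane);
  await Promise.all(lanes);
  return results;
}

// Process one file in a fresh Node.js process (hypothetical child script).
function runInChild(scriptPath, file) {
  return new Promise((resolve, reject) => {
    const child = fork(scriptPath, [file]);
    let result;
    child.on('message', (msg) => { result = msg; });
    child.on('error', reject);
    child.on('exit', (code) => {
      if (code === 0) resolve(result);
      else reject(new Error(`child exited with code ${code} for ${file}`));
    });
  });
}
```

A call such as `runPool(files, 4, (f) => runInChild('./ocr-child.js', f))` would then keep four processes busy at a time. Because every file gets its own process, a child that crashes takes only its own job down; as written, the first rejected job makes `runPool` reject, so retry logic would need to be added inside the worker function.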

> Can images be extracted from a PDF or encoded?

I'm not sure I understand this question. If you're asking about rendering a PDF page to an image, you can render page n of an input PDF to a .png image using scribe.data.image.getNative(n).
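A sketch of how the rendered page might be saved to disk follows. It assumes a document has already been imported, and it assumes the value returned by scribe.data.image.getNative(n) exposes the image as a base64 PNG data URL in a `src` property; that property name is an assumption and should be checked against the version in use. `pngDataUrlToBuffer` and `savePageAsPng` are hypothetical helpers, and `scribe` is passed in as a parameter.

```javascript
const fs = require('node:fs');

// Decode a base64 PNG data URL ("data:image/png;base64,...") into a Buffer.
function pngDataUrlToBuffer(dataUrl) {
  const prefix = 'data:image/png;base64,';
  if (typeof dataUrl !== 'string' || !dataUrl.startsWith(prefix)) {
    throw new TypeError('expected a base64 PNG data URL');
  }
  return Buffer.from(dataUrl.slice(prefix.length), 'base64');
}

// Render page `n` and write it to `outPath`.
// Assumption: the resolved object carries the data URL in `src`.
async function savePageAsPng(scribe, n, outPath) {
  const image = await scribe.data.image.getNative(n);
  fs.writeFileSync(outPath, pngDataUrlToBuffer(image.src));
  return outPath;
}
```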

zakariamehbi · 2024-11-18T22:01:31Z

Hello @Balearica,

That’s fantastic news! We had been trying to implement this internally, so it’s great to see it now natively supported. Thanks for all the hard work! I’ll test it out and provide feedback soon.

Our users have PDFs that contain embedded images, such as logos, charts, etc. I’m not referring to rendering the entire PDF as an image but rather extracting specific images directly from the PDF pages.
