memory access out of bounds & exception stack overflow #12

Open
zakariamehbi opened this issue Oct 30, 2024 · 6 comments

Comments

@zakariamehbi

We’re running Scribe.js at scale and frequently receiving these messages, eventually making it unusable and forcing us to restart the app. Is there any way to restart the Scribe.js engine alone without having to restart the entire Node.js application?

{ name: 'RuntimeError', message: 'memory access out of bounds' }
{ name: 'RuntimeError', message: 'memory access out of bounds' }
{ name: 'Error', message: 'exception stack overflow!' }
@zakariamehbi
Author

exception stack overflow!
limit error: exception stack overflow!
format error: object is not a stream
warning: cannot load content stream part 1/8
exception stack overflow!
limit error: exception stack overflow!
format error: object is not a stream
warning: cannot load content stream part 2/8
exception stack overflow!
limit error: exception stack overflow!
format error: object is not a stream
warning: cannot load content stream part 3/8
exception stack overflow!
limit error: exception stack overflow!
format error: object is not a stream
warning: cannot load content stream part 4/8
exception stack overflow!
limit error: exception stack overflow!
format error: object is not a stream
warning: cannot load content stream part 5/8
exception stack overflow!
limit error: exception stack overflow!
format error: object is not a stream
warning: cannot load content stream part 6/8
exception stack overflow!
limit error: exception stack overflow!
format error: object is not a stream
warning: cannot load content stream part 7/8
exception stack overflow!
limit error: exception stack overflow!
format error: object is not a stream
warning: cannot load content stream part 8/8
exception stack overflow!
limit error: exception stack overflow!
warning: read error; treating as end of file

@zakariamehbi
Author

I tried running Scribe.js in a worker that we can restart, but I’m still encountering the same error. The only solution seems to be restarting the entire Node.js application. Any suggestions?

@Balearica
Contributor

Please try updating to the latest version (0.3.1 as of this writing) and let me know if the issue persists. I recently fixed several issues that were causing memory usage to grow excessively.

@zakariamehbi
Author

Hey @Balearica, thanks for the advice. It seems to be resolved! I just have a couple of follow-up questions:

  1. Can we tweak any config settings to make Scribe.js run faster? We’re running it on a large server, and it’s currently underutilized.
  2. Can images be extracted from a PDF or encoded?

Thanks again for your help!

@Balearica
Contributor

> Can we tweak any config settings to make Scribe.js run faster? We’re running it on a large server, and it’s currently underutilized.

I just released a new version that uses workers by default in Node.js, so if you update to v0.4.1, the pages of a multi-page document will run on different worker threads, which should speed things up significantly. The number of workers can be controlled manually by setting scribe.opt.workerN to the desired number of workers before calling any other functions.
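
As a rough illustration of the setting described above, here is a minimal sketch. Only `scribe.opt.workerN` comes from this thread. The helper names `pickWorkerCount` and `configureWorkers`, the default cap of 8, and the "leave one core free" rule are assumptions made for the example, not part of Scribe.js.

```javascript
const os = require('node:os');

// Assumption: leave one core for the main thread and cap the count at `max`.
// Neither rule comes from Scribe.js; tune both for your own server.
function pickWorkerCount(max = 8) {
  const cores = typeof os.availableParallelism === 'function'
    ? os.availableParallelism()
    : os.cpus().length;
  return Math.max(1, Math.min(max, cores - 1));
}

// `scribe` is the already-imported Scribe.js module, passed in by the caller.
// Call this before any other Scribe.js function, as the comment above advises.
function configureWorkers(scribe, workerN = pickWorkerCount()) {
  if (!Number.isInteger(workerN) || workerN < 1) {
    throw new RangeError('workerN must be a positive integer');
  }
  scribe.opt.workerN = workerN;
  return workerN;
}
```

In an application this would be called once, right after importing Scribe.js, for example `configureWorkers(scribe, 12)`.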

Although both the browser and Node.js versions now run most computationally expensive steps in workers, which can result in 6x faster speeds, the built-in parallel processing capabilities are likely never going to saturate a 48-core server (for example). If you need something at that level, you would have to run Scribe.js in multiple Node.js processes (whether using child_process or some other mechanism) and then implement a scheduler on top of that.
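
The multi-process approach mentioned above could be sketched roughly as follows: a small scheduler that keeps at most `limit` jobs in flight, plus one possible job that runs each document in its own Node.js process. This is a sketch under stated assumptions, not something Scribe.js provides. `runPool`, `ocrInChildProcess`, and the script name `ocr-worker.mjs` are hypothetical, and the worker script (which would import Scribe.js and print its result to stdout) is not shown.

```javascript
const { execFile } = require('node:child_process');

// Generic scheduler: run `task(item)` for every item with at most `limit`
// tasks in flight. Results are returned in the same order as `items`.
async function runPool(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    // Each lane pulls the next unclaimed index until the queue is empty.
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}

// Hypothetical job: process one PDF in a separate Node.js process.
// `ocr-worker.mjs` is an assumed script that reads the path from argv[2],
// runs Scribe.js on it, and writes the recognized text to stdout.
function ocrInChildProcess(pdfPath) {
  return new Promise((resolve, reject) => {
    execFile(
      process.execPath,
      ['ocr-worker.mjs', pdfPath],
      { maxBuffer: 64 * 1024 * 1024 },
      (err, stdout) => (err ? reject(err) : resolve(stdout)),
    );
  });
}
```

Usage would look like `runPool(pdfPaths, 6, ocrInChildProcess)`, where 6 is the number of concurrent processes. Since each process also starts its own Scribe.js workers, the process count multiplied by `scribe.opt.workerN` probably should not exceed the number of cores.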

> Can images be extracted from a PDF or encoded?

I'm not sure I understand this question. If you're asking about rendering a PDF page to an image, you can render page n of an input PDF to a .png image using scribe.data.image.getNative(n).

@zakariamehbi
Author

Hello @Balearica,

That’s fantastic news! We had been trying to implement this internally, so it’s great to see it now natively supported. Thanks for all the hard work! I’ll test it out and provide feedback soon.

Our users have PDFs that contain embedded images, such as logos, charts, etc. I’m not referring to rendering the entire PDF as an image but rather extracting specific images directly from the PDF pages.
