
Getting "subprocess" error for the same PDF files which are working fine with Tabula in local machine. #540

Open
deepakdhiman7 opened this issue Feb 23, 2024 · 0 comments


deepakdhiman7 commented Feb 23, 2024

We are getting the "subprocess" error below when we run our code in a container. On the local machine, however, it works fine. We installed Tabula on the local machine a year ago, and in the container it was also working fine until this week. We are also attaching the PDFs for which it fails; package versions are listed below. Could the PDF files be the cause, even though the same versions process them fine on the local machine? Or is it the environment? We checked, and there have been no updates to environment permissions etc.

PDFs:
IONIS Registartion document (002).pdf
test_Vinayak.pdf
Uploading Annual_Report.pdf…
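The stack trace under "Logs" fails inside PDFBox's `DCTFilter`, i.e. while decoding a JPEG-compressed (DCTDecode) image, so one plausible reason why some PDFs fail and others do not is whether they embed JPEG images at all. The sketch below is a crude, hypothetical heuristic for checking the attached files: `has_jpeg_images` is an illustrative helper, not part of tabula-py, and it assumes the filter name is visible in the raw file bytes (image stream dictionaries are normally not compressed). It inspects the whole file, not a specific page.

```python
import sys
from pathlib import Path


def has_jpeg_images(pdf_path):
    """Crude heuristic: True if the raw PDF bytes mention the DCTDecode filter.

    Assumption: image stream dictionaries are stored uncompressed, so the
    filter name appears literally in the file. This does not parse the PDF
    and cannot tell which page an image belongs to.
    """
    data = Path(pdf_path).read_bytes()
    return b"/DCTDecode" in data


if __name__ == "__main__":
    # Hypothetical usage: python check_jpeg.py file1.pdf file2.pdf ...
    for name in sys.argv[1:]:
        print(f"{name}: DCTDecode images present = {has_jpeg_images(name)}")
```

A file that reports `False` here would be unlikely to reach the JPEG decoder in the trace, which would point away from the PDFs and towards the environment.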

Package Versions:
(llms) dd00740409@ns3067540:~$ java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-8u312-b07-0ubuntu1~18.04-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)

(llms) dd00740409@ns3067540:~$ python
Python 3.8.17 | packaged by conda-forge | (default, Jun 16 2023, 07:06:00)
[GCC 11.4.0] on linux
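The versions above do not say which environment they come from, and they omit the tabula-py version. A minimal sketch for collecting the same information inside both the local machine and the container, so the two can be compared line by line; `collect_versions` is a hypothetical helper, and the only external assumption is that the package is installed under its PyPI name `tabula-py`.

```python
import platform
import shutil
import subprocess
from importlib import metadata


def collect_versions():
    """Collect the Python, tabula-py and Java details visible to this process."""
    versions = {"python": platform.python_version()}
    try:
        versions["tabula-py"] = metadata.version("tabula-py")
    except metadata.PackageNotFoundError:
        versions["tabula-py"] = "not installed"

    # Which java binary is first on PATH matters as much as its version.
    java = shutil.which("java")
    versions["java_path"] = java or "not found on PATH"
    if java is None:
        versions["java"] = "not found on PATH"
    else:
        # `java -version` writes to stderr; fall back to stdout just in case.
        result = subprocess.run([java, "-version"], capture_output=True, text=True)
        versions["java"] = (result.stderr or result.stdout).strip()
    return versions


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Comparing `java_path` is worthwhile here, since the logs show the container using the JVM under `/usr/lib/jvm/java-8-openjdk-amd64`.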

Error:
subprocess.CalledProcessError: Command '['java', '-Dfile.encoding=UTF8', '-jar', '/usr/local/lib/python3.8/site-packages/tabula/tabula-1.0.5-jar-with-dependencies.jar', '--pages', '9', '--stream', '--guess', '--format', 'JSON', 'Roa8dvYUVmHQLKhhvTiPL.pdf']' returned non-zero exit status 1.
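tabula-py runs the bundled jar as a separate `java` process, and the `CalledProcessError` above only reports the exit status. A minimal sketch for re-running the same command by hand while keeping stderr, so the Java exception can be seen directly; the jar path, page number and flags are taken from the error message, while `build_tabula_command` and `run_and_report` are hypothetical helper names.

```python
import subprocess
import sys


def build_tabula_command(jar_path, pdf_path, pages, guess=True):
    """Rebuild the java command shown in the CalledProcessError."""
    cmd = ["java", "-Dfile.encoding=UTF8", "-jar", jar_path,
           "--pages", str(pages), "--stream"]
    if guess:
        cmd.append("--guess")
    cmd += ["--format", "JSON", pdf_path]
    return cmd


def run_and_report(cmd):
    """Run cmd and return (exit_code, stderr) instead of raising."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError as exc:
        # The executable itself (e.g. java) is not on PATH.
        return None, str(exc)
    return result.returncode, result.stderr


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python rerun.py /path/to/tabula-jar.jar failing.pdf
    code, err = run_and_report(build_tabula_command(sys.argv[1], sys.argv[2], 9))
    print("exit code:", code)
    print(err)
```

Running it once with `guess=False` may be a useful experiment: the stack trace places the crash under `NurminenDetectionAlgorithm.detect`, i.e. the table-detection step, so if the run without `--guess` succeeds, that would narrow the failure to page rendering during detection.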

Logs:
Exception in thread "main" java.lang.UnsatisfiedLinkError: /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libjavajpeg.so: libjpeg.so.8: cannot open shared object file: No such file or directory
    at java.lang.ClassLoader$NativeLibrary.load(Native Method)
    at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1934)
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1838)
    at java.lang.Runtime.loadLibrary0(Runtime.java:843)
    at java.lang.System.loadLibrary(System.java:1136)
    at com.sun.imageio.plugins.jpeg.JPEGImageReader$1.run(JPEGImageReader.java:92)
    at com.sun.imageio.plugins.jpeg.JPEGImageReader$1.run(JPEGImageReader.java:90)
    at java.security.AccessController.doPrivileged(Native Method)
    at com.sun.imageio.plugins.jpeg.JPEGImageReader.<clinit>(JPEGImageReader.java:89)
    at com.sun.imageio.plugins.jpeg.JPEGImageReaderSpi.createReaderInstance(JPEGImageReaderSpi.java:85)
    at javax.imageio.spi.ImageReaderSpi.createReaderInstance(ImageReaderSpi.java:320)
    at javax.imageio.ImageIO$ImageReaderIterator.next(ImageIO.java:529)
    at javax.imageio.ImageIO$ImageReaderIterator.next(ImageIO.java:513)
    at org.apache.pdfbox.filter.Filter.findImageReader(Filter.java:155)
    at org.apache.pdfbox.filter.DCTFilter.decode(DCTFilter.java:58)
    at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:80)
    at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:175)
    at org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:243)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.createInputStream(PDImageXObject.java:791)
    at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.from8bit(SampledImageReader.java:517)
    at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:226)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:481)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:462)
    at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1110)
    at org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:67)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:933)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:514)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:492)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:155)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:277)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:347)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:268)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:254)
    at technology.tabula.Utils.pageConvertToImage(Utils.java:285)
    at technology.tabula.detectors.NurminenDetectionAlgorithm.detect(NurminenDetectionAlgorithm.java:101)
    at technology.tabula.CommandLineApp$TableExtractor.extractTablesBasic(CommandLineApp.java:421)
    at technology.tabula.CommandLineApp$TableExtractor.extractTables(CommandLineApp.java:408)
    at technology.tabula.CommandLineApp.extractFile(CommandLineApp.java:180)
    at technology.tabula.CommandLineApp.extractFileTables(CommandLineApp.java:124)
    at technology.tabula.CommandLineApp.extractTables(CommandLineApp.java:106)
    at technology.tabula.CommandLineApp.main(CommandLineApp.java:76)
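The first line of the logs says the JVM's `libjavajpeg.so` could not load its dependency `libjpeg.so.8`, which suggests the container (not the PDFs) is missing a native library. A minimal sketch for checking that directly inside the container; `can_load_shared_library` is a hypothetical helper, and it assumes that `dlopen()` with a bare library name searches roughly the same places the loader uses when resolving `libjavajpeg.so`'s dependencies. Running `ldd` on the `libjavajpeg.so` path from the logs is a shell-level alternative.

```python
import ctypes


def can_load_shared_library(name):
    """Return True if the dynamic loader can open the named shared library.

    ctypes.CDLL(name) calls dlopen(), which with a bare name searches
    LD_LIBRARY_PATH, the ld.so cache and the default library directories.
    """
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print("libjpeg.so.8 loadable:", can_load_shared_library("libjpeg.so.8"))
```

If this prints `False` in the container but `True` on the local machine, installing whatever package provides `libjpeg.so.8` in the image would be the next thing to try; on Ubuntu-based images that is typically `libjpeg-turbo8`, though that is an assumption about the container's base image.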

deepakdhiman7 changed the title on Feb 23, 2024:
from: Getting "subprocess" error for files while the same files working fine with Tabula in other environment.
to: Getting "subprocess" error for the same PDF files which are working fine with Tabula in local machine.