Tess4j failing near load of shared library tesseract-ocr-5.2 in Java 11 and 17, succeeds in Java 8


Ralph Cook

Sep 25, 2022, 12:02:35 PM
to tesseract-ocr
I'm using Tess4j in a Java program to access Tesseract and OCR PDFs that are read with PDFBox. I've been using Java 8, and things are running. The program is not commercial; I provide it to non-profits doing pro bono legal work in my state. In Java 8, from both the command line and Eclipse, the program runs fine; running from the command line in either Java 11 or Java 17 causes an error at the point where the program calls Tesseract.doOCR().

I've dumped class loading information and see that the last class loaded before the fatal exception is com.sun.jna.Platform; it would be used, for instance, to determine the platform on which the program is running. I haven't been able to find the source for the 5.2 version I downloaded from UB Mannheim; that would be useful, since the stack trace has line numbers.
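
A full class-load dump is broad. As a narrower check, the following is a minimal sketch (not part of the original program) that prints the running Java version and the location a given class was loaded from, using only the standard library. The class name `WhereLoaded` and the helper `sourceOf` are hypothetical names chosen for illustration. In the real program one would presumably pass a class such as com.sun.jna.Platform instead of the stand-ins used here.

```java
import java.security.CodeSource;

public class WhereLoaded {
    // Returns the URL a class was loaded from, or a placeholder when the
    // class has no code source (as is the case for bootstrap classes).
    public static String sourceOf(Class<?> c) {
        CodeSource cs = c.getProtectionDomain().getCodeSource();
        if (cs == null || cs.getLocation() == null) {
            return "(no code source)";
        }
        return cs.getLocation().toString();
    }

    public static void main(String[] args) {
        // Confirms which Java runtime is actually executing the program.
        System.out.println("java.version = " + System.getProperty("java.version"));
        // Stand-in classes; substitute the class under investigation.
        System.out.println("WhereLoaded <- " + sourceOf(WhereLoaded.class));
        System.out.println("String      <- " + sourceOf(String.class));
    }
}
```

Running this under each Java version and comparing the output may help show whether the jar locations differ between the working and failing setups.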

The following is a snippet showing log messages, System.out.println messages, stacktraces, and class loading messages near the point of failure:

pdfRenderer created buffered Image
set a couple of tesseract vars
[14.960s][info][class,load] net.sourceforge.tess4j.util.ImageIOHelper source: rsrc:tess4j-5.4.0.jar
[14.961s][info][class,load] javax.imageio.IIOParam source: jrt:/java.desktop
[14.961s][info][class,load] javax.imageio.ImageWriteParam source: jrt:/java.desktop
[14.962s][info][class,load] com.github.jaiimageio.plugins.tiff.TIFFImageWriteParam source: rsrc:jai-imageio-core-1.4.0.jar
[14.963s][info][class,load] javax.imageio.IIOImage source: jrt:/java.desktop
[14.964s][info][class,load] com.sun.jna.Library source: rsrc:jna-5.12.1.jar
[14.965s][info][class,load] net.sourceforge.tess4j.ITessAPI source: rsrc:tess4j-5.4.0.jar
[14.965s][info][class,load] net.sourceforge.tess4j.TessAPI source: rsrc:tess4j-5.4.0.jar
[14.966s][info][class,load] net.sourceforge.tess4j.util.LoadLibs source: rsrc:tess4j-5.4.0.jar
[14.969s][info][class,load] com.sun.jna.Platform source: rsrc:jna-5.12.1.jar
[14.973s][info][class,load] java.lang.ExceptionInInitializerError source: jrt:/java.base
throwable while reading PDF
[14.973s][info][class,load] java.lang.Throwable$PrintStreamOrWriter source: jrt:/java.base
[14.974s][info][class,load] java.lang.Throwable$WrappedPrintStream source: jrt:/java.base
java.lang.ExceptionInInitializerError
        at net.sourceforge.tess4j.Tesseract.init(Tesseract.java:442)
        at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:326)
        at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:309)
        at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:290)
        at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:274)
        at drivingrecordtool.file.DrivingRecordPDFTextReader.getOCRText(DrivingRecordPDFTextReader.java:152)
        at drivingrecordtool.file.DrivingRecordPDFTextReader.getText(DrivingRecordPDFTextReader.java:46)
        at drivingrecordtool.file.DrivingRecordFileReader.doInBackground(DrivingRecordFileReader.java:78)
        at drivingrecordtool.file.DrivingRecordFileReader.doInBackground(DrivingRecordFileReader.java:1)
        at java.desktop/javax.swing.SwingWorker$1.call(SwingWorker.java:304)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.desktop/javax.swing.SwingWorker.run(SwingWorker.java:343)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.IllegalStateException: zip file closed
        at java.base/java.util.zip.ZipFile.ensureOpen(ZipFile.java:913)
        at java.base/java.util.zip.ZipFile.getEntry(ZipFile.java:348)
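
The trace as posted stops after two frames of the cause. The following is a minimal sketch, using only the standard library, of walking the cause chain of a caught Throwable so that every nested exception is listed. The class name `CauseChain` and the method `describe` are hypothetical, and the exception built in `main` is only a stand-in that mimics the two exception types in the log above; the real one would come from the call to Tesseract.doOCR().

```java
public class CauseChain {
    // Builds one line per throwable in the cause chain, outermost first.
    public static String describe(Throwable t) {
        StringBuilder sb = new StringBuilder();
        int depth = 0;
        // The depth cap guards against a cyclic cause chain.
        for (Throwable cur = t; cur != null && depth < 32; cur = cur.getCause()) {
            sb.append(depth).append(": ")
              .append(cur.getClass().getName()).append(": ")
              .append(cur.getMessage()).append('\n');
            depth++;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the real failure, for illustration only.
        Throwable t = new ExceptionInInitializerError(
                new IllegalStateException("zip file closed"));
        System.out.print(describe(t));
    }
}
```

In the program itself, `describe` could be called from the existing catch block that prints "throwable while reading PDF", alongside `printStackTrace()`, to capture the complete chain.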

If I uninstall Java and install Java 8, the program works fine.

If I uninstall Java and install Java 11 or Java 17, it fails in this fashion.

Can anyone help me understand what the difference might be between the versions of Java so I can fix this?


Quan Nguyen

Sep 28, 2022, 11:52:15 PM
to tesseract-ocr
The source of tess4j is available; you can trace through the code to see what threw the exception.

Nevertheless, "throwable while reading PDF" seems to point to the part of the code that reads in the PDF file. Was that something you wrote, or is it from tess4j itself?

Quan Nguyen

Sep 28, 2022, 11:54:57 PM
to tesseract-ocr
PDF files are read by the PDFBox library. You may want to look into that area as well.