
Tess4J - a Java wrapper for Tesseract OCR DLL


Quan Nguyen

Aug 22, 2010, 10:35:26 PM
to tesseract-ocr
Tess4J is a JNA-based wrapper for the Tesseract OCR DLL. The library
provides optical character recognition (OCR) support for:

* TIFF, JPEG, GIF, PNG, and BMP image formats
* Multi-page TIFF images
* PDF document format

http://tess4j.sf.net
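Since the wrapper accepts only the formats listed above, one way to catch a mislabeled input before it reaches the native library is to look at the file's leading signature bytes instead of trusting its extension. The following is a minimal sketch using only the Java standard library; the class `ImageFormatSniffer` and its `sniff` method are hypothetical and not part of Tess4J, and the signatures checked are the standard magic numbers for each format.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper (not part of Tess4J): guesses a file's format from its first bytes. */
public class ImageFormatSniffer {

    /** Returns "TIFF", "JPEG", "GIF", "PNG", "BMP", "PDF", or "UNKNOWN". */
    public static String sniff(byte[] head) {
        if (startsWith(head, 'I', 'I', 42, 0) || startsWith(head, 'M', 'M', 0, 42)) {
            return "TIFF"; // little-endian "II*\0" or big-endian "MM\0*"; multi-page TIFFs share this header
        }
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) {
            return "JPEG"; // SOI marker followed by the start of another marker
        }
        if (startsWith(head, 'G', 'I', 'F', '8')) {
            return "GIF"; // "GIF87a" or "GIF89a"
        }
        if (startsWith(head, 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A)) {
            return "PNG";
        }
        if (startsWith(head, 'B', 'M')) {
            return "BMP";
        }
        if (startsWith(head, '%', 'P', 'D', 'F')) {
            return "PDF";
        }
        return "UNKNOWN";
    }

    /** True if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[8];
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            int n = in.read(buffer); // one read is enough for a sketch; n is -1 for an empty file
            System.out.println(sniff(Arrays.copyOf(buffer, Math.max(n, 0))));
        }
    }
}
```

Usage: `java ImageFormatSniffer eurotext.png`. A file reported as UNKNOWN is probably not something the wrapper can read, whatever its extension says.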

James Le Cuirot

Aug 23, 2010, 3:44:26 AM
to tesser...@googlegroups.com

Nice work. Regarding this note on your site...

"Testing on Linux will be performed once the shared object library
equivalent to the DLL becomes available."

I don't know if you're aware, but I've more or less taken over development
of Tesjeract, the JNI equivalent. I got it working on Linux by
basically mimicking the necessary parts of tessdll.dll. Feel free to
borrow that code. Ideally I'd like to ditch that ugly mess and use
Tesseract 3, but I've been tidying up Leptonica first.

http://code.google.com/p/tesjeract/

James

Quan Nguyen

Aug 23, 2010, 7:12:14 PM
to tesseract-ocr
Oh, thanks, James. I'd be happy to use the code to build the .so and
avoid duplicate effort. I'll get in touch with you if I need more
info. Thank you again.

Quan

On Aug 23, 2:44 am, James Le Cuirot <ch...@aura-online.co.uk> wrote:
> On Sun, 22 Aug 2010 19:35:26 -0700 (PDT)
>

Kamalakara Ambati

Apr 28, 2012, 11:32:30 PM
to tesser...@googlegroups.com
Hi Quan,

I am new to Tess4J. I downloaded the source from https://sourceforge.net/projects/tess4j/ and am testing it from Eclipse. When I ran the sample app:

    import java.io.File;
    import net.sourceforge.tess4j.*;

    public class TesseractExample {
        public static void main(String[] args) {
            File imageFile = new File("C:\\tesseract-ocr\\JNA\\Tess4J\\eurotext.png");
            // File imageFile = new File("C:\\tesseract-ocr\\H1B.jpg");

            Tesseract instance = Tesseract.getInstance();  // JNA Interface Mapping
            // Tesseract1 instance = new Tesseract1();     // JNA Direct Mapping

            // Note: a crash inside the native DLL terminates the JVM;
            // these catch blocks only see errors reported back to Java.
            try {
                String result = instance.doOCR(imageFile);
                System.out.println(result);
            } catch (TesseractException e) {
                System.err.println(e.getMessage());
            } catch (Exception ex) {
                System.err.println(ex.getMessage());
            }
        }
    }

When I followed the instructions, I got the following error. I am comfortable with Java but not with the C++ side, so I can't figure out the issue. Please let me know how to resolve this.


#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x173c0fe9, pid=8136, tid=4576
#
# JRE version: 6.0_30-b12
# Java VM: Java HotSpot(TM) Client VM (20.5-b03 mixed mode windows-x86 )
# Problematic frame:
# C  [libtesseract302.dll+0x70fe9]
#
# An error report file with more information is saved as:
# C:\SAS2.0\Tess4Java\hs_err_pid8136.log
#
# If you would like to submit a bug report, please visit:
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.

Could you please let me know where I am going wrong? Thanks!

Regards,
Kamal.

Quan Nguyen

Apr 29, 2012, 12:02:59 PM
to tesser...@googlegroups.com
With Ant and JUnit installed, you can run the "ant test" command to validate the program.

BTW, more test cases have been added in the latest update.

MadhuSudan Kaka

Mar 22, 2025, 6:03:42 PM
to tesseract-ocr
Hi Quan,

I am trying to use Tess4J in my Pega application, which runs on Ubuntu Linux, and I get a fatal error that restarts the server every time I run the Java program.

I imported tess4j-5.4.0.jar and its supporting JARs (listed below) into my Pega application. I copied the native libraries (.dll) to a folder on my server (an Azure mount), along with the tessdata folder containing eng.traineddata, and defined both locations in the Java paths of Pega's YAML file.

Initially, I was getting this error: NoClassDefFoundError: Could not initialize class net.sourceforge.tess4j.TessAPI. So I downloaded a .so file (libtessract4.0.0.so) from the web and added it to the native library folder.
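The JNA log lines further down ("Looking for library 'tesseract'", then "Trying .../libtesseract.so") show that JNA expects a platform-specific file name in the `jna.library.path` directories: on Linux that is `libtesseract.so`, so Windows `.dll` files copied to a Linux server will not be picked up. The sketch below reproduces a simplified version of that lookup with the standard library only, so the configuration can be checked without touching native code. It is an approximation under that assumption, not JNA's actual algorithm (JNA also searches other locations); the class `NativeLibCheck` and method `findLibrary` are hypothetical.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical pre-flight check: is the platform's file for a library present in a search path? */
public class NativeLibCheck {

    /** Searches each entry of a path-separator-delimited list for the platform file name of libName. */
    public static Optional<Path> findLibrary(String searchPath, String libName) {
        if (searchPath == null || searchPath.isEmpty()) {
            return Optional.empty();
        }
        // e.g. "libtesseract.so" on Linux, "tesseract.dll" on Windows
        String fileName = System.mapLibraryName(libName);
        for (String dir : searchPath.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            Path candidate = Paths.get(dir, fileName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String searchPath = System.getProperty("jna.library.path");
        Optional<Path> found = findLibrary(searchPath, "tesseract");
        if (found.isPresent()) {
            System.out.println("Found " + found.get());
        } else {
            System.out.println("No " + System.mapLibraryName("tesseract")
                    + " in jna.library.path=" + searchPath);
        }
    }
}
```

Run it with the same `-Djna.library.path=...` option as the server. A "No libtesseract.so" result would be consistent with the `NoClassDefFoundError` for `TessAPI`; finding the file does not guarantee it is the Tesseract version tess4j-5.4.0 was built against.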

In my Java code, I initialize the Tesseract object, set the JNA path and the tessdata path, and invoke doOCR() on an image. The error below occurs when tesseract.doOCR() is called.

Here are the server logs:

07-Mar-2025 20:58:12.967 INFO [https-jsse-nio-8443-exec-2] com.sun.jna.Native.extractFromResourcePath Looking in classpath from com.pega.pegarules.bootstrap.loader.PRAppLoader@2b7e8044 for /com/sun/jna/linux-x86-64/libjnidispatch.so
07-Mar-2025 20:58:13.053 INFO [https-jsse-nio-8443-exec-2] com.sun.jna.Native.extractFromResourcePath Found library resource at pegajdbc://408132785:0/jna-5.8.0.jar!/com/sun/jna/linux-x86-64/libjnidispatch.so
07-Mar-2025 20:58:13.149 INFO [https-jsse-nio-8443-exec-2] com.sun.jna.Native.extractFromResourcePath Extracting library to /usr/local/tomcat/temp/jna14459296195516564982.tmp
07-Mar-2025 20:58:13.150 INFO [https-jsse-nio-8443-exec-2] com.sun.jna.Native.loadNativeDispatchLibraryFromClasspath Trying /usr/local/tomcat/temp/jna14459296195516564982.tmp
07-Mar-2025 20:58:13.157 INFO [https-jsse-nio-8443-exec-2] com.sun.jna.Native.loadNativeDispatchLibraryFromClasspath Found jnidispatch at /usr/local/tomcat/temp/jna14459296195516564982.tmp
07-Mar-2025 20:58:13.867 INFO [https-jsse-nio-8443-exec-2] com.sun.jna.NativeLibrary.loadLibrary Looking for library 'tesseract'
07-Mar-2025 20:58:13.867 INFO [https-jsse-nio-8443-exec-2] com.sun.jna.NativeLibrary.loadLibrary Adding paths from jna.library.path: /mnt/BCDS/outbound/tess4j/linux-x86-64/
07-Mar-2025 20:58:13.911 INFO [https-jsse-nio-8443-exec-2] com.sun.jna.NativeLibrary.loadLibrary Trying /mnt/BCDS/outbound/tess4j/linux-x86-64/libtesseract.so
07-Mar-2025 20:58:14.255 INFO [https-jsse-nio-8443-exec-2] com.sun.jna.NativeLibrary.loadLibrary Found library 'tesseract' at /mnt/BCDS/outbound/tess4j/linux-x86-64/libtesseract.so
!strcmp(locale, "C"):Error:Assert failed:in file /mnt/c/nix/Dev/cpp/lib/tesseract/src/api/baseapi.cpp, line 209

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f63e0bff898, pid=1, tid=523
#
# JRE version: OpenJDK Runtime Environment Temurin-11.0.21+9 (11.0.21+9) (build 11.0.21+9)
# Java VM: OpenJDK 64-Bit Server VM Temurin-11.0.21+9 (11.0.21+9, mixed mode, tiered, compressed oops, serial gc, linux-amd64)
# Problematic frame:
# C  [libc.so.6+0x28898]  abort+0x178
#
# Core dump will be written. Default location: //core.1
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid1.log
#
# If you would like to submit a bug report, please visit:
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
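The line `!strcmp(locale, "C"):Error:Assert failed` printed just before the crash appears to be Tesseract's own locale check in `baseapi.cpp` failing; a failed assertion ending in `abort()` would be consistent with the `abort+0x178` frame in `libc.so.6`. Tesseract 4.0 builds check that the process locale is "C", and the workaround commonly reported for this assertion is starting the JVM with `LC_ALL=C` in its environment. Assuming the JVM initializes the C locale from the environment and the usual POSIX precedence applies (`LC_ALL`, then the category variable, then `LANG`, otherwise "C"), the hypothetical helper below predicts what each locale category will be, so a value such as `C.UTF-8` or `en_US.UTF-8` can be spotted before `doOCR()` is called. It only reads environment variables and does not check whether the named locale is actually installed.

```java
import java.util.Map;

/** Hypothetical helper: predicts each POSIX locale category from environment variables. */
public class LocaleCheck {

    // setlocale(LC_ALL, NULL) only reports "C" when every category is "C".
    static final String[] CATEGORIES = {
        "LC_CTYPE", "LC_NUMERIC", "LC_COLLATE", "LC_TIME", "LC_MONETARY", "LC_MESSAGES"
    };

    /** POSIX precedence: LC_ALL, then the category variable, then LANG, else "C". Empty values count as unset. */
    public static String effectiveLocale(Map<String, String> env, String category) {
        for (String var : new String[] {"LC_ALL", category, "LANG"}) {
            String value = env.get(var);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return "C";
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        for (String category : CATEGORIES) {
            String locale = effectiveLocale(env, category);
            // Literal comparison, like the strcmp in the assertion: "C.UTF-8" is not "C".
            String note = "C".equals(locale) ? "" : "  (not \"C\")";
            System.out.println(category + " -> " + locale + note);
        }
    }
}
```

If any category prints something other than C, setting `LC_ALL=C` in the server's environment (an environment variable, not a `-D` system property) is the first thing to try.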

JAR files imported into the Pega application:

* tess4j-5.4.0.jar
* jna-5.8.0.jar
* jna-platform-5.8.0.jar
* slf4j-api-1.7.30.jar
* slf4j-simple-1.7.30.jar
* lept4j-1.16.2.jar
* commons-io-2.6.jar

I also copied the JAR files to the Azure mount location (/mnt/BCDS/outbound/) on the server.


Java paths (JVM options):

-Djna.library.path=/mnt/BCDS/outbound/tess4j/linux-x86-64/
-Dtessdata.path=/mnt/BCDS/outbound/tess4j/tessdata/
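Because a native crash takes down the whole JVM (here, the application server), it may be worth validating the configured paths in plain Java before the first call into Tesseract. The sketch below assumes, as the options above suggest, that the application reads the `tessdata.path` system property itself and passes it to Tess4J (for example via `setDatapath`); the class `TessdataCheck` and its `check` method are hypothetical. It only confirms that `<language>.traineddata` exists and is readable; it cannot tell whether the file is compatible with the loaded library version.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check for a tessdata directory (standard library only, no Tess4J calls). */
public class TessdataCheck {

    /** Returns a list of problems; an empty list means the basic checks passed. */
    public static List<String> check(String tessdataDir, String language) {
        List<String> problems = new ArrayList<>();
        if (tessdataDir == null || tessdataDir.isEmpty()) {
            problems.add("tessdata directory is not set");
            return problems;
        }
        Path dir = Paths.get(tessdataDir);
        if (!Files.isDirectory(dir)) {
            problems.add("not a directory: " + dir);
            return problems;
        }
        // Tesseract loads <language>.traineddata, e.g. eng.traineddata for "eng".
        Path model = dir.resolve(language + ".traineddata");
        if (!Files.isRegularFile(model)) {
            problems.add("missing " + model);
        } else if (!Files.isReadable(model)) {
            problems.add("not readable: " + model);
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = check(System.getProperty("tessdata.path"), "eng");
        if (problems.isEmpty()) {
            System.out.println("tessdata looks usable");
        } else {
            problems.forEach(p -> System.err.println("tessdata problem: " + p));
        }
    }
}
```

Calling `check` once at startup and refusing to reach `doOCR()` when it reports problems keeps a configuration mistake from turning into a server restart.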
