Failed loading language 'eng'

Jeremiah

Mar 10, 2020, 3:40:13 PM

to tesseract-ocr

I am getting this error when running some userbot Java code on my Win10 machine, which uses Tesseract to extract words from the screen. I have set the environment variable TESSDATA_PREFIX to C:\Program Files\Tesseract-OCR.

Also, when I run the tesseract --version command, this is what it returns:

tesseract v5.0.0-alpha.20191030
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5

When I run tesseract --list-langs, this is what it returns:

List of available languages (2):
eng
osd

This is the full error:

Failed loading language 'eng'
Tesseract couldn't load any languages!
Exception in thread "main" java.lang.Error: Invalid memory access
    at net.sourceforge.tess4j.TessAPI1.TessBaseAPIRecognize(Native Method)
    at net.sourceforge.tess4j.Tesseract1.getWords(Tesseract1.java:672)
    at org.sikuli.script.Finder$Finder2.doFind(Finder.java:688)
    at org.sikuli.script.Finder$Finder2.find(Finder.java:617)
    at org.sikuli.script.Finder.findText(Finder.java:334)
    at org.sikuli.script.Finder.findWords(Finder.java:351)
    at org.sikuli.script.Region.doFindText(Region.java:3083)
    at org.sikuli.script.Region.findWords(Region.java:2829)
    at org.sikuli.script.Region.collectWords(Region.java:4980)
    at org.sikuli.script.Region.collectWordsText(Region.java:4985)

Any idea on how to fix this?

PD

Mar 11, 2020, 7:21:57 AM

to tesseract-ocr

Is TESSDATA_PREFIX pointing to the tessdata directory? It should point to the tessdata directory, where Tesseract can find the traineddata file.
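One way to check this from Java is to look for eng.traineddata under whatever TESSDATA_PREFIX the JVM process actually sees. The following is only a minimal sketch: the class and method names (TessdataCheck, candidates, findTraineddata) are made up for illustration, and it simply tries both layouts mentioned in this thread (the file directly in the prefix directory, or in a tessdata subdirectory). It says nothing about whether the file is compatible with the installed Tesseract version.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Tesseract, Tess4J, or SikuliX.
class TessdataCheck {

    // Candidate locations for <lang>.traineddata under a prefix:
    // directly inside it, or inside a "tessdata" subdirectory.
    static List<Path> candidates(Path prefix, String lang) {
        String file = lang + ".traineddata";
        return Arrays.asList(
                prefix.resolve(file),
                prefix.resolve("tessdata").resolve(file));
    }

    // Returns the first candidate that exists as a regular file, if any.
    static Optional<Path> findTraineddata(Path prefix, String lang) {
        return candidates(prefix, lang).stream()
                .filter(Files::isRegularFile)
                .findFirst();
    }

    public static void main(String[] args) {
        // System.getenv reads the environment of this JVM process, which may
        // differ from what a freshly opened command prompt shows.
        String prefix = System.getenv("TESSDATA_PREFIX");
        if (prefix == null) {
            System.out.println("TESSDATA_PREFIX is not visible to this JVM process");
            return;
        }
        Optional<Path> found = findTraineddata(Paths.get(prefix), "eng");
        System.out.println(found
                .map(p -> "Found " + p)
                .orElse("No eng.traineddata found under " + prefix));
    }
}
```

If the variable shows up as unset here even though it is set in the system settings, the IDE or terminal that launched the JVM was probably started before the variable was changed.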

Regards

PD

Jeremiah

Mar 11, 2020, 8:21:55 AM

to tesseract-ocr

Yes, I've tried both C:\Program Files\Tesseract-OCR and C:\Program Files\Tesseract-OCR\tessdata, and neither one works for me.

Shree Devi Kumar

Mar 11, 2020, 11:38:44 AM

to tesseract-ocr, Quan Nguyen

One possibility is that the eng.traineddata file you have is not compatible with the latest tesseract version you are using.

The other possibility is that the Java userbot is calling tesseract with the wrong --oem.
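To separate these two possibilities from the Java wrapper, it may help to run the tesseract command-line tool against the same tessdata directory with an explicit engine mode. The sketch below is an assumption-laden illustration: the class name TesseractCliCheck and the command helper are hypothetical, it assumes tesseract is on the PATH, and the image path and tessdata directory are supplied by the caller. It tries --oem 0 (legacy engine) and --oem 1 (LSTM engine) and prints the exit code of each run.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that shells out to the tesseract CLI.
class TesseractCliCheck {

    // Builds: tesseract <image> stdout --tessdata-dir <dir> -l <lang> --oem <n>
    static List<String> command(String image, String tessdataDir, String lang, int oem) {
        return Arrays.asList(
                "tesseract", image, "stdout",
                "--tessdata-dir", tessdataDir,
                "-l", lang,
                "--oem", String.valueOf(oem));
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: TesseractCliCheck <image> <tessdata-dir>");
            return;
        }
        for (int oem : new int[] {0, 1}) {
            // inheritIO forwards tesseract's own output and error messages.
            Process p = new ProcessBuilder(command(args[0], args[1], "eng", oem))
                    .inheritIO()
                    .start();
            System.out.println("--oem " + oem + " exit code: " + p.waitFor());
        }
    }
}
```

If the CLI loads eng in one mode but not the other, the traineddata file likely lacks the model for the failing mode; if both work, the problem is more likely in how the Java side locates or loads the data.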

I have cc:ed Quan for advice regarding tess4j and Java.

Jeremiah

Mar 11, 2020, 12:34:40 PM

to tesseract-ocr

I downloaded the latest version of the trained data file and tried again, but it still didn't work. As far as I can tell, the Java code never creates a Tesseract object directly; the bots create a Region in SikuliX and then call collectWordsText() on it.

This is the code for reference.

// Need to extend region to get text
Region extendedRegion = Helper.extendVertical("loginbtn-A", match);
Helper.log("Collecting words from login screen.");
List<String> intitailWords = new ArrayList<>();
try {
    intitailWords = extendedRegion.collectWordsText();
} catch (UnsatisfiedLinkError ule) {
    Helper.log("Possible OCR error. Verify that tesseract is installed and working.");
    return;
}
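A side note on the catch clause: the stack trace earlier in this thread shows a plain java.lang.Error ("Invalid memory access"), which a catch for UnsatisfiedLinkError does not match, since UnsatisfiedLinkError is a subclass of Error and not the other way around. The following self-contained sketch illustrates the difference. CatchScopeDemo, failingOcrCall, and run are hypothetical names, and failingOcrCall is only a stand-in for the native call, not real SikuliX or Tess4J code.

```java
// Hypothetical demo of which catch clause handles which Error type.
class CatchScopeDemo {

    // Stand-in for the failing native call; throws the same type and
    // message that the stack trace in this thread reports.
    static void failingOcrCall() {
        throw new Error("Invalid memory access");
    }

    static String run(Runnable ocrCall) {
        try {
            ocrCall.run();
            return "ok";
        } catch (UnsatisfiedLinkError ule) {
            // Reached only when a native library or symbol cannot be linked.
            return "link error";
        } catch (Error e) {
            // Reached for other Errors, including the one in the trace.
            return "other error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(CatchScopeDemo::failingOcrCall));
    }
}
```

Catching Error broadly is usually discouraged, so this is best treated as a way to log the failure cleanly while the underlying language-loading problem is being fixed, not as a fix in itself.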

shree

Mar 11, 2020, 1:04:51 PM

to tesseract-ocr

I suggest you file an issue with SikuliX.

Also see https://github.com/RaiMan/SikuliX1/issues/246
