Failed loading language 'eng'


Jeremiah

Mar 10, 2020, 3:40:13 PM
to tesseract-ocr
I am getting this error when running some userbot Java code on my Windows 10 machine, which uses Tesseract to extract words from the screen. I have set the environment variable TESSDATA_PREFIX to C:\Program Files\Tesseract-OCR.

Also, when I run the tesseract --version command, this is what it returns:
tesseract v5.0.0-alpha.20191030
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5

When I run tesseract --list-langs, this is what it returns:
List of available languages (2):
eng
osd

This is the full error:
Failed loading language 'eng'
Tesseract couldn't load any languages!
Exception in thread "main" java.lang.Error: Invalid memory access
at net.sourceforge.tess4j.TessAPI1.TessBaseAPIRecognize(Native Method)
at net.sourceforge.tess4j.Tesseract1.getWords(Tesseract1.java:672)
at org.sikuli.script.Finder$Finder2.doFind(Finder.java:688)
at org.sikuli.script.Finder$Finder2.find(Finder.java:617)
at org.sikuli.script.Finder.findText(Finder.java:334)
at org.sikuli.script.Finder.findWords(Finder.java:351)
at org.sikuli.script.Region.doFindText(Region.java:3083)
at org.sikuli.script.Region.findWords(Region.java:2829)
at org.sikuli.script.Region.collectWords(Region.java:4980)
at org.sikuli.script.Region.collectWordsText(Region.java:4985)

Any idea on how to fix this?

PD

Mar 11, 2020, 7:21:57 AM
to tesseract-ocr
Is TESSDATA_PREFIX pointing to the tessdata directory? It should point to the tessdata directory itself, where Tesseract can find the traineddata files.

Regards
PD
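
One way to test this suggestion from inside the JVM is a small standalone check like the one below. It is only a sketch: the class name TessdataCheck and its methods are made up for illustration and are not part of Tesseract, tess4j, or SikuliX. It reads TESSDATA_PREFIX as the Java process sees it and looks for eng.traineddata either directly in that directory or in a tessdata subdirectory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of any OCR library.
public class TessdataCheck {

    /** Returns true if dir contains <lang>.traineddata as a regular file. */
    public static boolean hasLanguage(Path dir, String lang) {
        return dir != null && Files.isRegularFile(dir.resolve(lang + ".traineddata"));
    }

    /**
     * Given the raw TESSDATA_PREFIX value, returns the directory that holds
     * <lang>.traineddata: the prefix itself, or its "tessdata" child.
     * Returns null if the file is in neither location.
     */
    public static Path locate(String prefix, String lang) {
        if (prefix == null || prefix.isEmpty()) {
            return null;
        }
        Path base = Paths.get(prefix);
        if (hasLanguage(base, lang)) {
            return base;
        }
        Path child = base.resolve("tessdata");
        return hasLanguage(child, lang) ? child : null;
    }

    public static void main(String[] args) {
        // The value seen here is the one the JVM inherited at launch.
        String prefix = System.getenv("TESSDATA_PREFIX");
        Path found = locate(prefix, "eng");
        System.out.println("TESSDATA_PREFIX = " + prefix);
        System.out.println(found != null
                ? "eng.traineddata found in " + found
                : "eng.traineddata not found under TESSDATA_PREFIX");
    }
}
```

If this prints null for the variable even though it is set in the system settings, the IDE or terminal that launches the bot was probably started before the variable was changed, since a process only inherits the environment that existed when it was launched.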

Jeremiah

Mar 11, 2020, 8:21:55 AM
to tesseract-ocr
Yes, I've tried both C:\Program Files\Tesseract-OCR and C:\Program Files\Tesseract-OCR\tessdata, and neither one works for me.

Shree Devi Kumar

Mar 11, 2020, 11:38:44 AM
to tesseract-ocr, Quan Nguyen
One possibility is that the eng.traineddata file you have is not compatible with the latest tesseract version you are using.

The other possibility is that the Java userbot is calling tesseract with the wrong --oem (OCR engine mode) value.

I have cc'ed Quan for advice regarding tess4j and Java.


Jeremiah

Mar 11, 2020, 12:34:40 PM
to tesseract-ocr
So I downloaded the latest version of the traineddata file and tried it, but it didn't work. As far as I can tell, the Java code never creates a Tesseract object itself; what the bots do is create a Region in SikuliX and then call collectWordsText() on it.
This is the code for reference.

// Need to extend region to get text
Region extendedRegion = Helper.extendVertical("loginbtn-A", match);

Helper.log("Collecting words from login screen.");
List<String> initialWords = new ArrayList<>();
try {
    // SikuliX runs OCR (via tess4j) on the region and returns the recognized words
    initialWords = extendedRegion.collectWordsText();
} catch (UnsatisfiedLinkError ule) {
    Helper.log("Possible OCR error. Verify that tesseract is installed and working.");
    return;
}
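
A side note on the handler above: the stack trace reports a plain java.lang.Error ("Invalid memory access"), and UnsatisfiedLinkError is a subclass of Error, so a catch clause for UnsatisfiedLinkError alone will not intercept it. The sketch below illustrates that with a stand-in for the native call. The class CatchScopeDemo and its methods are hypothetical and do not touch SikuliX or tess4j, and catching Error broadly is shown here only as a way to log the failure while diagnosing, not as a fix for the language-loading problem.

```java
// Hypothetical demo class; failWith() stands in for the native OCR call.
public class CatchScopeDemo {

    /** Throws the given Error, simulating a failure inside native code. */
    public static void failWith(Error e) {
        throw e;
    }

    /** Mirrors the posted handler: only UnsatisfiedLinkError is caught. */
    public static String narrowCatch(Error toThrow) {
        try {
            failWith(toThrow);
            return "no error";
        } catch (UnsatisfiedLinkError ule) {
            return "caught UnsatisfiedLinkError";
        }
    }

    /** Adds a second clause so other Errors can be logged as well. */
    public static String broadCatch(Error toThrow) {
        try {
            failWith(toThrow);
            return "no error";
        } catch (UnsatisfiedLinkError ule) {
            return "caught UnsatisfiedLinkError";
        } catch (Error e) {
            return "caught Error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // prints "caught Error: Invalid memory access"
        System.out.println(broadCatch(new Error("Invalid memory access")));
        try {
            narrowCatch(new Error("Invalid memory access"));
        } catch (Error e) {
            // prints "escaped narrow handler: java.lang.Error: Invalid memory access"
            System.out.println("escaped narrow handler: " + e);
        }
    }
}
```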

shree

Mar 11, 2020, 1:04:51 PM
to tesseract-ocr
I suggest you file an issue with SikuliX.