Tesseract couldn't load any languages!

Dattatraya Tembare

May 4, 2018, 2:38:16 PM
to tesseract-ocr
Exception in thread "main" java.lang.Error: Invalid memory access
at com.sun.jna.Native.invokePointer(Native Method)
at com.sun.jna.Function.invokePointer(Function.java:490)
at com.sun.jna.Function.invoke(Function.java:434)
at com.sun.jna.Function.invoke(Function.java:354)
at com.sun.jna.Library$Handler.invoke(Library.java:244)
at com.sun.proxy.$Proxy0.TessBaseAPIGetUTF8Text(Unknown Source)
at net.sourceforge.tess4j.Tesseract.getOCRText(Tesseract.java:433)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:288)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:209)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:193)
at com.ea.ocr.tesseract.ReadImageText.readText(ReadImageText.java:59)
at com.ea.ocr.tesseract.ReadImageText.main(ReadImageText.java:32)
Error opening data file ./eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!

Zdenko Podobny

May 4, 2018, 2:46:35 PM
to tesser...@googlegroups.com
The error message is clear, isn't it?

Zdenko


On Fri, May 4, 2018 at 20:38, Dattatraya Tembare <datta....@gmail.com> wrote:

Quan Nguyen

May 4, 2018, 6:04:41 PM
to tesseract-ocr
You'll need to call setDatapath with the path to your tessdata directory so that Tesseract can find the *.traineddata files.
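
Before calling setDatapath, it can help to confirm that the directory really contains a traineddata file for every requested language. The following is a minimal sketch using only the Java standard library; the class name `TessdataCheck`, the method `missingLanguages`, and the default arguments are hypothetical and not part of Tess4J. It assumes a language spec such as "hin+eng" is a "+"-separated list and that each language needs a file named `<lang>.traineddata` directly inside the tessdata directory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tess4J.
class TessdataCheck {

    /** Returns the languages from a spec like "hin+eng" that have no <lang>.traineddata file in tessdataDir. */
    static List<String> missingLanguages(Path tessdataDir, String languageSpec) {
        List<String> missing = new ArrayList<>();
        for (String lang : languageSpec.split("\\+")) {
            String trimmed = lang.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore empty segments such as a trailing '+'
            }
            if (!Files.isRegularFile(tessdataDir.resolve(trimmed + ".traineddata"))) {
                missing.add(trimmed);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Defaults are placeholders; pass the real tessdata directory and language spec as arguments.
        Path tessdataDir = Paths.get(args.length > 0 ? args[0] : "tessdata");
        String languages = args.length > 1 ? args[1] : "eng";

        List<String> missing = missingLanguages(tessdataDir, languages);
        if (missing.isEmpty()) {
            System.out.println("All traineddata files found in " + tessdataDir.toAbsolutePath());
            // With Tess4J on the classpath, the next step would be:
            //   tesseract.setDatapath(tessdataDir.toString());
            //   tesseract.setLanguage(languages);
        } else {
            System.out.println("Missing traineddata for: " + missing);
        }
    }
}
```

If the check reports a missing language, Tesseract would be expected to fail with "Failed loading language" for that entry, as in the log above.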

rolandko...@gmail.com

May 7, 2018, 3:31:07 PM
to tesseract-ocr
I downloaded a new language (7-segment numbers) from https://github.com/arturaugusto/display_ocr/tree/master/letsgodigital
and put the file in my tessdata directory: C:\Program Files (x86)\Tesseract-OCR\tessdata

When I run Tesseract, I get this error:

An error occured:  { [Error: Command failed: C:\Windows\system32\cmd.exe /s /c "tesseract test2.png C:\Users\User\AppData\Local\Temp\node-tesseract-3841873a-b882-403e-879f-270fc39bd90d
l -psm 7"
Failed loading language 'letsgodigital'

Tesseract couldn't load any languages!
Could not initialize tesseract.
]
  killed: false,
  code: 1,
  signal: null,
  cmd: 'C:\\Windows\\system32\\cmd.exe /s /c "tesseract test2.png C:\\Users\\User\\AppData\\Local\\Temp\\node-tesseract-3841873a-b882-403e-879f-270fc39bd90d -l letsgodigital -psm 7"' }


What did I do wrong?

Dattatraya Tembare

May 17, 2018, 11:28:35 PM
to tesseract-ocr
Thanks!
Your solution worked.
Now I am facing a different problem: with the same setup, 33 files were processed successfully, but the 34th file failed.

java.lang.Error: Invalid memory access
 at com.sun.jna.Native.invokePointer(Native Method) ~[jna-4.5.1.jar:4.5.1 (b0)]
 at com.sun.jna.Function.invokePointer(Function.java:490) ~[jna-4.5.1.jar:4.5.1 (b0)]
 at com.sun.jna.Function.invoke(Function.java:434) ~[jna-4.5.1.jar:4.5.1 (b0)]
 at com.sun.jna.Function.invoke(Function.java:354) ~[jna-4.5.1.jar:4.5.1 (b0)]
 at com.sun.jna.Library$Handler.invoke(Library.java:244) ~[jna-4.5.1.jar:4.5.1 (b0)]
 at com.sun.proxy.$Proxy77.TessBaseAPIGetUTF8Text(Unknown Source) ~[na:na]
 at net.sourceforge.tess4j.Tesseract.getOCRText(Tesseract.java:433) ~[tess4j-4.0.1.jar:4.0.1]
 at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:288) ~[tess4j-4.0.1.jar:4.0.1]
 at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:209) ~[tess4j-4.0.1.jar:4.0.1]
 at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:193) ~[tess4j-4.0.1.jar:4.0.1]

When I looked into the Tesseract.java code, I found the line below:

Pointer utf8Text = renderedFormat == RenderedFormat.HOCR ? api.TessBaseAPIGetHOCRText(handle, pageNum - 1) : api.TessBaseAPIGetUTF8Text(handle);

Please guide.

Regards,
Datta
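
When one file out of a batch fails like this, it can be useful to keep the loop running and record which inputs fail, so the problem image can be examined on its own. The sketch below shows that pattern under some assumptions: the class `BatchOcr`, the method `collectFailures`, and the stand-in `fakeOcr` function are hypothetical, and the file names are made up for illustration. It catches `Throwable` because the failure above is a `java.lang.Error`, which a `catch (Exception e)` block would not catch. Continuing after a native memory error may leave the process in an unreliable state, so this is meant for diagnosis only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper for isolating failing inputs in a batch run.
class BatchOcr {

    /** Applies ocr to every input and returns a map from each failed input to the Throwable it raised. */
    static <T> Map<T, Throwable> collectFailures(List<T> inputs, Function<T, String> ocr) {
        Map<T, Throwable> failures = new LinkedHashMap<>();
        for (T input : inputs) {
            try {
                ocr.apply(input);
            } catch (Throwable t) {
                // java.lang.Error (as thrown by JNA here) is not an Exception, so catch Throwable.
                failures.put(input, t);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Stand-in for a real OCR call; it fails for one made-up file name.
        Function<String, String> fakeOcr = name -> {
            if (name.equals("34.png")) {
                throw new Error("Invalid memory access");
            }
            return "text of " + name;
        };

        Map<String, Throwable> failures =
                collectFailures(Arrays.asList("33.png", "34.png", "35.png"), fakeOcr);
        failures.forEach((file, error) -> System.out.println(file + " failed: " + error));
    }
}
```

With Tess4J, the function passed in would wrap `doOCR` and convert the checked `TesseractException` into an unchecked one, since `Function.apply` cannot throw checked exceptions.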

Dattatraya Tembare

May 17, 2018, 11:57:59 PM
to tesseract-ocr

[SOLVED] Changed the language from 'hin+eng' to 'hin'.
In this case the choice of language also matters.
I was processing the image with lang=hin+eng, and it gave the same error (mentioned earlier in this thread).

Since the image contained very little English text, I changed to lang=hin and got the expected result.

public static void main(String[] args) {
    // Use only the Hindi model; "hin+eng" triggered the invalid memory access on this image.
    Tesseract in = new ReadImageText().getTesseractInstance("C:/Program Files (x86)/Tesseract-OCR/tessdata/", "hin");
    try {
        String resultText = in.doOCR(new File("C:/EA/app-result/im/01-001/34/0.png"));
        log.info("resultText {}", resultText);
    } catch (TesseractException e) {
        log.error("OCR failed", e);
    }
}

shree

May 18, 2018, 1:32:59 AM
to tesseract-ocr
It is possible that you have not downloaded eng.traineddata, or that it is in a different location.

Try running tesseract on the command line with --list-langs to check which languages it can find.
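
The same check can be scripted from Java. The sketch below runs `tesseract --list-langs` and parses the output. It assumes that the `tesseract` executable is on the PATH and that the output is a header line starting with "List of available languages" followed by one language code per line; the header format may differ between versions. The class and method names are hypothetical. If the command prints an error instead, the parser would return those lines as if they were languages, so the raw output is worth inspecting too.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper around the tesseract command-line tool.
class ListLangs {

    /** Extracts language codes from --list-langs output, skipping the assumed header line and blank lines. */
    static List<String> parseLanguages(String output) {
        List<String> langs = new ArrayList<>();
        for (String line : output.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("List of available languages")) {
                continue;
            }
            langs.add(trimmed);
        }
        return langs;
    }

    /** Runs "tesseract --list-langs" and returns its combined stdout and stderr. */
    static String runListLangs() throws IOException, InterruptedException {
        Process process = new ProcessBuilder("tesseract", "--list-langs")
                .redirectErrorStream(true) // merge stderr into stdout so nothing is missed
                .start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String output = reader.lines().collect(Collectors.joining("\n"));
            process.waitFor();
            return output;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseLanguages(runListLangs()));
    }
}
```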

Dattatraya Tembare

May 19, 2018, 7:48:33 PM
to tesseract-ocr
I do have the eng traineddata; it worked for the previous 33 files and failed only for the 34th.
When I tried the command line, it worked. But on the command line I did not specify the tessdata path, so my assumption is that it picks up a default path.
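
To avoid relying on whatever default path each tool picks, the tessdata location can be resolved explicitly in the Java code and then passed to setDatapath. The following is a minimal sketch: the class `DatapathResolver` and its `resolve` method are hypothetical, and it assumes the TESSDATA_PREFIX environment variable points at the tessdata directory itself, as the error message at the top of this thread suggests. Older Tesseract versions may interpret that variable differently.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Tess4J.
class DatapathResolver {

    /** Prefers an explicitly configured path, then falls back to TESSDATA_PREFIX from the given environment. */
    static Optional<String> resolve(String explicitPath, Map<String, String> env) {
        if (explicitPath != null && !explicitPath.trim().isEmpty()) {
            return Optional.of(explicitPath);
        }
        String fromEnv = env.get("TESSDATA_PREFIX");
        if (fromEnv != null && !fromEnv.trim().isEmpty()) {
            return Optional.of(fromEnv);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String explicit = args.length > 0 ? args[0] : null;
        Optional<String> datapath = resolve(explicit, System.getenv());
        // The resolved value is what would be handed to Tess4J's setDatapath.
        System.out.println(datapath
                .map(path -> "Using tessdata path: " + path)
                .orElse("No tessdata path configured; pass one explicitly"));
    }
}
```

Taking the environment as a parameter keeps the lookup easy to test without changing real environment variables.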