Tesseract couldn't load any languages!

Dattatraya Tembare

May 4, 2018, 2:38:16 PM

to tesseract-ocr

Exception in thread "main" java.lang.Error: Invalid memory access
 at com.sun.jna.Native.invokePointer(Native Method)
 at com.sun.jna.Function.invokePointer(Function.java:490)
 at com.sun.jna.Function.invoke(Function.java:434)
 at com.sun.jna.Function.invoke(Function.java:354)
 at com.sun.jna.Library$Handler.invoke(Library.java:244)
 at com.sun.proxy.$Proxy0.TessBaseAPIGetUTF8Text(Unknown Source)
 at net.sourceforge.tess4j.Tesseract.getOCRText(Tesseract.java:433)
 at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:288)
 at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:209)
 at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:193)
 at com.ea.ocr.tesseract.ReadImageText.readText(ReadImageText.java:59)
 at com.ea.ocr.tesseract.ReadImageText.main(ReadImageText.java:32)

Error opening data file ./eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!

Zdenko Podobny

May 4, 2018, 2:46:35 PM

to tesser...@googlegroups.com

The error message is clear, isn't it?

Zdenko

Quan Nguyen

May 4, 2018, 6:04:41 PM

to tesseract-ocr

You need to call setDatapath with the path to your tessdata directory so that Tesseract can find the *.traineddata files.
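
Before passing a directory to setDatapath, it can help to confirm that the language files are really there. Below is a minimal sketch using only the JDK. The class name TessdataCheck and the method missingLanguages are made up for illustration and are not part of Tess4J. It assumes the usual `<lang>.traineddata` file naming and that multiple languages are joined with `+` (as in `hin+eng`). The Tess4J calls appear only in a comment.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Tess4J or Tesseract.
final class TessdataCheck {

    // Returns the languages in langSpec (e.g. "hin+eng") that have no
    // <lang>.traineddata file directly inside tessdataDir.
    static List<String> missingLanguages(String tessdataDir, String langSpec) {
        Path dir = Paths.get(tessdataDir);
        List<String> missing = new ArrayList<>();
        for (String lang : langSpec.split("\\+")) {
            String name = lang.trim();
            if (name.isEmpty()) {
                continue; // ignore stray separators such as "+eng"
            }
            if (!Files.isRegularFile(dir.resolve(name + ".traineddata"))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Defaults are placeholders; pass your own directory and language.
        String datapath = args.length > 0 ? args[0] : "tessdata";
        String lang = args.length > 1 ? args[1] : "eng";
        List<String> missing = missingLanguages(datapath, lang);
        if (missing.isEmpty()) {
            // With Tess4J on the classpath one would then call:
            //   tesseract.setDatapath(datapath);
            //   tesseract.setLanguage(lang);
            System.out.println("All traineddata files found in " + datapath);
        } else {
            System.out.println("Missing traineddata for: " + missing);
        }
    }
}
```

If the list is not empty, the "Failed loading language" message is the expected outcome, and the fix is to correct the path or copy the missing file into that directory.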

rolandko...@gmail.com

May 7, 2018, 3:31:07 PM

to tesseract-ocr

I downloaded a new language from https://github.com/arturaugusto/display_ocr/tree/master/letsgodigital (7-segment numbers) and put the file in my tessdata directory: C:\Program Files (x86)\Tesseract-OCR\tessdata

When I run tesseract, I get this error:

An error occured: { [Error: Command failed: C:\Windows\system32\cmd.exe /s /c "tesseract test2.png C:\Users\User\AppData\Local\Temp\node-tesseract-3841873a-b882-403e-879f-270fc39bd90d -l letsgodigital -psm 7"
Failed loading language 'letsgodigital'

Tesseract couldn't load any languages!

Could not initialize tesseract.
]
killed: false,
code: 1,
signal: null,
cmd: 'C:\\Windows\\system32\\cmd.exe /s /c "tesseract test2.png C:\\Users\\User\\AppData\\Local\\Temp\\node-tesseract-3841873a-b882-403e-879f-270fc39bd90d -l letsgodigital -psm 7"' }

What did I do wrong?

Dattatraya Tembare

May 17, 2018, 11:28:35 PM

to tesseract-ocr

Thanks!
Your solution worked.

Now I am facing a different problem: 33 files of the same pattern were processed successfully, but the 34th file failed with:

java.lang.Error: Invalid memory access
 at com.sun.jna.Native.invokePointer(Native Method) ~[jna-4.5.1.jar:4.5.1 (b0)]
 at com.sun.jna.Function.invokePointer(Function.java:490) ~[jna-4.5.1.jar:4.5.1 (b0)]
 at com.sun.jna.Function.invoke(Function.java:434) ~[jna-4.5.1.jar:4.5.1 (b0)]
 at com.sun.jna.Function.invoke(Function.java:354) ~[jna-4.5.1.jar:4.5.1 (b0)]
 at com.sun.jna.Library$Handler.invoke(Library.java:244) ~[jna-4.5.1.jar:4.5.1 (b0)]
 at com.sun.proxy.$Proxy77.TessBaseAPIGetUTF8Text(Unknown Source) ~[na:na]
 at net.sourceforge.tess4j.Tesseract.getOCRText(Tesseract.java:433) ~[tess4j-4.0.1.jar:4.0.1]
 at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:288) ~[tess4j-4.0.1.jar:4.0.1]
 at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:209) ~[tess4j-4.0.1.jar:4.0.1]
 at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:193) ~[tess4j-4.0.1.jar:4.0.1]

When I looked into the Tess4J code (Tesseract.java), I found the line below:

Pointer utf8Text = renderedFormat == RenderedFormat.HOCR ? api.TessBaseAPIGetHOCRText(handle, pageNum - 1) : api.TessBaseAPIGetUTF8Text(handle);

Please guide.

Regards,

Datta

Dattatraya Tembare

May 17, 2018, 11:57:59 PM

to tesseract-ocr

[SOLVED] Changed the language from 'hin+eng' to 'hin'.
In this case the choice of language also matters: I was processing the image with lang=hin+eng, and it kept giving the same error (the one mentioned earlier in this thread).

Since the image contained very little English text, I changed to lang=hin and got the expected result.

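import java.io.File;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public static void main(String[] args) {
    // ReadImageText.getTesseractInstance(datapath, language) is a helper from this project, not a Tess4J API.
    Tesseract in = new ReadImageText().getTesseractInstance("C:/Program Files (x86)/Tesseract-OCR/tessdata/", "hin");
    try {
        String resultText = in.doOCR(new File("C:/EA/app-result/im/01-001/34/0.png"));
        log.info("resultText {}", resultText);
    } catch (TesseractException e) {
        // Log the failure through the same logger instead of printing to stderr.
        log.error("OCR failed", e);
    }
}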

shree

May 18, 2018, 1:32:59 AM

to tesseract-ocr

It is possible that you have not downloaded eng.traineddata or it is in a different location.

Try running tesseract on the command line and check the output of --list-langs.
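
The same check can be done from Java, so that the answer comes from the environment the application runs in. The sketch below is only an illustration: the class ListLangs is made up, and the parser assumes the output is a header line ending in a colon (such as "List of available languages (3):") followed by one language code per line. Stdout and stderr are merged because, as far as I know, some Tesseract versions print this list to stderr. It also assumes a `tesseract` executable is on the PATH.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Tess4J or Tesseract.
final class ListLangs {

    // Extracts language codes from the text printed by `tesseract --list-langs`.
    // Assumption: a header line ending in ':' followed by one code per line.
    static List<String> parse(String output) {
        List<String> langs = new ArrayList<>();
        for (String line : output.split("\\R")) {
            String s = line.trim();
            if (s.isEmpty() || s.endsWith(":")) {
                continue; // skip blank lines and the header line
            }
            langs.add(s);
        }
        return langs;
    }

    // Runs the tesseract found on PATH and returns its merged stdout/stderr.
    static String run() throws IOException, InterruptedException {
        Process p = new ProcessBuilder("tesseract", "--list-langs")
                .redirectErrorStream(true)
                .start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        List<String> langs = parse(run());
        System.out.println("eng installed: " + langs.contains("eng"));
        System.out.println("hin installed: " + langs.contains("hin"));
    }
}
```

Note that this reports what the command-line tool sees. If the Java code passes a different directory to setDatapath, the two can disagree.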

Dattatraya Tembare

May 19, 2018, 7:48:33 PM

to tesseract-ocr

I do have the eng traineddata; it worked for the previous 33 files and failed only for the 34th.
When I tried the same file on the command line, it worked. On the command line I did not specify a tessdata path, so my assumption is that it picks up the default path.
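
To remove the guesswork about which directory is used from Java, the datapath can be resolved in one place and logged before it is handed to Tess4J. This is a small sketch under my own assumptions: the class DatapathChooser is made up, and the precedence (explicit path first, then the TESSDATA_PREFIX environment variable) is a choice made for this example, not Tesseract's own lookup rule.

```java
import java.util.Optional;

// Hypothetical helper for illustration; not part of Tess4J or Tesseract.
final class DatapathChooser {

    // Prefers an explicitly configured path, then the environment value.
    // Blank or null values are treated as "not set".
    static Optional<String> choose(String explicitPath, String envPrefix) {
        if (explicitPath != null && !explicitPath.trim().isEmpty()) {
            return Optional.of(explicitPath.trim());
        }
        if (envPrefix != null && !envPrefix.trim().isEmpty()) {
            return Optional.of(envPrefix.trim());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String explicit = args.length > 0 ? args[0] : null;
        Optional<String> datapath = choose(explicit, System.getenv("TESSDATA_PREFIX"));
        // With Tess4J one would pass the chosen value to tesseract.setDatapath(...).
        System.out.println(datapath
                .map(p -> "Using datapath: " + p)
                .orElse("No datapath configured"));
    }
}
```

Logging the chosen value next to each OCR call makes it easy to compare with what the command line uses.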