Beginner question: could not initialize Tesseract, missing eng.traineddata file in tessdata


Roparzh Hemon

Jan 16, 2021, 11:59:19 AM
to tesseract-ocr

Hello,

I am a complete beginner to Tesseract. I just installed it on my Ubuntu machine.
Here is a snippet from my terminal:

$ echo $TESSDATA_PREFIX
/home/mbalambala/tesseract/tessdata
$ tesseract Downloads/p1.pdf p1
Error opening data file /home/mbalambala/tesseract/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
$ ls /home/mbalambala/tesseract/tessdata
configs                    eng.user-words Makefile.am pdf.tiff 
eng.user-patterns Makefile              Makefile.in   tessconfigs 



So it seems I need to put an eng.traineddata file in my tessdata directory. How do I do this?
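Before downloading anything, it can help to confirm exactly which file Tesseract is looking for. The following is a minimal sketch of a shell check; `check_tessdata` is a hypothetical name (not a Tesseract command), and it assumes, as the error message above suggests, that `TESSDATA_PREFIX` points at the tessdata directory itself and that each language lives in a file named `<lang>.traineddata`.

```shell
# check_tessdata: hypothetical helper, not a Tesseract command.
# Usage: check_tessdata [lang]    (lang defaults to "eng")
# Assumes TESSDATA_PREFIX names the tessdata directory itself, which
# matches the path in the "Error opening data file" message.
check_tessdata() {
  local lang="${1:-eng}"
  local f
  if [ -z "$TESSDATA_PREFIX" ]; then
    echo "TESSDATA_PREFIX is not set" >&2
    return 2
  fi
  f="$TESSDATA_PREFIX/$lang.traineddata"
  # -s is true only if the file exists and is non-empty
  if [ ! -s "$f" ]; then
    echo "missing or empty: $f" >&2
    return 1
  fi
  echo "found: $f"
}
```

With the directory listing above, `check_tessdata eng` would report the file as missing. Tesseract's own `tesseract --list-langs` gives a similar answer by listing the languages it finds in its tessdata directory. Note that neither check looks inside the file, so a file with the right name but the wrong contents still passes.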



Adriana Camilleri

Jan 17, 2021, 4:37:22 AM
to tesser...@googlegroups.com
Run the following command from within the tessdata directory to get the eng.traineddata file:

wget https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata
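On GitHub, a `/blob/` URL returns the HTML page that displays a file, while the corresponding `/raw/` URL returns the file's contents, so a model file should be downloaded through the `/raw/` form. A minimal sketch follows; `traineddata_url` and `fetch_traineddata` are hypothetical helper names, the `master` branch is copied from the URL above (it is an assumption that GitHub still serves or redirects that branch name), and the download step assumes `wget` is installed and `TESSDATA_PREFIX` names the tessdata directory.

```shell
# Hypothetical helpers, not part of Tesseract.

# traineddata_url LANG: print the GitHub "raw" URL for LANG.traineddata.
# /raw/ serves the file itself; /blob/ serves an HTML page about the file.
traineddata_url() {
  echo "https://github.com/tesseract-ocr/tessdata/raw/master/$1.traineddata"
}

# fetch_traineddata [LANG]: download LANG.traineddata (default "eng")
# into $TESSDATA_PREFIX. wget follows GitHub's redirect to the raw file.
fetch_traineddata() {
  local lang="${1:-eng}"
  wget -O "$TESSDATA_PREFIX/$lang.traineddata" "$(traineddata_url "$lang")"
}
```

For example, `fetch_traineddata eng` writes `eng.traineddata` straight into the tessdata directory, so it does not matter which directory the command is run from.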

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/fa3fd4fb-fb96-4420-8bc0-69e1e4e3798fn%40googlegroups.com.

Roparzh Hemon

Jan 19, 2021, 11:19:09 AM
to tesseract-ocr

I downloaded it as you suggested, and as the terminal output below shows, the file is now in the correct place:

$ file /home/mbalambala/tesseract/tessdata/eng.traineddata
/home/mbalambala/tesseract/tessdata/eng.traineddata: HTML document, UTF-8 Unicode text, with very long lines

$ echo $TESSDATA_PREFIX
/home/mbalambala/tesseract/tessdata

but the error message is exactly the same:

$ tesseract Downloads/p1.pdf p1
Error opening data file /home/mbalambala/tesseract/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.


Whatever the real problem is, the error message does not identify it.
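The `file` output above is the actual clue: a traineddata file holds binary model data, so `HTML document` means the download saved a web page rather than the model, and Tesseract then fails to load it. A minimal sketch of a check for that case follows; `looks_like_html` is a hypothetical helper that only inspects the first 512 bytes of the file for an HTML doctype or `<html` tag, a heuristic rather than a full format check.

```shell
# looks_like_html FILE: hypothetical helper, not part of Tesseract.
# Succeeds (exit status 0) if the first 512 bytes of FILE contain an
# HTML doctype or <html> tag, i.e. a saved web page rather than a model.
# grep -q stays silent; only the exit status is used.
looks_like_html() {
  head -c 512 "$1" | grep -qi -e '<!doctype html' -e '<html'
}
```

For example, `looks_like_html "$TESSDATA_PREFIX/eng.traineddata" && echo "not a model file"` should flag a file like the one shown above, which can then be deleted and downloaded again from a raw-file URL.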

Shree Devi Kumar

Jan 19, 2021, 11:30:46 AM
to tesseract-ocr




Adriana Camilleri

Jan 19, 2021, 12:18:17 PM
to tesseract-ocr
My apologies... I hope the error is now fixed.

Roparzh Hemon

Jan 19, 2021, 12:43:53 PM
to tesseract-ocr
Shree: your solution worked for me, thanks a lot.