Tesseract 4.0 - Tesseract couldn't load any languages!


Sasha Ostrikov

Nov 26, 2017, 10:16:14 AM
to tesseract-ocr
Hi all,
I know the internet is full of "failed loading language" problems, but I've been struggling with this for hours and can't find a solution.
So this is my installation of tesseract:

[ds@lab1 images]$ tesseract -v
tesseract 4.00.00alpha
 leptonica-1.74.4
  libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.7 : libwebp 0.3.0

 Found AVX
 Found SSE


I constantly get this error:
[ds@lab1 images]$ tesseract text2.png t1.txt -l eng --oem 0
Error opening data file /usr/local/share/tesseract-ocr/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
____________________________________________________________________________________________________

[ds@lab1 images]$ tesseract text2.png t1.txt -l eng --oem 1
Error opening data file /usr/local/share/tesseract-ocr/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

When I run --list-langs, I get the following. It looks like it is able to find the languages:
[ds@lab1 images]$ tesseract --list-langs
Error opening data file /usr/local/share/tesseract-ocr/tessdata4.0/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
List of available languages (4):
Hebrew
fra
heb
eng

And this is my language directory structure:
[ds@lab1 share]$ ll -r  tesseract-ocr/
total 144
drwxr-xr-x. 4 root root  4096 Nov 23 12:27 tessdata4.0
drwxr-xr-x. 4 root root    82 Nov 23 11:17 tessdata3.04
drwxr-xr-x. 4 root root  4096 Nov 23 09:38 tessdata
drwxr-xr-x. 2 root root    92 Nov 26 10:12 tessconfigs
-rwxr-xr-x. 1 root root   572 Nov 26 10:12 pdf.ttf
-rwxr-xr-x. 1 root root 30997 Nov 26 10:12 heb.traineddata
-rwxr-xr-x. 1 root root 31033 Nov 26 10:12 Hebrew.traineddata
-rwxr-xr-x. 1 root root 30997 Nov 26 10:12 fra.traineddata
-rwxr-xr-x. 1 root root 32288 Nov 26 10:12 eng.traineddata
drwxr-xr-x. 2 root root  4096 Nov 26 10:12 configs

I tried setting TESSDATA_PREFIX to different locations:
[ds@lab1 images]$ history | grep export
 1006  export TESSDATA_PREFIX=/usr/local/share/tesseract-ocr/
 1009  export TESSDATA_PREFIX=/usr/local/share/tesseract-ocr/tessdata
 1028  export TESSDATA_PREFIX=/usr/local/share/tesseract-ocr/tessdata4.0
 1033  export TESSDATA_PREFIX=/usr/local/share/tesseract-ocr/tessdata3.04/
 1037  export TESSDATA_PREFIX=/usr/local/share/tesseract-ocr/tessdata3.04

I tried all kinds of combinations of paths, language files and --oem flags, and I always get the same nasty error :(

Any help would be highly appreciated!

Z WANG

Dec 14, 2017, 1:49:29 AM
to tesseract-ocr
Can you try copying your eng.traineddata to /usr/local/share/tessdata and rerunning your command? Then report back here.

Sasha Ostrikov

Dec 20, 2017, 10:19:23 AM
to tesseract-ocr
Yes, I tried that. This is the output, same sh*t :(

sasha@ds:~/dev/ext/dsnotebooks/text_extraction/images$ ll /usr/local/share/tessdata/
total 148
drwxr-xr-x  4 root root  4096 Dec 20 15:15 ./
drwxr-xr-x 10 root root  4096 Dec 20 15:14 ../
drwxr-xr-x  2 root root  4096 Dec 20 15:15 configs/
-rwxr-xr-x  1 root root 32291 Dec 20 15:15 eng.traineddata*
-rwxr-xr-x  1 root root 32292 Dec 20 15:15 fra.traineddata*
-rwxr-xr-x  1 root root 32324 Dec 20 15:15 Hebrew.traineddata*
-rwxr-xr-x  1 root root 31001 Dec 20 15:15 heb.traineddata*
-rwxr-xr-x  1 root root   572 Dec 20 15:15 pdf.ttf*
drwxr-xr-x  2 root root  4096 Dec 20 15:15 tessconfigs/
sasha@ds:~/dev/ext/dsnotebooks/text_extraction/images$ $TESSDATA_PREFIX
bash: /usr/local/share/tessdata: Is a directory
sasha@ds:~/dev/ext/dsnotebooks/text_extraction/images$ tesseract text2.png t1.txt -l eng --oem 0
Error opening data file /usr/local/share/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
sasha@ds:~/dev/ext/dsnotebooks/text_extraction/images$ tesseract text2.png t1.txt -l eng --oem 1
Error opening data file /usr/local/share/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
sasha@ds:~/dev/ext/dsnotebooks/text_extraction/images$ tesseract text2.png t1.txt -l eng --oem 2
Error opening data file /usr/local/share/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
sasha@ds:~/dev/ext/dsnotebooks/text_extraction/images$ tesseract text2.png t1.txt -l eng --oem 3
Error opening data file /usr/local/share/eng.traineddata

ShreeDevi Kumar

Dec 22, 2017, 6:01:35 AM
to tesser...@googlegroups.com
As per your message above, the files are in /usr/local/share/tessdata/,
but the program is looking for them directly in /usr/local/share/

You can set TESSDATA_PREFIX and try again,

OR

specify the directory as part of the command line (with --tessdata-dir). I have found that to be the easiest way, especially when using or comparing different kinds of traineddata (fast, best, legacy, ...).

Example script below:

-------

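#!/bin/bash
# Run the same images through three sets of traineddata for comparison.
# --tessdata-dir tells tesseract which directory holds the .traineddata files.
# --oem 1 selects the LSTM engine only; --psm 6 assumes a single uniform block of text.
for img_file in ./Cap*.png; do
  [ -e "${img_file}" ] || continue   # no matching files: skip the literal pattern
  echo "****************************" "${img_file}" "**********************************"
  # Output base = image name without extension plus a suffix (tesseract appends .txt)
  time tesseract --tessdata-dir /mnt/c/Users/User/shree/tessdata_best/ "${img_file}" "${img_file%.*}-eng-best" --oem 1 --psm 6 -l eng
  time tesseract --tessdata-dir /mnt/c/Users/User/shree/tessdata_fast/ "${img_file}" "${img_file%.*}-eng-fast" --oem 1 --psm 6 -l eng
  time tesseract --tessdata-dir /mnt/c/Users/User/shree/tessdata/ "${img_file}" "${img_file%.*}-engplus" --oem 1 --psm 6 -l engplus
done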
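For a single tessdata directory, the two options Shree describes can be reduced to the following sketch. The directory path is a placeholder, and the function names are hypothetical. It assumes current 4.x/5.x behaviour, where TESSDATA_PREFIX names the directory that directly contains the .traineddata files (3.x expected its parent, and the early 4.0 builds in this thread seem to have resolved it differently, so the trailing slash is only a precaution).

```shell
#!/bin/bash
# Sketch: two ways of pointing tesseract at a tessdata directory.
# TESSDATA_DIR is a placeholder; adjust it to your installation.
TESSDATA_DIR="/usr/local/share/tessdata/"

# Option 1: set TESSDATA_PREFIX for this one command only.
# Assumes 4.x/5.x semantics: the variable names the directory holding *.traineddata.
ocr_with_env() {
  TESSDATA_PREFIX="$TESSDATA_DIR" tesseract "$1" "${1%.*}" -l eng
}

# Option 2: pass the directory explicitly on the command line.
ocr_with_flag() {
  tesseract --tessdata-dir "$TESSDATA_DIR" "$1" "${1%.*}" -l eng
}
```

For example, `ocr_with_flag text2.png` writes text2.txt. Passing the directory explicitly keeps each command self-describing, which is why it is handy when switching between best, fast and legacy data.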





ShreeDevi
____________________________________________________________
Devotional songs (bhajan, kirtan, aarti) @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/c11d3337-48dc-4d1d-a621-c903573ca76d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sasha Ostrikov

Dec 25, 2017, 5:21:27 AM
to tesseract-ocr
Hi Shree,
I tried all the options, but nothing helps.
For example, this is my traineddata folder:

sasha@ds:~/dev/data$ ll /usr/local/share/tesseract-ocr/tessdata4.0/
total 148
drwxr-xr-x 4 root root  4096 Nov 22 18:43 ./
drwxr-xr-x 4 root root  4096 Nov 26 10:29 ../
drwxr-xr-x 2 root root  4096 Nov 22 16:36 configs/
-rwxr-xr-x 1 root root 32291 Nov 22 18:43 eng.traineddata*
-rwxr-xr-x 1 root root 32292 Nov 22 18:43 fra.traineddata*
-rwxr-xr-x 1 root root 32324 Nov 22 18:42 Hebrew.traineddata*
-rwxr-xr-x 1 root root 31001 Nov 22 18:43 heb.traineddata*
-rwxr-xr-x 1 root root   572 Nov 22 16:36 pdf.ttf*
drwxr-xr-x 2 root root  4096 Nov 22 16:36 tessconfigs/

The traineddata was downloaded from here: https://github.com/tesseract-ocr/tessdata_best

When I run this command, this is what I get:
sasha@ds:~/dev/data$ tesseract --tessdata-dir /usr/local/share/tesseract-ocr/tessdata4.0  text2.png text2.txt  --oem 3 --psm 6 -l eng
Error opening data file /usr/local/share/tesseract-ocr/tessdata4.0/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

I'm running on Ubuntu 16, and this is my tesseract -v output:
tesseract 4.00.00dev-694-gebbfc3a
 leptonica-1.74.4
  libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.25 : libtiff 4.0.6 : zlib 1.2.8

 Found AVX
 Found SSE

:(

ShreeDevi Kumar

Dec 25, 2017, 6:16:38 AM
to tesser...@googlegroups.com
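It looks like the traineddata files are corrupted or did not download correctly, which is why you are getting the file-not-found errors.

Check the file sizes: your eng.traineddata is only about 32 KB, while the tessdata_best one is about 15 MB (see my listing below). You may want to download the files using wget or curl.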

root@All-in-1-Touch:/mnt/c/Users/User/shree/tessdata_best# ll *.traineddata
-rwxrwxrwx 1 root root  13077423 Sep 25 13:56 chi_sim.traineddata*
-rwxrwxrwx 1 root root  15400601 Sep 15 08:53 eng.traineddata*
-rwxrwxrwx 1 root root 101402885 Sep 16 18:26 Latin.traineddata*
-rwxrwxrwx 1 root root   7418529 Sep 15 09:05 osd.traineddata*
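To run that size check over a whole directory, a rough sketch follows. The function name is hypothetical, and the 1,000,000-byte cut-off is my own heuristic assumption based on the sizes in this thread (roughly 32 KB for the broken files versus megabytes for the tessdata_best ones). It is not an official limit, and legitimately small models may exist.

```shell
#!/bin/bash
# Sketch: list .traineddata files with their sizes and flag suspiciously small ones.
# MIN_BYTES is an assumed heuristic threshold, not a documented minimum.
MIN_BYTES=${MIN_BYTES:-1000000}

check_sizes() {
  local dir=${1:-.} f size status=0
  for f in "$dir"/*.traineddata; do
    [ -e "$f" ] || continue                  # no matches: the glob stays literal
    size=$(wc -c < "$f" | tr -d ' ')         # byte count; strip macOS wc padding
    if [ "$size" -lt "$MIN_BYTES" ]; then
      echo "SUSPICIOUS $size $f"
      status=1
    else
      echo "ok $size $f"
    fi
  done
  return "$status"                           # non-zero if anything looked too small
}
```

For example, `check_sizes /usr/local/share/tesseract-ocr/tessdata4.0` would have flagged every file in Sasha's listing above.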


ShreeDevi


Sasha Ostrikov

Dec 25, 2017, 7:03:08 AM
to tesseract-ocr
Wow!!! Shree, you are the KING!!!! As you said, the files were corrupted... I had been using wget all along, but the links I used were copied incorrectly from GitHub.
Now I clicked on each file, copied the link from the Download button, and that worked like a charm!!!!
Thank you so much my friend!!!! :)
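A likely explanation, though the thread does not state it outright, is that the incorrectly copied links pointed at GitHub's HTML file pages, so wget saved a web page under a .traineddata name. If so, such a file can be spotted by its first bytes. The sketch below rests on the assumption that a real traineddata file is binary and never begins with an HTML doctype or `<html>` tag; the function name is hypothetical.

```shell
#!/bin/bash
# Sketch: return success (0) if a file looks like a saved HTML page.
# Heuristic assumption: genuine traineddata is binary and does not start with HTML markup.
looks_like_html() {
  local head_bytes
  # Take the first 512 bytes, drop NUL bytes, lowercase for a case-insensitive match.
  head_bytes=$(head -c 512 "$1" | LC_ALL=C tr -d '\000' | LC_ALL=C tr '[:upper:]' '[:lower:]')
  case "$head_bytes" in
    *"<!doctype html"*|*"<html"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `looks_like_html eng.traineddata && echo "re-download this file"`.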

Munjal Dhamecha

Dec 28, 2017, 6:42:38 AM
to tesseract-ocr
Hi Sasha,

I am facing the same problem but am unable to find a solution.

Could you please share the exact steps you followed to resolve the issue?

Sasha Ostrikov

Dec 28, 2017, 11:10:48 AM
to tesseract-ocr
Munjal,
In my case, the problem was bad traineddata files, caused by downloading them from GitHub the wrong way.
The correct way is to click on the file to open its page (e.g. https://github.com/tesseract-ocr/tessdata_best/blob/master/eng.traineddata).
On that page, right-click the "Download" button and choose "Copy link address", so you'll have the link in the clipboard (e.g. https://github.com/tesseract-ocr/tessdata_best/raw/master/eng.traineddata).
Then just wget the copied link.
Finally, make sure the downloaded files have the same size as on GitHub, so you'll be sure the files are OK.
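These steps can be scripted. The sketch rewrites the file-page URL into its download URL by swapping /blob/ for /raw/, which matches the two example links above, and then fetches it with wget. The helper names are hypothetical, the `master` branch name is taken from the 2017 links in this thread (check the repository if it has changed), and comparing the size with GitHub is still up to you.

```shell
#!/bin/bash
# Sketch: download traineddata via the raw link instead of the HTML file page.

# Rewrite a GitHub file-page URL (.../blob/...) into its download URL (.../raw/...).
# URLs that already use /raw/ pass through unchanged.
to_raw_url() {
  printf '%s\n' "$1" | sed 's#/blob/#/raw/#'
}

# Hypothetical helper: download one language file into a target directory.
# Branch "master" comes from the links in this thread; it may differ today.
fetch_traineddata() {
  local lang=$1 dest=${2:-.}
  local page="https://github.com/tesseract-ocr/tessdata_best/blob/master/${lang}.traineddata"
  wget -O "$dest/${lang}.traineddata" "$(to_raw_url "$page")"
}
```

For example, `fetch_traineddata jpn_vert /usr/local/share/tessdata` (you may need sudo for a system directory), followed by `ls -l` to compare the size with the one GitHub shows.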

rohit humne

Aug 6, 2024, 5:41:38 AM
to tesseract-ocr
Hi friends,
I have the same issue with the Japanese language.
This is what I am trying:

humnerohit@humnes-MacBook-Pro tessdata % export TESSDATA_PREFIX=/usr/local/share/tessdata/
humnerohit@humnes-MacBook-Pro tessdata % tesseract --list-langs
List of available languages in "/usr/local/share/tessdata/" (5):
eng
jpn
jpn_vert
osd
snum

But when I execute the command "tesseract test1.jpg result -l jpn_vert", it shows the following error:

Error opening data file /usr/local/share/tessdata/jpn_vert.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'jpn_vert'
Tesseract couldn't load any languages!
Could not initialize tesseract.


Can anyone please help me?



rohit humne

Aug 6, 2024, 10:14:00 PM
to tesseract-ocr

I tried the following combinations:

export TESSDATA_PREFIX=/usr/local/Cellar/tesseract/5.4.1/share/tessdata/
export TESSDATA_PREFIX=/usr/local/share/tessdata/
export TESSDATA_PREFIX=/usr/share/tesseract-ocr/tessdata/
export TESSDATA_PREFIX=/usr/local/share/tessdata/
export TESSDATA_PREFIX=/usr/local/share/tesseract-ocr/tessdata/


Still the same issue.

I don't properly understand the script you shared above (the loop over ./Cap*.png that calls tesseract --tessdata-dir three times). Can you please explain what it does and how to use it?
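A stripped-down version of Shree's script for this case might look like the sketch below: one tessdata directory, one language, every .jpg in a given folder. The directory default is one of the paths tried above, and the *.jpg pattern, the variable names and the `ocr_all` function are assumptions for illustration. If the jpn_vert file itself is damaged (see the file-size advice earlier in the thread), no path setting will help.

```shell
#!/bin/bash
# Sketch: OCR every .jpg in a folder with one language, telling tesseract
# exactly where the .traineddata files are. Defaults below are assumptions.
TESSDATA_DIR=${TESSDATA_DIR:-/usr/local/share/tessdata/}
LANG_CODE=${LANG_CODE:-jpn_vert}

ocr_all() {
  local dir=${1:-.} img
  for img in "$dir"/*.jpg; do
    [ -e "$img" ] || continue          # skip if the folder has no .jpg files
    echo "**** $img ****"
    # Output goes to <image path without extension>.txt
    tesseract --tessdata-dir "$TESSDATA_DIR" "$img" "${img%.*}" -l "$LANG_CODE"
  done
}

# Only run when a folder is given, e.g.: bash ocr_jpn.sh ~/scans
if [ "$#" -gt 0 ]; then
  ocr_all "$1"
fi
```

Saved as, say, ocr_jpn.sh, running `bash ocr_jpn.sh .` in the folder containing test1.jpg would produce test1.txt.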
