TESSDATA_PREFIX appears to be ignored


cga...@gmail.com

Apr 24, 2014, 10:39:30 AM
to tesser...@googlegroups.com
Hi,

I have installed the tesseract-ocr (v3.03) package within Ubuntu 12.04.3 LTS and am experiencing difficulty getting it to run.

I first located the tessdata directory:

$ sudo find / -name tessdata
/usr/share/tesseract-ocr/tessdata

Then set the environment variable TESSDATA_PREFIX to tessdata's parent directory:

$ export TESSDATA_PREFIX=/usr/share/tesseract-ocr/

And then checked to make sure this was set correctly:

$ echo $TESSDATA_PREFIX
/usr/share/tesseract-ocr/

When trying to run tesseract against a test image I receive the following error:
$ tesseract test.png out
Error opening data file tesseract/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Tesseract Open Source OCR Engine v3.03 with Leptonica
AdaptedTemplates != NULL:Error:Assert failed:in file adaptmatch.cpp, line 174
Segmentation fault (core dumped)


The first line of the error suggests that tesseract is not taking the value from TESSDATA_PREFIX and is instead trying to locate the language files in 'tesseract/tessdata'.

Can anyone advise where I am going wrong?

Thanks,
Chris

Nick White

Apr 24, 2014, 11:48:34 AM
to tesser...@googlegroups.com
Hi Chris,

That is strange; it sounds like you did everything right.

How did you install the package? I see that 12.04.3 has the 3.02.01
version. Did you compile Tesseract from source, or use a package
from a newer version of Ubuntu?

One other thing to check: do you have the English training data
installed? That will be a separate package, tesseract-ocr-eng. Also,
can you double-check that the file
/usr/share/tesseract-ocr/tessdata/eng.traineddata exists?
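
Both checks can be sketched in shell. This is only a sketch: check_traineddata is a hypothetical helper name, not part of Tesseract, and it assumes the layout described in this thread, where the language file lives at <prefix>/tessdata/<lang>.traineddata.

```shell
# Hypothetical helper (not part of Tesseract): report whether the language
# file exists at <prefix>/tessdata/<lang>.traineddata, the layout assumed
# in this thread.
check_traineddata() {
    prefix="${1%/}"          # drop one trailing slash, if any
    lang="${2:-eng}"         # default to English when no language is given
    file="$prefix/tessdata/$lang.traineddata"
    if [ -f "$file" ]; then
        echo "found: $file"
    else
        echo "missing: $file"
        return 1
    fi
}

# Usage, with the names from this thread:
#   check_traineddata "$TESSDATA_PREFIX" eng
#   dpkg -s tesseract-ocr-eng    # is the English data package installed?
```

A "missing" result with the package installed would suggest the data sits somewhere other than where TESSDATA_PREFIX points.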

Thanks

Nick

zdenop

Apr 24, 2014, 4:44:38 PM
to tesser...@googlegroups.com


On Thursday, 24 April 2014 16:39:30 UTC+2, cga...@gmail.com wrote:
> Hi,
>
> I have installed the tesseract-ocr (v3.03) package within Ubuntu 12.04.3 LTS and am experiencing difficulty getting it to run.

How did you install it?

> I first located the tessdata directory:
>
> $ sudo find / -name tessdata
> /usr/share/tesseract-ocr/tessdata
>
> Then set the environment variable TESSDATA_PREFIX to tessdata's parent directory:
>
> $ export TESSDATA_PREFIX=/usr/share/tesseract-ocr/
>
> And then checked to make sure this was set correctly:
>
> $ echo $TESSDATA_PREFIX
> /usr/share/tesseract-ocr/
>
> When trying to run tesseract against a test image I receive the following error:
> $ tesseract test.png out
> Error opening data file tesseract/tessdata/eng.traineddata
> Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
> Failed loading language 'eng'
> Tesseract couldn't load any languages!
> Tesseract Open Source OCR Engine v3.03 with Leptonica
> AdaptedTemplates != NULL:Error:Assert failed:in file adaptmatch.cpp, line 174
> Segmentation fault (core dumped)


When I run 'tesseract eurotext.tif - -l xz' (requesting a non-existent language file), I get different output and there is no segfault. The same happens if I set TESSDATA_PREFIX to a wrong directory (e.g. /tmp) and run 'tesseract eurotext.tif -'. So it looks like you have a strange installation...

cga...@gmail.com

Apr 25, 2014, 5:37:54 AM
to tesser...@googlegroups.com
Thanks Nick and zdenop, your comments were very helpful and prompted me to carefully consider exactly where I may have gone wrong during the install.

Initially I compiled and installed tesseract from source, but after having some issues I removed the installed files and installed using apt. Unfortunately, rather than completely removing the initial installation, it seems I had butchered it and left behind a crippled v3.03 install.

Having spent some time this morning cleaning out all tesseract-related files/folders and reinstalling with apt, I'm pleased to report that tesseract is now working as expected.
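
For anyone hitting the same thing: a leftover source build tends to show up as more than one tesseract executable on PATH. A rough sketch of a check follows; find_on_path is a hypothetical helper name, and the idea that the stray copy is still on PATH is an assumption, not something confirmed in this thread.

```shell
# Hypothetical helper: print every executable called <name> found in the
# directories listed in PATH, in lookup order. The function body runs in a
# subshell, so the IFS change does not leak into the caller.
find_on_path() (
    name="$1"
    IFS=:
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.       # an empty PATH entry means the current directory
        candidate="$dir/$name"
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            echo "$candidate"
        fi
    done
)

# Usage:
#   find_on_path tesseract
```

More than one line of output means several copies are installed, and the first one listed is the one the shell runs. On Debian/Ubuntu, running dpkg -S on each path shows whether a package owns it; a copy that no package owns is a likely leftover.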

I appreciate you both taking the time to reply and help me resolve this.

Regards,
Chris

Nick White

Apr 25, 2014, 10:33:18 AM
to tesser...@googlegroups.com
On Fri, Apr 25, 2014 at 02:37:54AM -0700, cga...@gmail.com wrote:
> Having spent some time this morning cleaning out all tesseract-related
> files/folders and reinstalling with apt, I'm pleased to report that
> tesseract is now working as expected.

Great, glad to hear it Chris, I hope you get on well with Tesseract :)

Nick

Neil Stewart

Feb 5, 2020, 1:22:53 PM
to tesseract-ocr
I had the same problem and was scratching my head at how to solve it and then realised that the answer was there all the while:

Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.

I just had to copy the eng.traineddata and osd.traineddata files into the parent directory, i.e. one level up.

In my case these files were in this location: /usr/share/tesseract-ocr/4.00/tessdata/
Starting from that location I copied them to here: /usr/share/tesseract-ocr/4.00/

sudo cp -i eng.traineddata osd.traineddata /usr/share/tesseract-ocr/4.00/

After this the OCR worked.
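
The two layouts that come up in this thread (language files inside <prefix>/tessdata/, or copied directly into <prefix>/) can be told apart with a small check. This is a minimal sketch; traineddata_layout is a hypothetical helper name and says nothing about which layout a given Tesseract version actually expects.

```shell
# Hypothetical helper: report where <lang>.traineddata sits relative to a prefix.
#   "direct"   -> <prefix>/<lang>.traineddata           (as after the copy above)
#   "tessdata" -> <prefix>/tessdata/<lang>.traineddata  (one level down)
#   "none"     -> not found in either place (returns non-zero)
traineddata_layout() {
    prefix="${1%/}"          # drop one trailing slash, if any
    lang="${2:-eng}"         # default to English when no language is given
    if [ -f "$prefix/$lang.traineddata" ]; then
        echo "direct"
    elif [ -f "$prefix/tessdata/$lang.traineddata" ]; then
        echo "tessdata"
    else
        echo "none"
        return 1
    fi
}

# Usage, with the directory from the post above:
#   traineddata_layout /usr/share/tesseract-ocr/4.00/ eng
```

Comparing the result with the path printed in Tesseract's "Error opening data file" message shows whether the files are where the program is looking.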
