OCR Tool Not finding Tesseract language pack


Daniel Lerch

Sep 9, 2013, 3:48:18 PM
to isla...@googlegroups.com
I upgraded tesseract to 3.02.02 per the Islandora 7 requirements. It's installed at /usr/local/bin/tesseract, and I also symlinked it into /usr/bin. Oddly enough, Islandora 6 recognizes tesseract, but for whatever reason 7 does not. I also tried adding environment variables and changing permissions on the language file, to no avail (yeah, I got desperate). Tesseract works fine on the command line. Screenshots attached. BTW, I'm using the stable version of Islandora 7.
tesseract_command_line.jpg
ocr_config.jpg

Kevin S. Clarke

Sep 9, 2013, 4:24:42 PM
to isla...@googlegroups.com
Is it possible you have the newer tesseract version installed at /usr/local/share/tessdata and the older version at /usr/local/bin/ ?

Do you get anything when you type: /usr/local/share/tessdata/tesseract -v (you could also try to configure that path in the Islandora interface seen in the image with the white background).

Kevin




--
You received this message because you are subscribed to the Google Groups "islandora" group.
To unsubscribe from this group and stop receiving emails from it, send an email to islandora+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Gervais de Montbrun

Sep 9, 2013, 7:22:16 PM
to isla...@googlegroups.com
Hey Daniel,

What's the output when you do "which tesseract" on the command line? What about "tesseract --list-langs" ?
What OS are you running this on? 
What do you mean by "the stable version of islandora 7"?
-> If you mean 7.x-1.1, I'd try moving up to a more recent version (from GitHub).

Cheers,
Gervais
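
The two checks Gervais asks for can also be scripted. The following is only a minimal sketch: the function name `describe_tesseract` is made up for illustration, and it assumes the installed tesseract supports `--list-langs` as used above. Both output streams are merged so the language list is captured whichever stream it is written to.

```python
import shutil
import subprocess


def describe_tesseract(binary="tesseract"):
    """Return the resolved path and the raw --list-langs output for a binary.

    Hypothetical helper, not part of Islandora or Tesseract.
    """
    # Same idea as `which tesseract`: search the current PATH.
    path = shutil.which(binary)
    if path is None:
        return {"path": None, "langs_output": None}
    result = subprocess.run(
        [path, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        universal_newlines=True,
    )
    return {"path": path, "langs_output": result.stdout}


if __name__ == "__main__":
    print(describe_tesseract())
```

Running it as the same user the web server runs as may be more telling than running it from a login shell, since the two can have a different PATH and environment.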


Daniel Lerch

Sep 16, 2013, 3:16:05 PM
to isla...@googlegroups.com
Fixed. The issue turned out to be permissions on the language pack files. They needed to be set to 755. By default when extracted they were 644. I thought I had done this before but apparently not. 
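
For anyone who wants to check this on their own install, here is a rough sketch that lists the language pack files whose mode differs from the 755 reported above. The helper name `files_with_other_mode` is hypothetical, and the tessdata directory in the usage line is an assumption taken from the paths mentioned in this thread; adjust it to your install.

```python
import stat
from pathlib import Path


def files_with_other_mode(tessdata_dir, expected_mode=0o755):
    """List (file name, octal mode) for *.traineddata files whose
    permission bits differ from expected_mode."""
    mismatched = []
    for path in sorted(Path(tessdata_dir).glob("*.traineddata")):
        # Keep only the permission bits of the file mode.
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode != expected_mode:
            mismatched.append((path.name, oct(mode)))
    return mismatched


if __name__ == "__main__":
    # Assumed location; not necessarily where your tessdata lives.
    print(files_with_other_mode("/usr/local/share/tessdata"))
```

An empty list means every language file already has the expected mode.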

Chen Conan

Sep 24, 2013, 12:17:19 AM
to isla...@googlegroups.com
I still have the same problem with the Chinese Simplified language pack on Ubuntu 12.04:
1.  Followed the instructions to install tesseract: http://miphol.com/muse/2013/05/install-tesseract-ocr-on-ubunt.html
2.  Made sure the language pack permission is 755:
     conan@ubuntu:~$ ll /usr/local/share/tessdata/
     -rwxr-xr-x 1 root root 39973777 Sep 23 20:36 chi_sim.traineddata*
3.  Tried to recognize a Chinese Simplified character:
     conan@ubuntu:~$ tesseract ma.png out -l chi_sim
     Error opening data file /usr/local/share/tessdata/eng.traineddata
     Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
     Failed loading language 'eng'
     Tesseract couldn't load any languages!
     Could not initialize tesseract.
4.  But TESSDATA_PREFIX has the correct path:
     conan@ubuntu:~$ env |grep TESSDATA_PREFIX
     TESSDATA_PREFIX=/usr/local/share
5.  And the tesseract version:
     conan@ubuntu:~$ tesseract -v
     tesseract 3.02.02
     leptonica-1.69
     libjpeg 8b : libpng 1.2.46 : libtiff 3.9.5 : zlib 1.2.3.4

What is the problem?
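
The error message names eng.traineddata, while the listing in step 2 shows chi_sim.traineddata, so it may be worth checking which language files are actually present under the prefix. A minimal sketch follows, assuming the layout the error message describes (`<TESSDATA_PREFIX>/tessdata/<lang>.traineddata`); the helper name `missing_language_files` is made up for illustration.

```python
import os
from pathlib import Path


def missing_language_files(langs, prefix=None):
    """Return the expected *.traineddata paths that do not exist.

    prefix defaults to the TESSDATA_PREFIX environment variable.
    """
    if prefix is None:
        prefix = os.environ.get("TESSDATA_PREFIX", "")
    tessdata = Path(prefix) / "tessdata"
    missing = []
    for lang in langs:
        candidate = tessdata / (lang + ".traineddata")
        if not candidate.is_file():
            missing.append(str(candidate))
    return missing


if __name__ == "__main__":
    # 'eng' is the language named in the error, 'chi_sim' the one requested.
    print(missing_language_files(["eng", "chi_sim"]))
```

Any path it prints is a file tesseract would fail to open for that language under the current prefix.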


On Tuesday, September 17, 2013 at 3:16:05 AM UTC+8, Daniel Lerch wrote: