Re: The way the path to tessdata directory is defined.


zdenko podobny

Jul 14, 2013, 4:56:22 PM
to tesser...@googlegroups.com, tesser...@googlegroups.com
I played a little with Dmitry Katsubo's patch. Based on it, I suggest implementing a "--tessdata-dir" option for the tesseract-ocr executable. It lets the user specify where tesseract-ocr should look for its data (languages and configs). For example, something like this should work after applying my patch from issue 938 [1]:
    tesseract --tessdata-dir /usr/src/tesseract-ocr/tessdata eurotext.tif stdout

Feel free to test it and comment on it.

[1] http://code.google.com/p/tesseract-ocr/issues/detail?id=938#c4

Zdenko


On Fri, Jun 7, 2013 at 12:54 PM, Dmitry Katsubo <dmitry....@gmail.com> wrote:
On 07.06.2013 11:35, Sriranga(79yrs) wrote:
I would suggest posting this under issues as well, together with your patch. Presumably it will also work on the Windows platform, not only on Linux?

I have submitted issue#938.
I can't guarantee 100% that it works on Windows, as I have no means to compile Tesseract under Windows. However, I didn't add anything new; I just reshuffled the order in which the values are checked, so it should work OK.

-- 
With best regards,
Dmitry

--
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com
To unsubscribe from this group, send email to
tesseract-oc...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
 

zdenko podobny

Jul 16, 2013, 5:04:12 PM
to tesser...@googlegroups.com, tesser...@googlegroups.com
On Mon, Jul 15, 2013 at 6:07 AM, Shree Devi Kumar <shree...@gmail.com> wrote:
As I understand it, tessdata-prefix, as well as the tessdata-dir option that you are now proposing, specifies where tesseract-ocr should look for its data (languages and configs). Does it also, without explicitly mentioning it, specify where 'tesseract.exe' resides?

The reason I ask is this: I initially downloaded the 3.02 Windows package and installed it. Then I downloaded the latest svn through VS2008 and compiled it. They are in two different locations.

So, is setting tessdata-prefix or this new option enough to make sure that the correct tesseract executable is used?

This is not about the "correct" tesseract executable. This is about the "correct" tesseract-ocr data.

Let's assume this scenario:
You have installed the latest official tesseract version in C:\Program Files\Tesseract-OCR\ (so your tessdata_prefix environment variable points to that directory).
You create an alternative eng.traineddata file (e.g. without dictionaries) and place it in C:\Program Files\Tesseract-OCR-dev\tessdata.

With the current executable, if you want to use your alternative eng data you must change the tessdata_prefix environment variable to C:\Program Files\Tesseract-OCR-dev\. Then you call:
     tesseract eurotext.tif stdout -l eng
When you want to use the "official" data, you need to change the tessdata_prefix environment variable back to C:\Program Files\Tesseract-OCR\.

The proposed patch makes this simpler. You do not need to modify the tessdata_prefix environment variable. You can just use the "--tessdata-dir" option (which will have higher priority than the environment variable):
     tesseract --tessdata-dir "C:\Program Files\Tesseract-OCR-dev\tessdata" eurotext.tif stdout -l eng
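
A script that has to switch between the two data sets per call could build the command line along these lines. This is only a sketch: it assumes an executable built with the proposed patch (the "--tessdata-dir" option does not exist otherwise), and the helper name build_tesseract_command is made up for illustration. The paths and arguments are the ones from the scenario above.

```python
def build_tesseract_command(image, output_base, lang="eng", tessdata_dir=None):
    """Build an argument list for the tesseract executable.

    tessdata_dir is passed through the proposed --tessdata-dir option,
    which is an assumption: it only exists in a patched executable.
    """
    command = ["tesseract"]
    if tessdata_dir is not None:
        # Proposed option: takes priority over the environment variable.
        command += ["--tessdata-dir", tessdata_dir]
    command += [image, output_base, "-l", lang]
    return command


# "Official" data: no option given, so the environment variable decides.
official = build_tesseract_command("eurotext.tif", "stdout")

# Alternative data: chosen per call, the environment stays untouched.
alternative = build_tesseract_command(
    "eurotext.tif",
    "stdout",
    tessdata_dir=r"C:\Program Files\Tesseract-OCR-dev\tessdata",
)
```

Either list can be handed to subprocess.run() as it is; no shell quoting of the path with spaces is needed when the arguments are passed as a list.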

Of course, the patch also modifies the behavior of the tesseract-ocr library regarding the priority of tessdata_prefix: at the moment the environment variable has top priority. The patch changes this so that the argument has top priority (for more detail, see the comments in the patch for ccutil/mainblk.cpp).
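
The changed priority order can be modelled roughly as follows. This is not the code from ccutil/mainblk.cpp, only a sketch of the order described above; the function name resolve_tessdata_location and the "fallback" parameter (standing in for whatever default the library uses when neither value is set) are assumptions. The environment variable is looked up under its usual uppercase spelling, TESSDATA_PREFIX. The sketch models only which value wins, not how the path is then turned into a tessdata directory.

```python
import os


def resolve_tessdata_location(argument=None, environ=None, fallback=None):
    """Pick the data location: argument first, then environment, then fallback.

    Hypothetical helper; it mirrors only the priority order of the patch.
    """
    environ = os.environ if environ is None else environ
    if argument:
        # After the patch: an explicitly passed location has top priority.
        return argument
    from_env = environ.get("TESSDATA_PREFIX")
    if from_env:
        # Used only when no argument was given.
        return from_env
    # Assumed last resort, e.g. a compiled-in default.
    return fallback
```

Before the patch, the first two checks would effectively be in the opposite order, which is why the environment variable had to be changed to switch data sets.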

PS1: Maybe we could use "--tessdata_prefix" instead of "--tessdata-dir", to be consistent.

--
Zdenko