As I understand it, tessdata-prefix, as well as the tessdata-dir option that you are now proposing, specifies where tesseract-ocr should look for its data (languages and configs). Does it also, without explicitly saying so, specify where 'tesseract.exe' resides?
The reason I ask is this: I initially downloaded the 3.02 Windows package and installed it. Then I downloaded the latest svn through VS2008 and compiled it. They are in two different locations.
So, is setting tessdata-prefix or this new option enough to make sure that the correct tesseract executable is used?
This is not about the "correct" tesseract executable. It is about the "correct" tesseract-ocr data. Neither setting affects which executable is run.
Let's assume this scenario:
You have installed the latest official tesseract version in C:\Program Files\Tesseract-OCR\ (so your tessdata_prefix environment variable points to that directory).
You create an alternative eng.traineddata file (e.g. without dictionaries) and place it in C:\Program Files\Tesseract-OCR-dev\tessdata.
With the current executable, if you want to use your alternative eng data you must change the tessdata_prefix environment variable to C:\Program Files\Tesseract-OCR-dev\. Then you call:
tesseract eurotext.tif stdout -l eng
When you want to use the "official" data again, you need to change the tessdata_prefix environment variable back to C:\Program Files\Tesseract-OCR\.
The proposed patch makes this simpler. You do not need to modify the tessdata_prefix environment variable; you can just use the option "--tessdata-dir" (which will have higher priority than the environment variable):
tesseract --tessdata-dir "C:\Program Files\Tesseract-OCR-dev\tessdata" eurotext.tif stdout -l eng
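If you call tesseract from a script, the two approaches above could be sketched roughly as follows. This is only an illustration: the helper names (command_with_env_prefix, command_with_option) are made up for this example, the functions only build the command and environment without running anything, and "--tessdata-dir" is the option proposed in the patch.

```python
import os

def command_with_env_prefix(image, lang, prefix, base_env=None):
    """Current approach: set tessdata_prefix for one child process only.

    Returns (command, environment); the caller's own environment is not changed.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TESSDATA_PREFIX"] = prefix  # parent directory of "tessdata"
    return ["tesseract", image, "stdout", "-l", lang], env

def command_with_option(image, lang, tessdata_dir):
    """Proposed approach: pass the data directory as an argument instead."""
    return ["tesseract", "--tessdata-dir", tessdata_dir,
            image, "stdout", "-l", lang]

# To actually run one of them (assumes tesseract is on PATH):
#   import subprocess
#   cmd, env = command_with_env_prefix(
#       "eurotext.tif", "eng", "C:\\Program Files\\Tesseract-OCR-dev\\")
#   subprocess.run(cmd, env=env)
```

With the first helper the environment variable is overridden only for that one call, so the system-wide setting can keep pointing to the official data.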
Of course, the patch also modifies the behavior of the tesseract-ocr library regarding the priority of tessdata_prefix: at the moment the environment variable has top priority. The patch changes this so that the argument has top priority (for more detail see the comments in the patch for ccutil/mainblk.cpp).
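The priority order after the patch can be pictured with a small sketch. This is not the code from ccutil/mainblk.cpp, only an illustration of the lookup order: the function name resolve_tessdata_dir and the default_dir fallback are assumptions made for this example. It also assumes, as in the scenario above, that tessdata_prefix names the parent of the tessdata directory while the option names the tessdata directory itself.

```python
import ntpath  # Windows path rules, independent of the host OS

def resolve_tessdata_dir(option_dir, env, default_dir):
    """Illustrative lookup order only (hypothetical helper, not Tesseract code)."""
    # 1. An explicitly passed argument has top priority.
    if option_dir:
        return option_dir
    # 2. Otherwise fall back to the environment variable, which points
    #    to the parent directory of "tessdata".
    prefix = env.get("TESSDATA_PREFIX")
    if prefix:
        return ntpath.join(prefix, "tessdata")
    # 3. Otherwise use some default (assumed here; passed in by the caller).
    return default_dir

env = {"TESSDATA_PREFIX": "C:\\Program Files\\Tesseract-OCR\\"}
dev = "C:\\Program Files\\Tesseract-OCR-dev\\tessdata"
print(resolve_tessdata_dir(dev, env, "tessdata"))   # the argument wins
print(resolve_tessdata_dir(None, env, "tessdata"))  # the environment variable is used
```

Before the patch, steps 1 and 2 would effectively be swapped: the environment variable would win even when an argument is passed.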
PS1: maybe instead of "--tessdata-dir" we could use "--tessdata_prefix" to be consistent.
--
Zdenko