Re: tesseract api and user-words

zdenko podobny

Nov 30, 2012, 3:56:22 AM
to tesser...@googlegroups.com
I guess the problem is that deu.traineddata cannot be found.

I would suggest running your program in a console so you can see any error message (something like "Error opening data file C:\Program Files\Tesseract-OCR\tessdata/deu.traineddata").

Another option is to initialize tesseract and set the variables in separate steps so you can check each one for errors. Something like this:

    const char* configs = "myconfig";
    TessBaseAPI *tess = new TessBaseAPI();
    // Init returns 0 on success and -1 on failure.
    if (tess->Init(NULL, "deu", OEM_DEFAULT)) {
      fprintf(stderr, "Could not initialize tesseract.\n");
      exit(1);
    }

    // write messages to tesseract.log instead of stderr...
    if (!tess->SetVariable("debug_file", "tesseract.log")) {
      fprintf(stderr, "Could not set variable 'debug_file'.\n");
    }
    // Load the config file as a separate step after Init.
    tess->ReadConfigFile(configs);


-- 
Zdenko

On Thu, Nov 29, 2012 at 5:15 PM, Matthias Hillert <mhil...@gmail.com> wrote:
Hi,

I am trying to include a custom word dictionary using a custom configuration file and the user_words_suffix parameter.
My code looks like this:

TessBaseAPI tess;
char *configs[]={"myconfig"};
int configs_size = 1;
tess.Init(NULL, "deu", OEM_DEFAULT, configs, configs_size, NULL, NULL, false );

My config file looks like this:

user_words_suffix user-words

The problem is that my program exits with code 1 after the Init call.
I tried both a simple deu.user-words file with one word per line and the same file converted into a dawg file. Neither worked.
If I remove the user_words_suffix line from the config file, everything works.

I am using Tesseract 3.02, Windows 8 and Visual Studio 2012.

I would really appreciate some help.




--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com
To unsubscribe from this group, send email to
tesseract-oc...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

John Williams

Nov 30, 2012, 8:44:25 AM
to tesser...@googlegroups.com
Not sure if this will help you or not, but setting the datapath in the Init function didn't work for me. I had to set it in the environment variable TESSDATA_PREFIX before initialization and everything worked fine. You might give that a shot...
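A minimal sketch of this suggestion, assuming the variable is set from inside the program rather than in the system settings. `set_tessdata_prefix` is a hypothetical helper, not part of the Tesseract API. It would have to run before `Init`, and on Windows it only helps if the program and the Tesseract library share the same C runtime.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of Tesseract): sets TESSDATA_PREFIX for the
// current process. Returns true on success.
bool set_tessdata_prefix(const std::string& prefix) {
#ifdef _WIN32
    // Microsoft CRT: _putenv_s returns 0 on success.
    return _putenv_s("TESSDATA_PREFIX", prefix.c_str()) == 0;
#else
    // POSIX: the last argument (1) overwrites an existing value.
    return setenv("TESSDATA_PREFIX", prefix.c_str(), 1) == 0;
#endif
}
```

Usage would be `set_tessdata_prefix("C:\\tesseract-3.02\\");` followed by the usual `Init` call.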

On Fri, Nov 30, 2012 at 7:04 AM, Matthias Hillert <mhil...@gmail.com> wrote:
I tried running the program in the console and got the following error message:

Could not open file, C:\tesseract-3.02\tessdata/deu.user-words

The file is definitely there. Maybe it has something to do with the different slashes?
Is the user-words file supposed to be a dawg file or a simple text file with one word per line?

I also tried setting the datapath in the Init call to "C:/tesseract-3.02/" to get the right slashes, but I got the same result.

Regarding your suggestion to set the config file after the Init call: I read here http://code.google.com/p/tesseract-ocr/wiki/ControlParams
that the user_words_suffix parameter can only be set in the Init call. Is this correct?

zdenko podobny

Nov 30, 2012, 1:23:22 PM
to tesser...@googlegroups.com
If the environment variable is set, it takes priority over the datapath passed to Init.
If the datapath was built in (the default with autotools), then setting the datapath in Init has no effect either.

Search the archive of this forum for more info.
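The precedence described here can be sketched as a small pure function. This is an illustration of the rule as stated in this message, not Tesseract's actual code; `effective_prefix` and the "./" fallback are assumptions, and the compiled-in datapath case is not modelled.

```cpp
#include <string>

// Sketch of the stated precedence (hypothetical helper, not Tesseract code):
// if the environment value is set it wins; otherwise the datapath passed to
// Init is used; "./" is an assumed fallback when neither is given.
std::string effective_prefix(const char* env_value, const char* init_datapath) {
    if (env_value != nullptr) return env_value;
    if (init_datapath != nullptr) return init_datapath;
    return "./";
}
```

A caller would pass `std::getenv("TESSDATA_PREFIX")` (from `<cstdlib>`) as the first argument and the string given to `Init` as the second.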

-- 
Zdenko

zdenko podobny

Nov 30, 2012, 2:08:19 PM
to tesser...@googlegroups.com

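    Pix *image;
    char *outText;
    char *configs[]={"myconfig"};
    int configs_size = 1;

    TessBaseAPI *tess = new TessBaseAPI();
    // The datapath is the parent directory of "tessdata". The config file is
    // passed to Init so that init-only parameters (user_words_suffix) apply.
    if (tess->Init("C:\\tesseract-3.02\\", "deu", OEM_DEFAULT, configs, configs_size, NULL, NULL, false)) {
      fprintf(stderr, "Could not initialize tesseract.\n");
      exit(1);
    }

    image = pixRead("C:\\tesseract-3.02\\phototest.tif");
    tess->SetImage(image);
    outText = tess->GetUTF8Text();
    // Explicit format string, so a '%' in the OCR output is printed literally.
    fprintf(stdout, "%s", outText);

    // Release the result text, the image, and the engine.
    delete [] outText;
    pixDestroy(&image);
    tess->End();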

and it works for me (VC++ 2008 on Windows XP). I have this in C:\tesseract-3.02:

C:\tesseract-3.02\phototest.tif
C:\tesseract-3.02\tessdata\deu.traineddata
C:\tesseract-3.02\tessdata\deu.user-words
C:\tesseract-3.02\tessdata\configs\myconfig

And deu.user-words affects the OCR results (it contains words like "all", "lazy", etc.).
Below are some inline comments.
-- 
Zdenko

On Fri, Nov 30, 2012 at 2:04 PM, Matthias Hillert <mhil...@gmail.com> wrote:
I tried running the program in the console and got the following error message:

Could not open file, C:\tesseract-3.02\tessdata/deu.user-words

The file is definitely there. Maybe it has something to do with the different slashes?
 
Windows handles the slash correctly (i.e. as a directory separator), so the problem should be somewhere else. Are you able to open that path with fopen?
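A quick way to run that check, as a minimal sketch using only the C standard library. `can_open` is a hypothetical helper name; on failure it prints the system error for the path to stderr.

```cpp
#include <cstdio>

// Returns true if the file at `path` can be opened for reading.
// On failure, prints the path and the system error message to stderr.
bool can_open(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) {
        std::perror(path);
        return false;
    }
    std::fclose(f);
    return true;
}
```

For example, `can_open("C:\\tesseract-3.02\\tessdata/deu.user-words")` tests the mixed-slash path from the error message.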
 
Is the user-words file supposed to be a dawg file or a simple text file with one word per line?
 
One word per line. Plain text (UTF-8 without BOM, Unix EOL; my test also worked with ANSI encoding and Windows EOL, at least that is what Notepad++ says ;-) )
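If the encoding or line endings of the word list are in doubt, the text can be normalized before saving it as deu.user-words. A minimal sketch; `normalize_word_list` is a hypothetical helper, and dropping empty lines is an assumption rather than a documented requirement.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: removes a leading UTF-8 BOM, strips the CR of Windows
// line endings, skips empty lines, and returns one word per line with LF.
std::string normalize_word_list(const std::string& text) {
    std::string input = text;
    // The UTF-8 BOM is the byte sequence EF BB BF.
    if (input.compare(0, 3, "\xEF\xBB\xBF") == 0) input.erase(0, 3);

    std::istringstream in(input);
    std::string line, out;
    while (std::getline(in, line)) {
        if (!line.empty() && line[line.size() - 1] == '\r') line.erase(line.size() - 1);
        if (line.empty()) continue;  // assumption: blank lines are not wanted
        out += line;
        out += '\n';
    }
    return out;
}
```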


I also tried setting the datapath in the Init call to "C:/tesseract-3.02/" to get the right slashes, but I got the same result.

Check whether the environment variable is set (echo %TESSDATA_PREFIX%).


Regarding your suggestion to set the config file after the Init call: I read here http://code.google.com/p/tesseract-ocr/wiki/ControlParams
that the user_words_suffix parameter can only be set in the Init call. Is this correct?


Yes, it is correct. But when there is a problem I prefer to do things step by step (e.g. you can try setting "init only" parameters after Init; it will not cause an error, they will just have no effect).
 

On Friday, November 30, 2012 at 09:56:22 UTC+1, zdenop wrote:

zdenko podobny

Dec 7, 2012, 11:14:24 AM
to tesser...@googlegroups.com
Can you please try to use tesseract-ocr-3.02-win32-portable.zip?

I tried this on Win7 and it works for me:

c:\tesseract-ocr\vs2008> set TESSDATA_PREFIX=c:\tesseract-ocr\

c:\tesseract-ocr\vs2008> tesseract phototest.tif phototest -l deu
Tesseract Open Source OCR Engine v3.02 with Leptonica

c:\tesseract-ocr\vs2008> tesseract phototest.tif phototest-user -l deu config_file
Tesseract Open Source OCR Engine v3.02 with Leptonica

c:\tesseract-ocr\vs2008> tesseract phototest.tif phototest-x -l deux
Error opening data file c:\tesseract-ocr\tessdata/deux.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'deux'
Tesseract couldn't load any languages!
Could not initialize tesseract.

c:\tesseract-ocr\vs2008> tesseract phototest.tif phototest-user -l spa config_file
Error opening data file c:\tesseract-ocr\tessdata/spa.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'spa'
Tesseract couldn't load any languages!
Could not initialize tesseract.
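
Judging only from the paths in the messages above, the file name appears to be built as prefix + "tessdata/" + language + "." + suffix, which would explain the mixed slashes. A small sketch of that composition; `data_file_path` is a hypothetical helper and not Tesseract's own function.

```cpp
#include <string>

// Hypothetical helper mirroring the paths seen in the error messages:
// <prefix> + "tessdata/" + <lang> + "." + <suffix>.
// The prefix is expected to end with a directory separator.
std::string data_file_path(const std::string& prefix,
                           const std::string& lang,
                           const std::string& suffix) {
    return prefix + "tessdata/" + lang + "." + suffix;
}
```

With this, `data_file_path("c:\\tesseract-ocr\\", "deux", "traineddata")` reproduces the path shown in the first error message.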

Zdenko



On Fri, Dec 7, 2012 at 11:34 AM, Matthias Hillert <mhil...@gmail.com> wrote:
I tried your code and it did not work. I get the error message "Could not open file, C:\tesseract-3.02\tessdata/deu.user-words".
I then tried to open the file with fopen. It did not work for the path 

C:\tesseract-3.02\tessdata/deu.user-words

But it worked for the following paths:

C:\\tesseract-3.02\\tessdata\\deu.user-words
C:/tesseract-3.02/tessdata/deu.user-words
C:\\tesseract-3.02\\tessdata/deu.user-words
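
One possible explanation for these fopen results, assuming the paths were typed as string literals in the C++ source (the post does not say so): inside a literal, `\t` is the escape sequence for a tab character, so the first path does not contain the backslashes it appears to. The sketch below only demonstrates the literal behavior; it does not explain the error that Tesseract itself printed.

```cpp
#include <string>

// Written with single backslashes: each "\t" becomes one tab character.
const std::string single_backslash = "C:\tesseract-3.02\tessdata/deu.user-words";
// Escaped backslashes give the path that was intended.
const std::string escaped = "C:\\tesseract-3.02\\tessdata/deu.user-words";
// A C++11 raw string literal keeps backslashes exactly as typed.
const std::string raw = R"(C:\tesseract-3.02\tessdata/deu.user-words)";
```

Here `escaped` and `raw` hold the same text, while `single_backslash` is two characters shorter and contains no backslash at all.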

echo %TESSDATA_PREFIX% yields

C:\tesseract-3.02\

I changed this setting manually to C:/tesseract-3.02/ and now I get the error message "Could not open file, C:/tesseract-3.02/tessdata/deu.user-words".
I even removed the setting completely so it uses the path supplied with the Init call. Still no luck, same error.

Any more suggestions?

葉家忠

May 7, 2014, 4:55:27 AM
to tesser...@googlegroups.com
I found that in the config file containing "user_words_suffix user-words", you have to make sure there is no blank line after that line; otherwise you will get the "Can't open xxx" error message.
In short, pay attention to the format of the config file. I spent a lot of time figuring this error out.
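A minimal sketch of a clean-up step for the config text before it is written to tessdata\configs\myconfig. `trim_config_tail` is a hypothetical helper; it removes trailing blank lines and whitespace and, as an assumption, keeps a single final newline.

```cpp
#include <string>

// Hypothetical helper: drops trailing spaces, tabs, CRs and blank lines from
// the config text and ends it with exactly one newline (an assumption; the
// post only says there must be no blank line after the parameter line).
std::string trim_config_tail(const std::string& config) {
    const std::string::size_type last = config.find_last_not_of(" \t\r\n");
    if (last == std::string::npos) return "";  // nothing but whitespace
    return config.substr(0, last + 1) + "\n";
}
```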



Rajil Yadav

Jan 23, 2016, 1:00:06 PM
to tesseract-ocr
Hi,

I am trying to do this with Win8 and VS 2012; no environment variable is set.


api = new tesseract::TessBaseAPI();
int i = api->Init("C:\\tesseract-ocr-3.02\\tessdata\\","hin",tesseract::OEM_DEFAULT);
if(i)
{
    MessageBox(L"Could not initialize tesseract.", L"Illusssion",MB_OK);
    exit(1);
}


Every time I compile and run it, the result is -1.

zdenko podobny

Jan 23, 2016, 1:10:26 PM
to tesser...@googlegroups.com
Try using "C:\\tesseract-ocr-3.02\\" in Init (the datapath should be the parent directory of "tessdata", not the tessdata directory itself)...
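
A small guard against this mistake, as a sketch only: `parent_of_tessdata` is a hypothetical helper that strips a trailing "tessdata" component before the path is passed to Init. It matches the name case-sensitively, which is an assumption.

```cpp
#include <string>

// Hypothetical helper: if `datapath` ends in "tessdata" (with or without a
// trailing separator), return the parent directory including its trailing
// separator; otherwise return `datapath` unchanged.
std::string parent_of_tessdata(const std::string& datapath) {
    std::string path = datapath;
    // Ignore trailing separators while looking at the last component.
    while (!path.empty() && (path.back() == '\\' || path.back() == '/')) path.pop_back();

    const std::string leaf = "tessdata";
    if (path.size() > leaf.size()) {
        const std::string::size_type start = path.size() - leaf.size();
        const char sep = path[start - 1];
        if (path.compare(start, leaf.size(), leaf) == 0 && (sep == '\\' || sep == '/')) {
            return path.substr(0, start);  // keeps the separator before "tessdata"
        }
    }
    return datapath;
}
```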

Zdenko

