Terrible results from Tesseract API


elena bresciani

Jul 3, 2014, 6:22:50 AM
to tesser...@googlegroups.com
Dear all,

I need to integrate Tesseract in a C++ project.
First I simply called Tesseract from the command line and, after setting up a specific configuration, I got satisfying results.

This is the config file "pharma"

load_system_dawg 0
load_freq_dawg 0
load_punc_dawg 0
user_words_suffix pharma-words
tessedit_char_whitelist abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,
language_model_penalty_non_dict_word 0


Now that I have to do the same thing through the Tesseract API, I get terrible results: only about 10% correct identification and 90% garbage.
I must be missing something in the conversion to the API...

This is my code

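#include <cstdio>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf (stderr, "usage: %s image\n", argv[0]);
        return 1;
    }

    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();

    // Initialize with the Italian language data, then read the "pharma" config.
    if (api -> Init("/usr/local/share/","ita") != 0) {
        fprintf (stderr, "Could not initialize tesseract.\n");
        return 1;
    }
    api -> ReadConfigFile ("pharma");

    Pix *image = pixRead (argv[1]);
    api -> SetImage (image);
    api -> SetSourceResolution(600);

    char *outText = api -> GetUTF8Text();
    printf ("OCR output: \n%s", outText);

    // Release the engine, the text buffer and the image.
    api -> End();
    delete api;
    delete [] outText;
    pixDestroy (&image);

    return 0;
}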


Can somebody help me understand, please?
 
Thanks in advance

Elena

Nick White

Jul 3, 2014, 5:37:37 PM
to tesser...@googlegroups.com
Hi Elena,

Just a guess, but maybe this line:

> api -> SetSourceResolution(600);

is the source of your troubles? Tesseract from the command line
would have just been guessing it, and perhaps its guess, coupled
with its ideas about different sizes of fonts, were better than
yours?

Nick

zdenko podobny

Jul 4, 2014, 3:22:09 AM
to tesser...@googlegroups.com
Could you please also post a test image?

Zdenko


--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/7dd534f7-3e85-480f-bb81-3d34c7af0c05%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

elena bresciani

Jul 4, 2014, 3:24:59 AM
to tesser...@googlegroups.com
I tried to remove that line but the results are still really bad.
The problem must be somewhere else..

other suggestions? :)

Elena




elena bresciani

Jul 4, 2014, 3:39:52 AM
to tesser...@googlegroups.com
Here's an example of the kind of text that I have to read
12OUT_2 (copia)6.jpg

zdenko podobny

Jul 4, 2014, 5:01:18 AM
to tesser...@googlegroups.com
I see a problem (there may be something else as well ;-) since I have not had time to test it yet):
load_system_dawg, load_freq_dawg etc. are init parameters[1]. If you try to set them later, they are ignored.
You need to pass them to Init (see section Tesseract-OCR API[2]).
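
As a rough illustration of the point above, here is a minimal sketch that turns the lines of a config file such as "pharma" into two parallel lists of parameter names and values, which is the shape an Init call that accepts variables would need. The names ParamLists and parseConfigText are hypothetical helpers, not part of Tesseract, and std::vector<std::string> is used only so the sketch compiles without the Tesseract headers.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type (not part of Tesseract): parallel lists of
// parameter names and their values, in file order.
struct ParamLists {
    std::vector<std::string> names;
    std::vector<std::string> values;
};

// Split config text of the form "name value" (one pair per line) into
// parallel name/value lists. Blank lines are skipped; a name with no
// value gets an empty string.
ParamLists parseConfigText(const std::string& text) {
    ParamLists out;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string name;
        if (!(fields >> name)) continue;          // blank line
        std::string value;
        std::getline(fields >> std::ws, value);   // rest of the line
        // Drop trailing whitespace (for example a stray '\r').
        while (!value.empty() &&
               std::isspace(static_cast<unsigned char>(value.back()))) {
            value.pop_back();
        }
        out.names.push_back(name);
        out.values.push_back(value);
    }
    assert(out.names.size() == out.values.size());
    return out;
}
```

With the "pharma" file from the first message, names would start with load_system_dawg and values with 0. Copying these entries into the vector type that Init expects is left out here, since that part depends on the installed Tesseract version.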


Zdenko


elena bresciani

Jul 4, 2014, 6:53:58 AM
to tesser...@googlegroups.com
I'm trying to modify the code as you said, but now I have problems with GenericVector.

I have included
<tesseract/genericvector.h>

and then wrote my code as in the example you cited

GenericVector pars_vec;
..
GenericVector pars_values;
..

But it doesn't compile and I get this error:

OCR-0.1.cpp: In function ‘int main(int, char**)’:
OCR-0.1.cpp:9:19: error: missing template arguments before ‘pars_vec’
     GenericVector pars_vec;
                   ^
OCR-0.1.cpp:9:19: error: expected ‘;’ before ‘pars_vec’
OCR-0.1.cpp:10:5: error: ‘pars_vec’ was not declared in this scope
     pars_vec.push_back("load_system_dawg");
     ^
OCR-0.1.cpp:15:19: error: missing template arguments before ‘pars_values’
     GenericVector pars_values;
                   ^
OCR-0.1.cpp:15:19: error: expected ‘;’ before ‘pars_values’
OCR-0.1.cpp:16:5: error: ‘pars_values’ was not declared in this scope
     pars_values.push_back("F");
     ^


what am I missing again?



zdenko podobny

Jul 4, 2014, 8:44:07 AM
to tesser...@googlegroups.com
What is your version of tesseract?
Which compiler, and which version of it, do you use?

Zdenko


elena bresciani

Jul 4, 2014, 8:50:48 AM
to tesser...@googlegroups.com
I'm using tesseract 3.02.02 and compiling with gcc 4.8.2.


zdenko podobny

Jul 4, 2014, 6:06:10 PM
to tesser...@googlegroups.com
Check this[1]. It works for me on openSUSE 13.1 64-bit with tesseract 3.02.02 and gcc 4.8.1 (even though the result is not the same as from the command line ;-) ).
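
The linked example is not preserved in this thread. The compiler errors quoted earlier ("missing template arguments before 'pars_vec'") are what GCC reports when a class template is named without its element type, so the declarations most likely needed one. Below is a minimal sketch of that fix. It uses std::vector<std::string> as a stand-in so that it compiles without Tesseract installed; the aliases ParamString and ParamVector and the function fillInitParams are hypothetical names. The assumption is that with the Tesseract 3.02 headers the type would be GenericVector<STRING>, with STRING coming from <tesseract/strngs.h>.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in aliases (assumption): with Tesseract 3.02 these would be
// STRING and GenericVector<STRING> rather than the std:: types.
using ParamString = std::string;
using ParamVector = std::vector<ParamString>;

// GenericVector pars_vec;   // fails: the template needs its element type
// std::vector pars_vec;     // fails for the same reason

// Fill the parallel name/value lists for the dictionary-loading
// parameters from the "pharma" config. "F" switches a boolean
// parameter off, as in the code quoted in the error messages.
void fillInitParams(ParamVector& pars_vec, ParamVector& pars_values) {
    pars_vec.push_back("load_system_dawg");
    pars_values.push_back("F");
    pars_vec.push_back("load_freq_dawg");
    pars_values.push_back("F");
    pars_vec.push_back("load_punc_dawg");
    pars_values.push_back("F");
    assert(pars_vec.size() == pars_values.size());
}
```

With the real types, the two vectors would then be passed by pointer to the long form of Init, along the lines of api->Init(datapath, "ita", tesseract::OEM_DEFAULT, NULL, 0, &pars_vec, &pars_values, false). That argument list is an assumption based on the 3.02 API and should be checked against the installed baseapi.h.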


Zdenko


elena bresciani

Jul 7, 2014, 9:38:17 AM
to tesser...@googlegroups.com
Thank you very much!
Now it works for me exactly as it does from the terminal.

Cheers,
Elena


elena bresciani

Jul 7, 2014, 10:47:52 AM
to tesser...@googlegroups.com
I have one more question:

How can I use TessResultRenderer to write the output to a txt or an hOCR file?