tesseract does not work. Some errors displayed

adrian company

Oct 4, 2013, 2:59:34 AM
to tesser...@googlegroups.com
Hi all,
I am using the Tesseract engine to extract text from an image. I binarize the image and then run OCR on it, but some errors are displayed when I run the program.
Does anyone know what I am doing wrong? I have pasted the code and the errors from the run below.

if (waitKey(10) >= 0) {
    // read the image
    Mat imagen = imread("/home/adrian/workspace/OCR/matricula2.jpg" /*, CV_LOAD_IMAGE_GRAYSCALE*/);
    imshow("imagen", imagen);

    // process the resized image: filter, convert to grayscale, binarize
    medianBlur(imagen, imagen, 3);
    cvtColor(imagen, imagen, CV_BGR2GRAY);
    threshold(imagen, imagen, umbral, umbral_max, 3);  // type 3 is THRESH_TOZERO in OpenCV

    // initialize the Tesseract OCR engine
    putenv("TESSDATA_PREFIX=/usr/local/share/");
    setlocale(LC_NUMERIC, "C");
    tesseract::TessBaseAPI api;
    printf("\nTesseract-ocr version: %s--------\t", api.Version());  // Tesseract version
    printf("Leptonica version: %s\n", getLeptonicaVersion());        // Leptonica version
    printf("___________________________________________________________________________\n");

    if (api.Init(NULL, "spa")) {  // language: Spanish
        fprintf(stderr, " ¡No se pudo inicializar tesseract! \n");
        exit(1);
    }

    api.SetPageSegMode(tesseract::PSM_AUTO);
    api.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ.0123456789");

    api.SetImage(imagen.data, imagen.size().width, imagen.size().height, imagen.channels(), imagen.step1());

    // region of interest (ROI), e.g. regions that contain text
    Rect textROI(0, 0, imagen.cols, imagen.rows);  // whole image

    // recognize text
    api.TesseractRect(imagen.data, 0, imagen.step1(), textROI.x, textROI.y, textROI.width, textROI.height);

    char *texto = new char[200];
    texto = api.GetUTF8Text();

    // remove "newline"
    string t1(texto);
    t1.erase(remove(t1.begin(), t1.end(), '\n'), t1.end());

    // print found text
    printf("TEXTO LEIDO: \n");
    printf("%s", t1.c_str());

    // draw rectangle image
    rectangle(imagen, textROI, Scalar(0, 0, 255), 2, 8, 0);
    imwrite("/home/adrian/workspace/OCR/procesadas/binaria.jpg", imagen);
    imshow("binarizada", imagen);

    delete [] texto;

    // destroy tesseract OCR engine
    api.Clear();
    api.End();
}
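
A side note on the listing above, separate from the errors reported: `texto` is first pointed at a `new char[200]` buffer and then immediately reassigned to the result of `api.GetUTF8Text()`, so the 200-byte buffer is never freed. The Tesseract API comments state that the returned string must be released with `delete []`, which the listing does at the end. The sketch below shows one way to hold such a buffer safely using only the standard library. `makeOwnedText` is a hypothetical stand-in for an API that returns a `new[]`-allocated string, and `takeAndStripNewlines` is likewise an illustrative name, not part of Tesseract.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical stand-in for an API that returns a new[]-allocated C string
// which the caller must release with delete[].
char* makeOwnedText(const char* src) {
    char* out = new char[std::strlen(src) + 1];
    std::strcpy(out, src);
    return out;
}

// Take ownership of a new[]-allocated buffer, copy it into a std::string,
// and strip newlines with the erase-remove idiom used in the listing.
std::string takeAndStripNewlines(char* raw) {
    std::unique_ptr<char[]> owner(raw);  // delete[] runs when owner goes out of scope
    if (!owner) {
        return std::string();            // building a std::string from null is undefined
    }
    std::string text(owner.get());
    text.erase(std::remove(text.begin(), text.end(), '\n'), text.end());
    return text;
}
```

With a helper like this, the pointer returned by the OCR call could be passed straight in, with no prior `new char[200]` and no manual `delete []` afterwards.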

The errors displayed are:

Tesseract-ocr version: 3.02.02--------    Leptonica version: leptonica-1.69
___________________________________________________________________________
Error in pixReduceRankBinary2: hs must be at least 2
Error in pixDilateBrick: pixs not defined
Error in pixExpandReplicate: pixs not defined
Error in pixAnd: pixs1 not defined
Error in pixDilateBrick: pixs not defined
Error in pixExpandReplicate: pixs not defined
Error in pixAnd: pixs2 not defined
TEXTO LEIDO:
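
One possible cause of these Leptonica errors, not confirmed anywhere in this thread: the `TesseractRect` call in the listing passes `0` as `bytes_per_pixel`. According to the comments in Tesseract's `baseapi.h`, `0` means a binary image of 1 bit per pixel, byte packed with the MSB as the first pixel and a 1 bit meaning white. The `Mat` produced by `cvtColor` holds one byte per pixel, so the two layouts disagree. The standard-library sketch below contrasts the two layouts; the function names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes per row of a tightly packed 8-bit grayscale image (one byte per pixel).
int grayBytesPerLine(int width) { return width; }

// Bytes per row of a 1-bit-per-pixel image: eight pixels per byte, rounded up.
int packedBytesPerLine(int width) { return (width + 7) / 8; }

// Pack an 8-bit image (0 = black, nonzero = white) into 1 bpp rows,
// MSB first, where a set bit means white.
std::vector<std::uint8_t> packTo1bpp(const std::vector<std::uint8_t>& gray,
                                     int width, int height) {
    assert(gray.size() == static_cast<std::size_t>(width) * height);
    const int stride = packedBytesPerLine(width);
    std::vector<std::uint8_t> packed(static_cast<std::size_t>(stride) * height, 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (gray[static_cast<std::size_t>(y) * width + x] != 0) {
                packed[static_cast<std::size_t>(y) * stride + x / 8] |=
                    static_cast<std::uint8_t>(0x80u >> (x % 8));
            }
        }
    }
    return packed;
}
```

A simpler route than packing may be to pass `imagen.channels()` (1 for a grayscale `Mat`) as `bytes_per_pixel`, as the `SetImage` call in the listing already does. Whether that removes the errors for this particular image is an assumption that would need testing.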






adrian company

Oct 4, 2013, 3:04:52 AM
to tesser...@googlegroups.com
When no errors are displayed, no text is read from the image; the output is empty.
Does anyone have a clue about that?
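
A possible factor in the empty output, offered as a guess since the values of `umbral` and `umbral_max` are not shown: the `threshold` call in the first post passes `3` as its type. In OpenCV that value is `THRESH_TOZERO`, which keeps the original gray level of pixels above the threshold instead of producing a two-level image, whereas `THRESH_BINARY` is `0`. The sketch below reproduces the per-pixel rules of both modes with the standard library only; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Per-pixel rule of OpenCV's THRESH_BINARY (type 0):
// the output is maxval above the threshold and 0 otherwise.
std::uint8_t threshBinary(std::uint8_t src, std::uint8_t thresh, std::uint8_t maxval) {
    return src > thresh ? maxval : 0;
}

// Per-pixel rule of OpenCV's THRESH_TOZERO (type 3):
// pixels above the threshold keep their value, the rest become 0.
// The maxval argument of cv::threshold plays no role in this mode.
std::uint8_t threshToZero(std::uint8_t src, std::uint8_t thresh) {
    return src > thresh ? src : 0;
}
```

For a pixel of value 200 and a threshold of 128, the binary rule yields `maxval` while the to-zero rule yields 200, so the image passed to Tesseract would still be grayscale with its dark regions zeroed.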

adrian company

Oct 4, 2013, 6:53:29 AM
to tesser...@googlegroups.com

Hi, I've been trying to fix the code and now I get some text, but the text read is not correct. When I try to read the image "texto.jpg", the output is something like this:
II" n llfi  ' IIIi"'"“MM“ufimn"w»“W,;H»m““wu%M%“m»%.IlI!!!!!| "" "" ''"umm """"'"""“ "" "" '''''|IImmw '!!!ll::...i!!"!""' W*WM‘WM

Does anybody know the reason for that?
This is the image to read:


On Friday, October 4, 2013 at 08:59:34 UTC+2, adrian company wrote:

Dr Georgle

Nov 3, 2017, 6:29:16 AM
to tesseract-ocr
Did you ever get any help with this? Your image is not of good quality, but I could extract a lot of the text without much effort.
