Tesseract does not recognize text correctly, or does not recognize it at all

adrian company

Oct 30, 2013, 6:50:26 AM
to tesser...@googlegroups.com
Hi all, I am trying to write software that recognizes text in an image, but when I binarize the image and call the tesseract engine, it does not recognize the text. Does anybody know why the text is not recognized? Do I need to do something extra?
I attach the image I am trying to recognize text from (a license plate). For this image, tesseract outputs nothing.

I've also tried to recognize text from another image (Fuma) and in this case the output is: "L I".

Could anybody help me?

What could be happening?


Thanks in advance.
Adri



binaria.jpg
fuma.jpg

Sven Pedersen

Oct 30, 2013, 10:21:58 AM
to tesser...@googlegroups.com
The first image needs to be deskewed. There are free programs for preparing the image. The second image appears to be too low resolution (or, to be precise, the letters are too few pixels high). Approx. 200-300dpi is ideal for tesseract's default training. Also, JPEG is not a good format for text, because its lossy compression blurs character edges. Internally the image is decoded to an uncompressed bitmap anyway, so a lossless format such as TIFF or PNG is the better input.
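
The resolution advice can be turned into a resize factor before the image goes to tesseract. This is only a minimal sketch, assuming you can measure the height of the letters in pixels. The helper names `scaleFactorForLetterHeight` and `scaledDimension` are hypothetical, and the target height is a value you choose by experiment, not a documented tesseract constant.

```cpp
#include <cmath>
#include <stdexcept>

// Factor by which to resize an image so that letters measuredHeightPx tall
// become targetHeightPx tall. Both values are supplied by the caller.
double scaleFactorForLetterHeight(double measuredHeightPx, double targetHeightPx) {
    if (measuredHeightPx <= 0.0 || targetHeightPx <= 0.0) {
        throw std::invalid_argument("heights must be positive");
    }
    return targetHeightPx / measuredHeightPx;
}

// New width or height after applying the factor, rounded to the nearest pixel.
int scaledDimension(int originalPx, double factor) {
    return static_cast<int>(std::lround(originalPx * factor));
}
```

The resulting factor can be handed to whatever resize routine is already in use (in OpenCV, `cv::resize`).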


--
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com
To unsubscribe from this group, send email to
tesseract-oc...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
 

adrian company

Oct 31, 2013, 2:22:53 AM
to tesser...@googlegroups.com
Thanks Sventech, I'll try to deskew the first one. I'm using OpenCV to prepare the image, so I cannot use a separate program for that. I've tried rotating the image so that the text is horizontal before passing it to tesseract, but the output is the same. I will also try passing it in PNG format and see the result.

adrian company

Dec 2, 2013, 6:18:45 AM
to tesser...@googlegroups.com
Hi again, I've tried to deskew the first image and pass it to tesseract at a larger size, but I get the same result: the numbers and letters are not recognized by tesseract. I attach an image so you can see what my image looks like now.
Any ideas?
Thanks in advance again.
Imagen.png

Sven Pedersen

Dec 2, 2013, 10:13:17 AM
to tesser...@googlegroups.com
I get
V! 2\"03ENl
so you could postprocess that kind of output to get better results. You need to eliminate the black border for best results, and you may need to remove noise. What page segmentation mode are you using? Make sure you test with the command-line version before you try your own code. Also, I'm using the latest version, 3.02.02.
--Sven
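
One simple form of the postprocessing mentioned above is to discard every character that cannot appear on a plate. This is a minimal sketch, assuming the plate alphabet is uppercase A-Z plus digits (the same set used as a whitelist later in this thread). `keepPlateCharacters` is a hypothetical helper name, not part of tesseract.

```cpp
#include <string>

// Keep only the characters a plate can contain (A-Z and 0-9); drop the rest.
std::string keepPlateCharacters(const std::string& raw) {
    std::string cleaned;
    for (char c : raw) {
        const bool isUpper = (c >= 'A' && c <= 'Z');
        const bool isDigit = (c >= '0' && c <= '9');
        if (isUpper || isDigit) {
            cleaned.push_back(c);
        }
    }
    return cleaned;
}
```

Applied to the output quoted above, this strips the punctuation, the space and the lowercase letter. It does not repair characters that were misread as other valid plate characters.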

Adrian Company

Dec 2, 2013, 10:52:59 AM
to tesser...@googlegroups.com

I get nothing from the recognition; maybe there's something wrong with my tesseract installation. I'll try to reinstall it and see what I get.

Saludos/Regards/mit Freundlichen Grüßen,
Adrián Company


adrian company

Dec 3, 2013, 4:16:40 AM
to tesser...@googlegroups.com
Hi Sventech,
I've tested the image with the command-line version and I get the same result as you. But when I use my own C++ software I cannot obtain the same result; I simply get nothing. Currently I am using PSM_SINGLE_LINE, but as I said before, I've tried all the page segmentation modes.
I don't know what is wrong. I've reinstalled tesseract and it behaves the same.


On Tuesday, December 3, 2013 07:29:11 UTC+1, adrian company wrote:
And about the page segmentation mode: I've tried all of them, but I still get nothing.

Nick White

Dec 3, 2013, 5:29:58 AM
to tesser...@googlegroups.com
Hi Adrian,

Well then your C++ program must be wrong in some way. The command
line version doesn't do anything special, it just uses the API like
anything else. Take a look at api/tesseractmain.cpp to check how
your API usage differs, to find your bug.

Nick

adrian company

Dec 9, 2013, 7:15:07 AM
to tesser...@googlegroups.com
Hi Nick,
I've taken a look at api/tesseractmain.cpp as you recommended, but I cannot find anything wrong, I think. Anyway, I'll post my program here so that, with your help, we can work out what is going on.
This is my method:
___________________________________________________________________
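#include <algorithm>
#include <clocale>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>
#include <opencv2/core/core.hpp>
#include <tesseract/baseapi.h>

using namespace cv;
using namespace std;

// Expects an 8-bit image (1 or 3 channels) that is already binarized/cropped.
void recognizeChar(Mat imagen){

    /* INITIALIZE (TESSERACT) */
    setenv("TESSDATA_PREFIX", "/usr/local/share/", 1);  // POSIX; avoids putenv() on a string literal
    setlocale(LC_NUMERIC, "C");
    tesseract::TessBaseAPI OCR;

    if (OCR.Init(NULL, "spa")){
        fprintf( stderr, "could not initialize tesseract.... \n" );
        exit(1);
    }
    /* CONFIGURING: every call goes through the same OCR object (the original mixed in an undeclared `api`) */
    OCR.SetPageSegMode(tesseract::PSM_SINGLE_LINE);
    OCR.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ");  // whitelist
    OCR.SetVariable("tessedit_char_blacklist" , "<>abcdefghijklmnopqtrstuvwxyz./!¡$%&?¿,;+-#");  // blacklist
    // Arguments: data, width, height, bytes per pixel, bytes per line.
    OCR.SetImage(imagen.data, imagen.cols, imagen.rows, imagen.channels(), static_cast<int>(imagen.step));
    // The original also called TesseractRect(imagen.data, 0, ...) here. That call was dropped:
    // passing 0 as bytes_per_pixel tells tesseract the buffer is a packed 1-bit image, it
    // replaces the image set above, and its returned string was never freed.

    /* GETTING THE RECOGNIZED TEXT */
    char* texto = OCR.GetUTF8Text();
    string t1 = texto ? texto : "";
    delete [] texto;  // the caller owns the buffer returned by GetUTF8Text()
    t1.erase( remove(t1.begin(), t1.end(), '\n'), t1.end() );
    cout << "TEXTO: " << t1 << endl;
    OCR.End();
}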

_______________________________________________________________________
Thank you all.

zdenko podobny

Dec 9, 2013, 4:02:13 PM
to tesser...@googlegroups.com
  1. Instead of a function listing, it is better to provide a small test case. It saves the testers time.
  2. Leave out code that is not relevant (e.g. if you are testing the tesseract API, open the image with a leptonica function and not with opencv).
  3. You need to fix the perspective of the image first, so that you have some border around the text. I did it in GIMP, but maybe you can do it in opencv too.


Zdenko
api_test.cpp
binaria2.png

adrian company

Dec 10, 2013, 1:41:22 AM
to tesser...@googlegroups.com
Hi Zdenko,
I've tried the code you posted here that uses leptonica, and it gives me an error about a min or max specification:
(Error: Illegal min or max specification!
signal_termination_handler:Error:Signal_termination_handler called:Code 5002)

I've changed the OCR.SetRectangle call and the same error is displayed. I've also tried with another image, with the same result.

zdenko podobny

Dec 10, 2013, 2:17:31 AM
to tesser...@googlegroups.com
What OS do you use?
Which tesseract version?
Which compiler did you use?

Zdenko

adrian company

Dec 10, 2013, 2:37:57 AM
to tesser...@googlegroups.com
I'm using Ubuntu 12.04, with tesseract version 3.02 and using Eclipse CDT

zdenko podobny

Dec 10, 2013, 2:47:22 AM
to tesser...@googlegroups.com
Can you please provide the exact error message? If it is in Spanish because of your locale, try running
LC_ALL=C ./api_test

What is the output of the tesseract executable (tesseract binaria2.png binaria2 -psm 7)?


Zdenko

adrian company

Dec 10, 2013, 3:21:05 AM
to tesser...@googlegroups.com
Let me explain. I'm using opencv to preprocess the image and extract the image of a license plate (binaria2.png, for example); then I want to run tesseract to obtain the characters from that license plate. I don't mind whether the language is Spanish or any other language. I've replaced the function I had with the one you provided, using leptonica and Pix instead of Mat, and when I ran the program an error was displayed. It is the error I showed before:

Error: Illegal min or max specification!
signal_termination_handler: Error:Signal_termination_handler called: Code 5002

About the PSM: I've tried more than just 7. I've also tried Single Line, Auto PSM, Single Block... and I get the same result. When I do get some characters, they are not correct.
For example, for a license plate like 0211 JCW, the result I got yesterday was EEEEEEIIIE or something like that; I don't remember exactly now.

If I run tesseract from the console on the same image I get the desired result, which is why this is driving me crazy. I don't understand the reason.

Thank you.

zdenko podobny

Dec 10, 2013, 3:54:36 AM
to tesser...@googlegroups.com
I understand ;-) But "Error: Illegal min or max specification!" during tesseract init is (usually) related to the locale in use. That was the reason I asked.

If you get a "wrong" result with your code and a good one with the executable, there must be a problem in your code. This is the first check I do (custom code vs. executable).

So you need to test each step to see what is wrong. If the "clean" version (tesseract + leptonica) works, then I would add opencv. If you need a quick check of whether the "transfer" from opencv to tesseract is ok, use GetThresholdedImage() [1] from the tesseract API and then save the resulting PIX with the leptonica function pixWrite [2].
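
A related, dependency-free check is to dump the raw buffer to disk just before handing it to tesseract and open the file in an image viewer. This is a sketch of that idea rather than the GetThresholdedImage() route described above: it shows what goes in, not what tesseract thresholded. It assumes an 8-bit, single-channel buffer, and `writeGrayAsPgm` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Writes an 8-bit, single-channel buffer as a binary PGM (P5) file.
// bytesPerLine is the stride of the source buffer and may exceed width.
bool writeGrayAsPgm(const std::string& path, const unsigned char* data,
                    int width, int height, std::size_t bytesPerLine) {
    if (data == nullptr || width <= 0 || height <= 0 ||
        bytesPerLine < static_cast<std::size_t>(width)) {
        return false;  // inconsistent geometry; nothing sensible to write
    }
    std::ofstream out(path, std::ios::binary);
    if (!out) {
        return false;
    }
    out << "P5\n" << width << ' ' << height << "\n255\n";
    for (int row = 0; row < height; ++row) {
        // Copy only the visible pixels of each row, skipping any padding.
        out.write(reinterpret_cast<const char*>(data + row * bytesPerLine), width);
    }
    out.flush();
    return static_cast<bool>(out);
}
```

With an 8-bit grayscale `cv::Mat` named `imagen`, the call would be `writeGrayAsPgm("dump.pgm", imagen.data, imagen.cols, imagen.rows, imagen.step)`. If the dumped picture looks sheared or scrambled, the width, channel count or stride being passed on is likely wrong.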




Zdenko

adrian company

Dec 13, 2013, 3:18:47 AM
to tesser...@googlegroups.com
Hi Zdenko,
It seems the error has solved itself. Surprisingly, I copied your example code into my C++ code and read the image from disk as a Pix instead of passing the image directly from the function, and now it seems to work, although not perfectly. Is there some way to get more accuracy?
For example, if the image has a B, sometimes tesseract gives me an E, or if it's a 6, tesseract gives me a 5.
I tried to train tesseract for my font, but I don't know if I'm doing something wrong, because when I try to create the traineddata file, it is either empty or I get an error.

Thank you.
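
While accuracy is being worked on, a format check can at least reject results that cannot be a plate, so the caller can retry with different preprocessing. This is a minimal sketch, assuming plates follow the layout of the example 0211 JCW from earlier in this thread (four digits, then three uppercase letters). That layout is an assumption, and `looksLikePlate` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// True if text is four digits followed by three uppercase letters,
// with an optional single space between the groups (e.g. "0211 JCW").
bool looksLikePlate(const std::string& text) {
    static const std::regex pattern("^[0-9]{4} ?[A-Z]{3}$");
    return std::regex_match(text, pattern);
}
```

A check like this catches outputs such as EEEEEEIIIE, but it cannot detect a B read as an E or a 6 read as a 5, since both results still fit the pattern. Those need better input images or training.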