tips for improving Tesseract accuracy and speed...


Andres

Mar 29, 2011, 12:17:47 AM
to tesser...@googlegroups.com

Hello people,

I've been developing a licence plate recognition system for a long time, and I still have to improve my use of Tesseract to make it usable.

My first concern is about speed:
After extracting the licence plate image, I get an image like this:

https://docs.google.com/leaf?id=0BxkuvS_LuBAzNmRkODhkYTUtNjcyYS00Nzg5LWE0ZDItNWM4YjRkYzhjYTFh&hl=en&authkey=CP-6tsgP

As you may see, there are only 6 characters (Tesseract is recognizing more because of some blemishes, but I get rid of those with some post-processing of the layout of the recognized characters).

On an Intel i7-720 (good power, but using a single thread) the Tesseract part takes about 230 ms. This is too much time for what I need.

The image is 500 x 117 pixels. I noticed that when I reduce the size of this image, the detection time drops in proportion to the image area, which makes good sense. But the accuracy of the OCR is poor when the character height is below 90 pixels.

So, I assume that there is a problem with the way I trained tesseract.

Because the characters in the plates are assorted (3 alphanumeric, 3 numeric), I trained it with just a single image containing all the letters of the alphabet. I saw that you suggest a large training set, but I imagine that doesn't apply here, where the characters are not organized into words. Am I correct about this?

So, for you to see, this is the image with which I trained Tesseract:

https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0BxkuvS_LuBAzODc1YjIxNWUtNzIxMS00Yjg3LTljMDctNDkyZGIxZWM4YWVm&hl=en&authkey=CMXwo-AL

In this image the characters are about 55 pixels in height.

Then, for frequent_word_list and words_list I included a single entry for each character; I mean, a file starting like this:

A
B
C
D
...

Do you see anything to improve in what I did? Should I perhaps use a training image with more letters, with more combinations? Would that help somehow?

Should I include in the same image a copy of the same character set at a smaller size? That way, would I be able to pass Tesseract smaller images and get more speed without sacrificing detection quality?


On the other hand, I found some strange behavior of Tesseract about which I would like to know a little more:
In my preprocessing I tried Otsu thresholding (http://en.wikipedia.org/wiki/Otsu%27s_method) and visually got much better results, but surprisingly it was worse for Tesseract. It decreased the stroke thickness of the characters, and the characters I used to train Tesseract were bolder. So, does Tesseract match the "boldness" of the characters? Should I train Tesseract with different levels of boldness?
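(For reference, a minimal pure-Python sketch of the Otsu method linked above - in practice one would use an image library's built-in implementation, this just shows the idea:)

```python
def otsu_threshold(pixels):
    """Return Otsu's threshold for a flat sequence of 8-bit gray values.

    Chooses the level t that maximizes the between-class variance of
    the two populations (values <= t vs. values > t).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * n for i, n in enumerate(hist))

    sum_bg = 0.0   # weighted sum of the background class so far
    w_bg = 0       # background pixel count so far
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

On a clean plate image the two populations (ink vs. background) are well separated, so the chosen threshold lands between them.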

I'm using Tesseract 2.04 for this. Do you think some of these issues would improve with Tess 3.0?


Thanks,

Andres






Dmitri Silaev

Mar 30, 2011, 7:42:52 AM
to tesser...@googlegroups.com
Depending on the quality of your source images, I think it'd be
reasonable to scale them down so that letters have a height of about
40 pixels. That way Tesseract will just have a bit less work to do -
scanning fewer pixels and constructing shorter glyph outlines.
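(A box-average downscale of that kind can be sketched in pure Python - illustrative only; in practice Leptonica's scaling functions would be used:)

```python
def downscale(img, w, h, new_w, new_h):
    """Box-average downscale of an 8-bit grayscale image stored row-major
    as a flat list. Each output pixel is the mean of the source box that
    maps onto it."""
    out = [0] * (new_w * new_h)
    for ny in range(new_h):
        y0 = ny * h // new_h
        y1 = max(y0 + 1, (ny + 1) * h // new_h)
        for nx in range(new_w):
            x0 = nx * w // new_w
            x1 = max(x0 + 1, (nx + 1) * w // new_w)
            total = sum(img[y * w + x]
                        for y in range(y0, y1) for x in range(x0, x1))
            out[ny * new_w + nx] = total // ((y1 - y0) * (x1 - x0))
    return out

# E.g. if characters measure 90 px tall and the target is 40 px:
# scale = 40 / 90; downscale(img, w, h, round(w * scale), round(h * scale))
```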

The accuracy may suffer even at such a considerable char height (90
is certainly more than enough) if you have significant discrepancies
between training and source images. You should try to pass Tesseract
images with stroke thickness and orientation as similar as possible.
To achieve this, you need to pre-process images to make them look
alike with respect to lighting conditions, contrast, blur amount, and
physical dimensions; rectify perspective distortion, etc. And of
course, always use the same binarization procedure with the same
parameter set, or at least one giving predictably similar results
across the range of your source images. By the way, applying Otsu
thresholding before passing images to Tesseract is useless, as Otsu is
the binarization procedure employed by Tesseract itself - unless you
run Otsu with your own special parameter set and then pass a 1-bit image.

Next, you should train Tesseract keeping in mind that ideally there
should be around 20 samples of each char. You shouldn't strive to
train with as many char sizes as possible - regardless of the size,
Tesseract scales character "models" up or down to the same internal
dimensions. But if your source char sizes differ, that's no problem;
they'll do. Provide real (probably pre-processed) images for
training, not manually compiled ones.

What can be done to further improve the speed and accuracy: process
your images char by char, bypassing Tesseract's layout analysis. This
approach also lets you use char-position-specific whitelists
(letters, digits) for even more speedup and precision.
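(The position-specific whitelist idea can be sketched as follows - the three-letters-then-three-digits format is an assumption for illustration, taken from the "3 alphanumeric, 3 numeric" description; adapt it to the real plate layout:)

```python
import string

def whitelist_for_position(pos):
    """Return a tessedit_char_whitelist value for one character position.

    Assumed (hypothetical) plate format: positions 0-2 are letters,
    positions 3-5 are digits.
    """
    if not 0 <= pos <= 5:
        raise ValueError("this sketch assumes a 6-character plate")
    return string.ascii_uppercase if pos < 3 else string.digits
```

One would then run Tesseract once per segmented character image, with single-character segmentation and the whitelist for that position set via `tessedit_char_whitelist`; the exact way to pass the variable (config file or command-line flag) varies across Tesseract versions.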

Everything related to Tesseract's dictionary facility is totally
irrelevant here. You'd be better off providing entirely empty files
for your "traineddata".

HTH

Warm regards,
Dmitri Silaev


Dmitri Silaev

Mar 30, 2011, 7:49:22 AM
to tesser...@googlegroups.com
P.S.: If you're still sure that reasonable downscaling of your images
sacrifices accuracy, please share one or two of your *unprocessed*
images so we can investigate further.

And I'd suggest keeping up with the latest revisions of Tesseract. The
API changes significantly, but Tess is definitely being improved in
terms of stability, new capabilities and code efficiency, which may
well lead to the improved performance you are looking for.

Warm regards,
Dmitri Silaev


cong nguyenba

Mar 30, 2011, 12:02:04 PM
to tesser...@googlegroups.com
I have another approach for you here: try binarization using an
adaptive threshold. And delve into the engine, following the adaptive
classification in the source code, for a further speedup. I think that
should be enough for your needs.

Max Cantor

Mar 30, 2011, 8:27:43 PM
to tesser...@googlegroups.com
Yes. I've had great experience with the Sauvola binarization from Leptonica. Gamer works too, but is much, much slower.

Cong Nguyen

Mar 30, 2011, 10:09:20 PM
to tesser...@googlegroups.com
Please refer to "OPTIMIZING SPEED FOR ADAPTIVE LOCAL THRESHOLDING
ALGORITHM USING DYNAMIC PROGRAMMING".
Complexity is O(n), where n is the number of pixels.
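(I don't have the paper's exact algorithm, but the standard way to get O(n) local thresholding is the integral-image trick: precompute a summed-area table so every window mean costs O(1). A minimal sketch under that assumption:)

```python
def integral_image(img, w, h):
    """Summed-area table: I[y][x] = sum of img over rows < y, cols < x."""
    I = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y * w + x]
            I[y + 1][x + 1] = I[y][x + 1] + row_sum
    return I

def adaptive_threshold(img, w, h, win=15, bias=0.9):
    """Mark a pixel as foreground (1) if it is darker than `bias` times
    its local window mean. Each window sum is O(1) via the table, so the
    whole pass is O(n) in the number of pixels."""
    I = integral_image(img, w, h)
    out = [0] * (w * h)
    for y in range(h):
        y0, y1 = max(0, y - win), min(h, y + win + 1)
        for x in range(w):
            x0, x1 = max(0, x - win), min(w, x + win + 1)
            area = (y1 - y0) * (x1 - x0)
            s = I[y1][x1] - I[y0][x1] - I[y1][x0] + I[y0][x0]
            # compare pixel*area < bias*sum to avoid a division per pixel
            out[y * w + x] = 1 if img[y * w + x] * area < bias * s else 0
    return out
```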

Dmitri Silaev

Mar 31, 2011, 1:13:23 AM
to tesser...@googlegroups.com
Could you give us a link to where the text of this article can be
downloaded? I can't find it anywhere, only the title and authors.

TP

Mar 31, 2011, 11:01:29 AM
to tesser...@googlegroups.com
On Wed, Mar 30, 2011 at 5:27 PM, Max Cantor <mxca...@gmail.com> wrote:

Here's links to the relevant Leptonica API source files:

adaptmap.c - local adaptive grayscale quantization; mostly
gray-to-gray in preparation
(http://tpgit.github.com/Leptonica/adaptmap_8c.html#_details)

binarize.c - Special binarization methods, locally adaptive: Otsu and
Sauvola (http://tpgit.github.com/Leptonica/binarize_8c.html#_details)

grayquant.c - Standard, simple, general grayscale quantization
(http://tpgit.github.com/Leptonica/grayquant_8c.html#_details)

See also:

Grayscale Mapping and Binarization
(http://tpgit.github.com/UnOfficialLeptDocs/leptonica/binarization.html)

Document Image Analysis
(http://tpgit.github.com/UnOfficialLeptDocs/leptonica/document-image-analysis.html)
which refers to
http://tpgit.github.com/Leptonica/livre__adapt_8c_source.html and
http://tpgit.github.com/Leptonica/livre__tophat_8c_source.html.

-- TP

Dmitri Silaev

Apr 7, 2011, 3:33:52 AM
to tesser...@googlegroups.com, Sriranga(78yrsold)
Sriranga,

Sorry for the delay.

I meant that for training you just need to use as many *different*
images as possible, not multiple renamed copies of the same image.

Warm regards,
Dmitri Silaev

On Mon, Apr 4, 2011 at 2:56 PM, Sriranga(78yrsold)
<withbl...@gmail.com> wrote:
> Dmitri,
> I am extremely thankful for the valuable guidance.
> With reference to your last para - I could not follow it clearly and am
> confused. Kindly elaborate a little bit with a sample (any language or
> English will do). Kindly pardon me for troubling you in the midst of your
> hectic work.
> With Choicest Best Wishes and Good Luck,
> -sriranga(78yrs)
>
> On Mon, Apr 4, 2011 at 11:50 AM, Dmitri Silaev <daemo...@gmail.com>
> wrote:
>>
>> Dear Sriranga,
>>
>> Sorry for the delay.
>>
>> You indeed can manually set the DPI in an image file using any image
>> editor, but the only thing that matters is the resolution your image
>> got from the scanner. Roughly speaking, the resolution here means the
>> number of pixels per letter. This is controlled by the scanner itself
>> or the scanning program's settings. By changing the DPI afterwards in
>> an image editor, you just change some of the image's attribute values,
>> not its pixels.
>>
>> 300 DPI is more than okay for your needs.
>>
>> Renaming a box/train file and feeding it to Tesseract as another
>> sample is not a solution, as by "sample" we here mean a copy of a
>> character we obtained at slightly different conditions in another
>> [scanned] image, or at least at another position in the same image. So
>> get as many images as possible, count the number of character samples
>> within each and thus build your training body.
>>
>> Warm regards,
>> Dmitri Silaev
>>
>>
>>
>>
>>
>> On Sat, Apr 2, 2011 at 1:13 PM, Sriranga(78yrsold)
>> <withbl...@gmail.com> wrote:
>> > Dear Dimitri,
>> > Awaiting your valuable guidance please.
>> > With warmest regards,
>> > -sriranga(78yrs)
>> >
>> > On Wed, Mar 30, 2011 at 8:29 PM, Sriranga(78yrsold)
>> > <withbl...@gmail.com> wrote:
>> >>
>> >> Dear Dimitri,
>> >> Is it presumed that a scanned image at 300 x 300 dpi is reasonable?
>> >> With the help of IrfanView I can find out the dpi, and increasing or
>> >> decreasing the dpi can also be done.
>> >> Generally, as a standard, I select dpi = 300 and resize to 1200 or
>> >> 2400 from 600, which is convenient for editing the box file with the
>> >> help of owler. Hope this will not reduce the accuracy of the output.
>> >> Sample tif attached for approval.
>> >>
>> >> Regarding 20 samples of each char - suppose the image1.tif file
>> >> contains the alphabet; can a single char be used 20 times by renaming
>> >> the same image file as image1.tif, image2.tif, image3.tif ...
>> >> image20.tif? If not, kindly provide me with your sample, if any.
>> >> With Warmest Regards,
>> >> -sriranga(78yrs)

Adam Freeman

Nov 5, 2013, 1:29:23 AM
to tesser...@googlegroups.com, mxca...@gmail.com
Can you perhaps provide any code samples for calling the Leptonica Sauvola method with, say, a byte array of unsigned char*? I am trying to put something similar together. I am currently using adaptive thresholding, which works pretty well, but I wanted to compare and contrast with the Sauvola method and see if I can get better results.
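(Not a Leptonica call, but for reference, the Sauvola rule itself - threshold T = m * (1 + k * (s / R - 1)) with local mean m and standard deviation s - can be sketched in pure Python on a flat byte array using integral images. The window size and k/R defaults below are illustrative, not Leptonica's:)

```python
import math

def sauvola_threshold_map(img, w, h, win=8, k=0.34, R=128.0):
    """Per-pixel Sauvola threshold T = m * (1 + k * (s / R - 1)).

    Builds integral images of the pixels and their squares so every
    window mean/deviation costs O(1); the whole pass is O(n)."""
    I = [[0.0] * (w + 1) for _ in range(h + 1)]
    I2 = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        acc = acc2 = 0.0
        for x in range(w):
            v = img[y * w + x]
            acc += v
            acc2 += v * v
            I[y + 1][x + 1] = I[y][x + 1] + acc
            I2[y + 1][x + 1] = I2[y][x + 1] + acc2

    def rect(S, x0, y0, x1, y1):
        return S[y1][x1] - S[y0][x1] - S[y1][x0] + S[y0][x0]

    thresh = [0.0] * (w * h)
    for y in range(h):
        y0, y1 = max(0, y - win), min(h, y + win + 1)
        for x in range(w):
            x0, x1 = max(0, x - win), min(w, x + win + 1)
            area = (y1 - y0) * (x1 - x0)
            m = rect(I, x0, y0, x1, y1) / area
            var = rect(I2, x0, y0, x1, y1) / area - m * m
            s = math.sqrt(max(var, 0.0))   # clamp float rounding
            thresh[y * w + x] = m * (1 + k * (s / R - 1))
    return thresh

def sauvola_binarize(img, w, h, **kw):
    """1 = foreground (dark ink), 0 = background."""
    t = sauvola_threshold_map(img, w, h, **kw)
    return [1 if img[i] < t[i] else 0 for i in range(w * h)]
```

In C against Leptonica you'd instead wrap the unsigned char* buffer in a PIX and call the Sauvola routine from binarize.c linked earlier in this thread.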

I tried posting earlier; I hope this is not a duplicate post. I am not sure if the first one bounced.
Thank you,
Adam

Wasim Safdar

Oct 9, 2014, 2:22:55 PM
to tesser...@googlegroups.com
Hello,
          I saw this post and it is very interesting. I am also working with Tesseract. I think that preprocessing or downscaling the original image decreases the accuracy of the algorithm, and preprocessing also slows down the overall execution time. I think you are training the images well. What you can do is train Tesseract on different character sizes. Then, if you downscale your image, it will not affect accuracy, and your speed will also increase.

Adam

Oct 10, 2014, 12:53:05 PM
to tesser...@googlegroups.com
Hi Wasim,
Where are you from?
Thanks,
Adam



Mindy cheugn

Aug 3, 2016, 2:05:37 AM
to tesseract-ocr
Actually, the accuracy of the OCR is hard to guarantee. As far as I know, you can select a smaller region from mGray, where your text is, before createBitmap - so the heavier methods that follow process a smaller image.