Problem Recognizing Numbers


JD

Feb 12, 2012, 8:56:06 PM
to tesseract-ocr
I'm using v3.01 on Windows 7 to perform OCR on screenshots of another
program. I don't have access to the fonts the program is using, so I trained
Tesseract using some screenshots, and so far the text recognition is
far better than I expected. However, when I try to process a
screenshot that contains only a few numbers, it doesn't match anything
at all. If it was matching garbage, or the wrong numbers, then I'd just
keep working on improving the training... but it doesn't find
anything. Does anyone have a suggestion about what I should try?

It doesn't look like I can attach a screenshot, but the numbers are in
a column... something like this:

10
13
14
15
17

I pre-process the screenshots so the text is black on white. I also
zoom in on the images, so they're slightly blurred (only very
slightly)... but the text recognition is near perfect, so I don't
think that's an issue. Plus, it seems like it should find SOMETHING.
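Zooming with a smoothing resize is what typically produces the slight blur described above. One way to enlarge a screenshot without adding grey edge pixels is nearest-neighbour scaling, where every source pixel becomes a square block. This is a minimal sketch, assuming the image is held as a list of rows of 0-255 grey values; `upscale_nearest` is a hypothetical helper, not part of Tesseract, and whether it helps recognition here is untested.

```python
def upscale_nearest(image, factor):
    """Enlarge a 2-D grid of pixel values by an integer factor using
    nearest-neighbour sampling: each pixel becomes a factor x factor block."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out = []
    for row in image:
        # Repeat each pixel horizontally...
        wide = [px for px in row for _ in range(factor)]
        # ...then repeat the widened row vertically (independent copies).
        out.extend(list(wide) for _ in range(factor))
    return out


if __name__ == "__main__":
    tiny = [[0, 255],
            [255, 0]]
    for row in upscale_nearest(tiny, 2):
        print(row)  # [0, 0, 255, 255] twice, then [255, 255, 0, 0] twice
```

Because no new grey levels are introduced, a black-on-white image stays strictly two-tone after enlargement.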

Sven Pedersen

Feb 13, 2012, 11:22:46 AM
to tesser...@googlegroups.com
Tesseract is not good at handling small amounts of text. You may try to duplicate the image area so the numbers appear more than once and then post-process.
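The duplication trick can be done in code rather than by hand. A minimal sketch, assuming a list-of-rows image with 255 as the white background; `tile_horizontally` and its `gap` parameter are illustrative names, not Tesseract API. The white gap between copies is meant to keep neighbouring copies from being merged into one token, but the width that works best is an assumption to tune.

```python
WHITE = 255


def tile_horizontally(image, copies, gap=10):
    """Return a grid holding `copies` side-by-side copies of `image`,
    separated by `gap` columns of white background."""
    if copies < 1:
        raise ValueError("copies must be >= 1")
    spacer = [WHITE] * gap
    tiled = []
    for row in image:
        new_row = []
        for i in range(copies):
            if i:  # no spacer before the first copy
                new_row.extend(spacer)
            new_row.extend(row)
        tiled.append(new_row)
    return tiled
```

For example, tiling a 4-pixel-wide column 10 times with the default gap gives rows of 10 * 4 + 9 * 10 = 130 pixels.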

Chris

Feb 13, 2012, 11:42:19 AM
to tesseract-ocr
I'd try segmenting the numbers out yourself and feeding them into
tesseract as individual characters. Might work better than feeding it
the whole image.

Make sure you put some padding around each character.
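Segmenting characters yourself can be as simple as a vertical projection: any column with no ink separates two characters. A minimal sketch, assuming a binary list-of-rows image of a single text line with black (0) ink on white (255); `split_characters` and `pad` are hypothetical helpers. Touching characters will not be separated by this method.

```python
WHITE = 255
BLACK = 0


def split_characters(image, ink=BLACK):
    """Split a single-line binary image into per-character slices.
    Columns that contain no ink pixels are treated as separators."""
    width = len(image[0])
    has_ink = [any(row[x] == ink for row in image) for x in range(width)]
    pieces, start = [], None
    # A trailing False sentinel closes a character that touches the right edge.
    for x, inked in enumerate(has_ink + [False]):
        if inked and start is None:
            start = x
        elif not inked and start is not None:
            pieces.append([row[start:x] for row in image])
            start = None
    return pieces


def pad(image, border, fill=WHITE):
    """Surround `image` with `border` pixels of background on every side."""
    width = len(image[0]) + 2 * border
    body = [[fill] * border + list(row) + [fill] * border for row in image]
    top = [[fill] * width for _ in range(border)]
    bottom = [[fill] * width for _ in range(border)]
    return top + body + bottom
```

Each padded slice could then be saved and passed to Tesseract on its own.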

John Williams

Feb 13, 2012, 4:55:55 PM
to tesser...@googlegroups.com
If I duplicate the column 9 times, so that there are ten columns with the same numbers, it reads it correctly. Running these results through the training tools didn't help it recognize the original image, though. Running Tesseract on images with a single digit yielded nothing as well.

In my program, do I have to programmatically duplicate my column of numbers several times and then figure out what the result was supposed to be... or can I train Tesseract to recognize a single column? I suppose duplicating it will work, but it seems like a bad hack.
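For the "figure out what the result was supposed to be" step, one option is a per-line majority vote over the repeated readings. This sketch assumes Tesseract returns each row of the tiled image as one text line with the copies separated by whitespace; if it instead reads each copy as a separate block, the grouping would need to change. `collapse_repeats` is a hypothetical helper.

```python
from collections import Counter


def collapse_repeats(ocr_text):
    """Reduce each OCR output line (the same value repeated once per tiled
    copy) to its most frequent token."""
    result = []
    for line in ocr_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines between text lines
        token, _count = Counter(tokens).most_common(1)[0]
        result.append(token)
    return result
```

A side benefit of voting is that an occasional misread copy (such as `1O` for `10`) is outvoted by the correct readings.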

Sven Pedersen

Feb 13, 2012, 11:54:39 PM
to tesser...@googlegroups.com
Yes, tesseract is designed for whole pages, so it needs context.
--Sven

Dmitri Silaev

Feb 14, 2012, 2:33:37 AM
to tesser...@googlegroups.com
Did you try the "psm" switch (look for it in the forum)? Your own
segmentation? Both combined?
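The page segmentation mode is passed on the command line. A sketch of calling it from Python, assuming the `tesseract` executable is on the PATH and the 3.0x argument order `tesseract imagename outputbase -psm N`, which writes its text to `outputbase.txt`; newer releases such as 4.x spell the option `--psm`. In the 3.x help text, mode 7 treats the image as a single text line and mode 10 as a single character. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def tesseract_command(image_path, out_base, psm):
    """Build a Tesseract 3.0x command line with a page segmentation mode."""
    return ["tesseract", str(image_path), str(out_base), "-psm", str(psm)]


def run_tesseract(image_path, out_base, psm=7):
    """Run Tesseract and return the text it wrote to <out_base>.txt."""
    subprocess.run(tesseract_command(image_path, out_base, psm), check=True)
    return Path(str(out_base) + ".txt").read_text(encoding="utf-8")
```

Keeping command construction separate from execution makes it easy to switch to the `--psm` spelling on a newer installation.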

Warm regards,
Dmitri Silaev
www.CustomOCR.com

John Williams

Feb 14, 2012, 7:25:25 AM
to tesser...@googlegroups.com
You're right... I've been testing out the psm flag in various situations this whole time, but last night when I was trying out all of your suggestions, it slipped my mind. The best solution I've found is to segment the columns into "rows" of 1 or 2 digits each and use the "-psm 7" switch. So far, it reads everything perfectly.
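Cutting a column into rows can use a horizontal projection profile: consecutive rows that contain ink form one line of text. A minimal sketch, assuming a binary list-of-rows image with black (0) ink on white (255); `split_rows` is a hypothetical helper. Each returned slice is cropped tightly, so adding a white border before running Tesseract with `-psm 7` is advisable.

```python
BLACK = 0


def split_rows(image, ink=BLACK):
    """Cut a column image into text-line images. Runs of rows that contain
    at least one ink pixel become separate slices."""
    has_ink = [any(px == ink for px in row) for row in image]
    rows, start = [], None
    # A trailing False sentinel closes a line that touches the bottom edge.
    for y, inked in enumerate(has_ink + [False]):
        if inked and start is None:
            start = y
        elif not inked and start is not None:
            rows.append([list(r) for r in image[start:y]])
            start = None
    return rows
```

Each slice would then hold one number such as `13`, matching the "rows of 1 or 2 digits" approach described above.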

On a semi-related note, I'm really impressed with Tesseract. In my preliminary OCR research I read many posts saying that Tesseract's recognition was fairly poor and that a different or commercial OCR package should be used. I think these people didn't know about or hadn't used Tesseract's training feature, because it's working wonderfully for me, which is great considering I had almost no expectations coming in :)

Thanks a lot to everyone for the help and to the developers who work on this tool.

Sriranga(78yrs)

Feb 14, 2012, 7:37:02 AM
to tesser...@googlegroups.com
Congratulations! You have succeeded in your efforts. It would be nice if you posted a sample TIFF, the command line you used, and the output text, for the benefit of other tesseract-ocr users.
Cheers,
-sriranga(79yrs)

toulipe

Feb 15, 2012, 2:51:45 AM
to tesser...@googlegroups.com
I tried to read some license plates with the program.

Do you have a solution for this kind of picture?


??015.png

Sriranga(78yrsold)

Feb 15, 2012, 3:58:18 AM
to tesser...@googlegroups.com
With the help of IrfanView, I converted the image to greyscale, reduced it to 2 colors, and changed the resolution to 600 dpi.
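The first two steps can be approximated in code. A minimal sketch, assuming rows of (R, G, B) tuples; the ITU-R BT.601 luma weights and the fixed threshold of 128 are assumptions, since IrfanView's exact conversion is not described here and its 2-color reduction may dither instead of thresholding. The 600 dpi step only changes file metadata; with Pillow it can be set via `Image.save(path, dpi=(600, 600))` for PNG or TIFF output, which is not shown.

```python
def to_greyscale(rgb_image):
    """Convert rows of (R, G, B) tuples to 0-255 grey levels using the
    ITU-R BT.601 luma weights (an assumption, not IrfanView's documented formula)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def to_two_colors(grey_image, threshold=128):
    """Reduce a greyscale grid to pure black (0) and white (255) with a
    fixed threshold; no dithering."""
    return [[0 if v < threshold else 255 for v in row] for row in grey_image]
```

A fixed threshold is the simplest choice; if lighting varies across the plate, an adaptive threshold may be needed instead.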
Cheers,
-sriranga(79yrs)

test756.txt
BJZ756.PNG

Sriranga(78yrsold)

Feb 15, 2012, 4:01:24 AM
to tesser...@googlegroups.com
I renamed the PNG file to "BJZ756.png" to remove the "??" characters in "??015.png".
-sriranga(79yrs)

simon.ei...@vol.at

Feb 15, 2012, 3:41:59 AM
to tesser...@googlegroups.com
Hi,

I wonder if this image actually contains readable data, because I tried to get something out of it with commercial OCR software
(ABBYY FineReader 11 Pro) and didn't get anything.

I am blind, so I can't tell whether there is actually any content there.

Greetings,
Simon

toulipe

Feb 15, 2012, 5:03:06 AM
to tesser...@googlegroups.com
Thank you.
Your picture works great with Tesseract.

However, when I do the conversion myself, it does not give me a good result...

I did:
 Greyscale (Image -> Convert to Grayscale)
 2 colors (Image -> Decrease Color Depth)
 Resolution to 600 dpi (Image -> Information... Resolution: 600x600 DPI)

Is this the right way?

Sriranga(78yrs)

Feb 15, 2012, 5:21:47 AM
to tesser...@googlegroups.com
Yes.

toulipe

Feb 15, 2012, 6:12:23 AM
to tesser...@googlegroups.com
I get this picture as a result...

It's not exactly the same as yours...
It seems that yours has some black pixels in the letters?
Maybe there's a step I didn't do correctly?
bj2.png

Sriranga(78yrsold)

Feb 15, 2012, 7:08:44 AM
to tesser...@googlegroups.com
If you click Image -> Invert Color (Paint Brush), you will get black pixels in the letters.
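Inverting colors is a per-pixel operation: each grey value v becomes 255 - v, so white letters on black become black letters on white. A minimal sketch on a list-of-rows 0-255 image; `invert` and `ensure_dark_text` are hypothetical helpers, and the rule that the background is whichever tone covers most of the image is an assumption that can fail on very dense text.

```python
def invert(image):
    """Swap dark and light in a 0-255 grid (v -> 255 - v)."""
    return [[255 - v for v in row] for row in image]


def ensure_dark_text(image):
    """Invert the image if dark pixels are the majority, assuming the
    majority tone is the background, so text ends up dark on light."""
    pixels = [v for row in image for v in row]
    dark = sum(1 for v in pixels if v < 128)
    return invert(image) if dark * 2 > len(pixels) else image
```

Running this check before OCR keeps the black-on-white polarity consistent regardless of how the source image was colored.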
