OCR of bank cheques


Felipe Leal Coutinho

Jun 21, 2011, 10:24:50 AM
to tesseract-ocr
Hello,

I'm trying to use Tesseract to OCR bank cheques captured with
digital cameras. As you can see
(http://dl.dropbox.com/u/24085540/cheque-exemplo.jpg), these documents
have black text on a colored background (the background contains no
black). To improve the results, I think I will need to do some
preprocessing. Do you have any suggestions? I was thinking of removing
the background, but I haven't found a method to do that.

Regards,

Felipe.

Humberto Pereira

Jun 21, 2011, 11:16:10 AM
to tesser...@googlegroups.com
Hi Felipe,

Your case is simple: remove anything that is not black. I don't know
whether Tesseract has a method to do this, but the code is easy.
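
A minimal sketch of the "remove anything not black" idea, in plain Python. The pixel layout (rows of (R, G, B) tuples), the function name keep_only_black, and the max_channel cutoff of 80 are all assumptions made for illustration; none of them come from Tesseract or from this thread. A pixel counts as black only when every channel is dark, so saturated background colors are discarded even if one of their channels is low.

```python
# Illustrative only: the pixel format and the cutoff value are assumptions.
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)

def keep_only_black(pixels, max_channel=80):
    """Return a copy where near-black pixels become black and the rest white.

    A pixel is treated as near-black when its brightest channel is at or
    below max_channel.
    """
    return [
        [BLACK if max(px) <= max_channel else WHITE for px in row]
        for row in pixels
    ]

sample = [
    [(20, 25, 30), (120, 200, 140)],  # dark ink, greenish background
    [(230, 180, 60), (10, 10, 10)],   # yellowish background, dark ink
]
cleaned = keep_only_black(sample)
```

The cutoff would need tuning per camera and lighting; 80 is just a starting guess.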

[]s
Humberto Pereira


Max Cantor

Jun 21, 2011, 11:11:13 AM
to tesser...@googlegroups.com
If you only care about the four dark, printed lines, you might be able to get by with a simple threshold, since the characters are so much darker than the pattern. You can also use a Sauvola threshold, which computes a separate threshold for each pixel from its neighborhood and so compensates for regional differences in luminosity in the image.
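
For reference, a rough pure-Python sketch of Sauvola thresholding on a grayscale image stored as a list of rows of 0-255 values. Each pixel is compared against a local threshold T = m * (1 + k * (s / R - 1)), where m and s are the mean and standard deviation of the surrounding window. The function name and the window, k and r defaults are assumptions (k = 0.2 and R = 128 are commonly used starting values for 8-bit images), and this brute-force version is far too slow for full-size photos; it is only meant to show the calculation.

```python
import math

def sauvola_binarize(gray, window=15, k=0.2, r=128.0):
    """Binarize a grayscale image (list of rows, values 0-255) with Sauvola.

    Returns a new image where text pixels are 0 and background pixels 255.
    The window is clipped at the image borders.
    """
    height, width = len(gray), len(gray[0])
    half = window // 2
    result = []
    for y in range(height):
        out_row = []
        for x in range(width):
            # Collect the neighborhood around (x, y).
            values = [
                gray[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            std = math.sqrt(variance)
            # Local threshold: lower in flat regions, closer to the mean
            # where local contrast is high.
            threshold = mean * (1 + k * (std / r - 1))
            out_row.append(0 if gray[y][x] <= threshold else 255)
        result.append(out_row)
    return result

flat = [[200] * 4 for _ in range(4)]
dot = [[200, 200, 200], [200, 20, 200], [200, 200, 200]]
flat_out = sauvola_binarize(flat, window=3)
dot_out = sauvola_binarize(dot, window=3)
```

In a flat region the threshold sits well below the local mean, so plain background stays white; a dark stroke pulls the local statistics apart and falls under the threshold.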

max

Dmitri Silaev

Jun 21, 2011, 12:13:01 PM
to tesser...@googlegroups.com, felipel...@gmail.com
Indeed, for this particular image it's easy: open it in, say,
Photoshop, crop to the ROI, and experiment with how to mix the color
channels so that the text stands out clearly against the background.
Then select a suitable threshold value, and you're done. You should
then have no difficulty coding that into your program. If you'd rather
not code it yourself, try searching for those keywords.
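
As a sketch of what that could look like once coded, assuming pixels are rows of (R, G, B) tuples: mix the channels into one gray value with fixed weights, then apply a fixed cutoff. The function names, the weights and the cutoff below are made-up illustrations, not values measured from the sample cheque; they stand in for whatever the Photoshop experiment suggests.

```python
def mix_channels(pixels, weights):
    """Combine R, G, B into one gray value per pixel using fixed weights."""
    wr, wg, wb = weights
    return [[wr * r + wg * g + wb * b for (r, g, b) in row] for row in pixels]

def apply_threshold(gray, cutoff):
    """Map values at or below cutoff to 0 (text) and the rest to 255."""
    return [[0 if value <= cutoff else 255 for value in row] for row in gray]

# Hypothetical example: on a greenish background the green channel alone
# separates dark ink from the pattern most strongly.
pixels = [[(20, 25, 30), (120, 200, 140)]]
gray = mix_channels(pixels, weights=(0.0, 1.0, 0.0))
binary = apply_threshold(gray, cutoff=100)
```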

The problem arises when you want this algorithm to be fully
automated. The images you pass to it can differ significantly in
many aspects. Fixed channel percentages and thresholds won't
suffice then; you'll need to implement something more intelligent.
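
One common way to drop the hard-coded threshold, offered here only as an example and not something named in this thread, is Otsu's method: pick the cutoff that maximizes the between-class variance of the image's own histogram. The sketch below assumes an 8-bit grayscale image given as rows of integers in 0-255; the function name is hypothetical. It does not address the channel-mixing half of the problem.

```python
def otsu_threshold(gray):
    """Return the cutoff t that maximizes between-class variance.

    Pixels with value <= t form the dark class. Expects integer values
    in the range 0-255.
    """
    histogram = [0] * 256
    for row in gray:
        for value in row:
            histogram[value] += 1

    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_t, best_score = 0, -1.0
    dark_count, dark_sum = 0, 0
    for t in range(256):
        dark_count += histogram[t]
        dark_sum += t * histogram[t]
        if dark_count == 0:
            continue  # no dark pixels yet
        light_count = total - dark_count
        if light_count == 0:
            break  # everything is in the dark class from here on
        dark_mean = dark_sum / dark_count
        light_mean = (sum_all - dark_sum) / light_count
        score = dark_count * light_count * (dark_mean - light_mean) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t

image = [[20, 20, 200, 200], [30, 200, 210, 20]]
cutoff = otsu_threshold(image)
binary_auto = [[0 if v <= cutoff else 255 for v in row] for row in image]
```

Otsu works best when the histogram has two clear peaks (ink and background); uneven lighting across the photo can still call for a local method.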

Warm regards,
Dmitri Silaev
www.CustomOCR.com
