Tesseract cannot read my image correctly even though it's very simple

inewb...@gmail.com

Jun 17, 2013, 7:09:02 AM

to tesser...@googlegroups.com

Hi all,

Please help me solve this problem. My text is very simple, but Tesseract reads it as A7V33â€˜! instead of A798D7. Why does this happen, and how can I make Tesseract read it correctly?

Thanks

ashx.jpg

Nick White

Jun 24, 2013, 1:10:28 PM

to tesser...@googlegroups.com

Hi,

You're using Tesseract to try to crack captchas? Interesting...
There are other projects around that are focused on this; I don't
know how they work, but it might be worth you checking them out as
well.

There are two obvious issues here.

First is that the text has noise around it, which is hampering
Tesseract's recognition. If possible you should try to pre-process
it to remove as much noise as possible.
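
One possible way to do that kind of clean-up, as a rough sketch only: binarize the image, then erase ink blobs that are too small to be part of a character. The helper names (binarize, remove_specks), the threshold of 128, and the minimum blob size are all assumptions chosen for illustration, and the image is represented as a plain list of rows of grayscale values rather than loaded from a file.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Map a grayscale grid (0-255) to booleans: True means ink (a dark pixel)."""
    # Assumption: dark text on a light background.
    return [[value < threshold for value in row] for row in gray]


def remove_specks(ink, min_size=4):
    """Return a copy of ink with 4-connected blobs smaller than min_size erased."""
    height, width = len(ink), len(ink[0])
    seen = [[False] * width for _ in range(height)]
    cleaned = [row[:] for row in ink]
    for y in range(height):
        for x in range(width):
            if not ink[y][x] or seen[y][x]:
                continue
            # Breadth-first search collects one connected blob of ink pixels.
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and ink[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Blobs below the size limit are treated as noise and erased.
            if len(blob) < min_size:
                for by, bx in blob:
                    cleaned[by][bx] = False
    return cleaned
```

The size limit has to be tuned per image: set it too high and thin strokes or dots disappear along with the noise. Lines drawn across the characters, which are common in captcha-style images, are connected to the text and would need a different approach.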

Second is that it looks likely that you only expect to see ASCII
characters. If that is the case, use the whitelist function to
ensure that characters like euro and a-circumflex are never
considered. It is explained in this FAQ entry:
http://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_recognize_only_digits?
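
As a hedged sketch of what the whitelist looks like in practice: the snippet below builds a command line that sets the tessedit_char_whitelist variable through the -c option. It assumes a tesseract binary on the PATH that accepts -c (older releases may need a config file instead, as the FAQ describes), and it assumes the expected text consists only of uppercase letters and digits. The function names are made up for illustration.

```python
import shutil
import string
import subprocess

# Assumption: the expected text contains only uppercase letters and digits.
WHITELIST = string.ascii_uppercase + string.digits


def build_tesseract_command(image_path, whitelist=WHITELIST):
    """Return an argv list that prints the recognized text to stdout."""
    return [
        "tesseract",
        image_path,
        "stdout",
        "-c",
        "tessedit_char_whitelist=" + whitelist,
    ]


def recognize(image_path):
    """Run Tesseract if it is installed; return None when it is not found."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_tesseract_command(image_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling recognize("ashx.jpg") would then return only characters from the whitelist, although a whitelist cannot make a noisy character come out as the right one; it only rules out the impossible ones.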

Nick
