Tesseract cannot read my image correctly even though it's very simple


inewb...@gmail.com

Jun 17, 2013, 7:09:02 AM
to tesser...@googlegroups.com
Hi all,
 
Please help me solve this problem. My text is very simple, but Tesseract reads it as A7V33‘! instead of A798D7. Can you tell me why, and how I can make Tesseract read it correctly?
 
Thanks
ashx.jpg

Nick White

Jun 24, 2013, 1:10:28 PM
to tesser...@googlegroups.com
Hi,
You're using Tesseract to try to crack captchas? Interesting...
There are other projects around that are focused on this; I don't
know how they work, but it might be worth you checking them out as
well.

There are two obvious issues here.

First is that the text has noise around it, which is hampering
Tesseract's recognition. If possible you should try to pre-process
it to remove as much noise as possible.
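
As a rough illustration of that kind of pre-processing, here is a minimal sketch in plain Python. It assumes the image has already been loaded as a list of rows of grayscale values (0 = black, 255 = white); the function names, the 3x3 median filter, and the threshold of 128 are all my own choices for illustration, not anything Tesseract prescribes. In practice an imaging library would do this faster, and the right filter depends on the kind of noise in your image.

```python
from statistics import median


def median_filter(pixels, radius=1):
    """Replace each pixel with the median of its neighbourhood.

    Small isolated specks are outvoted by their neighbours and vanish,
    while larger shapes such as character strokes mostly survive.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the window around (x, y), clipped at the image edges.
            window = [
                pixels[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(int(median(window)))
        out.append(row)
    return out


def binarize(pixels, threshold=128):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[0 if value < threshold else 255 for value in row] for row in pixels]


if __name__ == "__main__":
    # Toy 5x5 white image with a single dark speck in the middle.
    image = [[255] * 5 for _ in range(5)]
    image[2][2] = 0
    cleaned = binarize(median_filter(image))
    print(cleaned[2][2])
```

The cleaned pixel data would then be written back to an image file and passed to Tesseract in place of the original.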

Second is that it looks likely that you only expect to see ASCII
characters. If that is the case, use the whitelist function to
ensure that characters like euro and a-circumflex are never
considered. It is explained in this FAQ entry:
http://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_recognize_only_digits?
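
To make the whitelist idea concrete, here is a small sketch that builds a Tesseract command line setting the `tessedit_char_whitelist` variable through the `-c` option. The helper name, the file names, and the choice of uppercase letters plus digits as the allowed set are assumptions based on the expected text A798D7. Whether `-c` is available, and how strictly the whitelist is honoured, may depend on your Tesseract version; the FAQ entry above describes the config-file route instead.

```python
import string

# Assumption: the text only ever contains uppercase letters and digits.
ALLOWED_CHARS = string.ascii_uppercase + string.digits


def build_tesseract_command(image_path, output_base, whitelist=ALLOWED_CHARS):
    """Return the argument list for a whitelisted Tesseract run.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    return [
        "tesseract",
        image_path,
        output_base,
        "-c",
        f"tessedit_char_whitelist={whitelist}",
    ]


if __name__ == "__main__":
    command = build_tesseract_command("ashx.jpg", "result")
    print(" ".join(command))
    # To actually run it (requires Tesseract to be installed):
    #   import subprocess
    #   subprocess.run(command, check=True)
```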

Nick