tip required to distinguish between 0 (number) and O (vowel)


Andres

Jul 19, 2010, 12:09:46 PM
to tesser...@googlegroups.com
Hello people,

I'm trying to distinguish between 0 (number) and O (vowel).

O vowel is in uppercase.

In my training tif image, I included lots of zeros and lots of Os, like this: O0O0O0O0O0 OOOO 0000

The boxes and the whole training procedure are OK and the log shows no errors, but when it reads the line O0O0O0O0O0, all of the characters are read as the letter O.

Do you have any tips for this?

Thanks,

Andres

Jimmy O'Regan

Jul 19, 2010, 10:04:23 PM
to tesser...@googlegroups.com
On 19 July 2010 17:09, Andres <andr...@gmail.com> wrote:
> Hello people,
>
> I'm trying to distinguish between 0 (number) and O (vowel).
>
> O vowel is in uppercase.
>
> In my training tif image, I included lots of zeros and lots of Os, like
> this: O0O0O0O0O0 OOOO 0000
>
> Boxes and all the training procedure is ok, the log with no errors, but when
> it reads this line O0O0O0O0O0 all of these characters are read as O vowels.

It's a classification problem: 0 and O look identical to the OCR
engine (as do 1, I and l). There's a post-processing step that
normalises 'words' containing a mix of digits and letters, which is
what's happening here.
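
To make the effect concrete, here is a toy sketch of how a normalisation rule of this general kind can collapse a mixed token into all letters. This is not Tesseract's actual code or algorithm: the function name `normalise_token`, the two lookup tables, and the majority-vote rule (with ties going to letters) are all assumptions made up for illustration.

```python
# Toy illustration only: NOT Tesseract's real post-processing.
# The tables and the majority-vote rule are hypothetical.
AMBIGUOUS_TO_LETTER = {"0": "O", "1": "I"}
AMBIGUOUS_TO_DIGIT = {"O": "0", "I": "1", "l": "1"}


def normalise_token(token: str) -> str:
    """Force a mixed digit/letter token towards whichever class dominates."""
    digits = sum(ch.isdigit() for ch in token)
    letters = sum(ch.isalpha() for ch in token)
    if digits == 0 or letters == 0:
        # Pure tokens are left alone.
        return token
    if letters >= digits:
        # Letters win (ties included): ambiguous digits become letters.
        return "".join(AMBIGUOUS_TO_LETTER.get(ch, ch) for ch in token)
    # Digits win: ambiguous letters become digits.
    return "".join(AMBIGUOUS_TO_DIGIT.get(ch, ch) for ch in token)


for token in ["O0O0O0O0O0", "OOOO", "0000"]:
    print(token, "->", normalise_token(token))
```

Under this toy rule the pure tokens OOOO and 0000 survive unchanged, while the alternating token is rewritten as all letters, which matches the symptom described in the original question.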

>
> Could you people have some tip for this ?
>
> Thanks,
>
> Andres
>


--
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.

Andres

Jul 19, 2010, 10:59:21 PM
to tesser...@googlegroups.com
Oh. So whatever is read has to satisfy an underlying rule: it has to be a word from a dictionary or something like that. I'm trying to read addresses, different kinds of zip codes and similar strings, which don't follow any such rule. I think the only option I have is what Patrick suggested: comparing the widths of the characters.
Thanks for your clarification.
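
A minimal sketch of the width-comparison idea, assuming per-character bounding boxes are available in the order char, left, bottom, right, top (which I believe is the field order of a box file line). The names `CharBox` and `disambiguate_o_zero` are hypothetical, and the 0.75 threshold is a guess that would have to be tuned per font, since it relies on the zero being narrower than the capital O.

```python
from typing import List, NamedTuple


class CharBox(NamedTuple):
    """One recognised character with its bounding box (hypothetical type)."""
    char: str
    left: int
    bottom: int
    right: int
    top: int


def aspect_ratio(box: CharBox) -> float:
    """Width divided by height of the bounding box."""
    height = box.top - box.bottom
    if height <= 0:
        raise ValueError("bounding box has no height")
    return (box.right - box.left) / height


def disambiguate_o_zero(boxes: List[CharBox], threshold: float = 0.75) -> str:
    """Relabel every O/0 by shape: narrow boxes become 0, wide ones O.

    The threshold is an assumption and depends entirely on the font.
    """
    out = []
    for box in boxes:
        if box.char in ("O", "0"):
            out.append("0" if aspect_ratio(box) < threshold else "O")
        else:
            # Characters other than O/0 pass through untouched.
            out.append(box.char)
    return "".join(out)
```

For example, two boxes both labelled O, one 20x24 pixels and one 14x24 pixels, would come out as "O0" with the default threshold. In fonts where 0 and O have the same width this heuristic cannot work.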



Bikash Bag

Jul 23, 2010, 7:05:43 AM
to tesser...@googlegroups.com
Hi, I am new to OCR. Can anyone tell me the algorithms for classifying letters, for matching letters, and for developing an OCR system?


thanks in advance,
bikash.

Jimmy O'Regan

Jul 26, 2010, 12:18:46 PM
to tesser...@googlegroups.com
On 23 July 2010 12:05, Bikash Bag <bika...@gmail.com> wrote:
> hi, I am new to ocr, can anyone tall me the algorithm to classify letters
> and algorithm to match the letters and algorithm of for developing a ocr?
>

Clearly, you're new to mailing lists, too. What you just did is called
'thread hijacking'. Don't do it.

Your question is overly broad. If these were the days before
Wikipedia, you could be excused such a question, but Wikipedia is
there. Use it, and come back when you have more specific questions.
