Specify the background color?

jean...@spaggiari.org

Jan 27, 2014, 10:00:00 AM
to tesser...@googlegroups.com

Hi,

I'm trying to get Tesseract to read single-digit numbers from individual PNG files.

It works pretty well for 5, 6, etc., but for 8 it gives me : instead, which makes sense. So I want to tell it what the background color is. That way it will know which part is the number and which is the background.

I tried inverting the background and the foreground of the picture but got the same result.

(I tried to attach the picture; it appears at the top of the message. Not sure it will work.)

Thanks for your recommendations.

JM

jean...@spaggiari.org

Jan 27, 2014, 10:05:57 AM
to tesser...@googlegroups.com
Sorry, I forgot to give the command line I use:
tesseract 8.png test -psm 10

Nick White

Jan 27, 2014, 10:11:13 AM
to tesser...@googlegroups.com
Hi,

> I'm trying to get Tesseract to read single-digit numbers from individual PNG files.
>
> It works pretty well for 5, 6, etc., but for 8 it gives me : instead,
> which makes sense. So I want to tell it what the background color is. That way
> it will know which part is the number and which is the background.

The first thing to do would be to tell Tesseract you're only
interested in digits:
https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_recognize_only_digits?

> I tried inverting the background and the foreground of the picture but got the
> same result.

The background should be white, the foreground black. But from the
attached image it looks like it already is, so there's no need to
change that.

Nick

jean...@spaggiari.org

Jan 27, 2014, 10:17:09 AM
to tesser...@googlegroups.com
Hi Nick,

Thanks for your prompt reply.

I also need to recognize /, but luckily I don't need : yet.

The question is, what if I need to recognize : later?

I will try the option from the link you sent. For now it might be all I need.

Regarding the picture I attached, I tried both black on white and white on black. Based on your reply, I will keep black on white.

Thanks again!

JM

jean...@spaggiari.org

Jan 27, 2014, 10:38:21 AM
to tesser...@googlegroups.com
Just to provide some feedback: the proposed option is working well for my current needs.

Thanks.

JM

Nick White

Jan 27, 2014, 10:54:52 AM
to tesser...@googlegroups.com
Hi again,

Thanks for the feedback, I'm glad it's helpful.

> I also need to recognize /, but luckily I don't need : yet.

To add '/', you can create a copy of the 'digits' config file (e.g.
called 'mydigits') and add '/' to the end of the tessedit_char_whitelist
entry. You can then run something like this:
tesseract 8.png test -psm 10 mydigits
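
A minimal sketch of those steps, for illustration only. It assumes the whitelist in your installed 'digits' config is just the ten digits (check your own copy, it may list more characters) and that Tesseract accepts a config file given as a path relative to the current directory; if it does not, copy 'mydigits' into the tessdata/configs directory instead. The -psm spelling is the Tesseract 3.x syntax used in this thread.

```shell
#!/bin/sh
# Write a custom config file in the current directory.
# The digit list is an assumption; copy the whitelist line from your
# installed 'digits' config and append '/' if yours differs.
cat > mydigits <<'EOF'
tessedit_char_whitelist 0123456789/
EOF

# Run OCR only when tesseract is installed and the sample image exists.
if command -v tesseract >/dev/null 2>&1 && [ -f 8.png ]; then
    # -psm 10 treats the image as a single character.
    tesseract 8.png test -psm 10 mydigits
    # Tesseract writes the recognized text to <outputbase>.txt.
    cat test.txt
fi
```

Adding ':' later would mean appending it to the same whitelist line.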

> The question is, what if I need to get : later?

You'd have to add that to the whitelist as well. It may sometimes
misrecognise 8 as :; unfortunately, that's probably unavoidable.

Hope that helps :)

Nick

Dmitri Silaev

Jan 27, 2014, 11:36:38 AM
to tesser...@googlegroups.com
Probably something can be done to avoid 8 <-> : (and similar) recognition errors. For example, you can add an extra character or two to each of your input images. This might help outweigh Tesseract's confidence in colons and dots and make it recognize your single-character text correctly. Afterwards you can discard those extra characters and keep the one you need.
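
One way this padding idea could be tried from the command line is sketched below. Everything specific here is an assumption for illustration: pad.png is a hypothetical image of one clearly rendered digit with the same height as 8.png, ImageMagick's convert is assumed to be installed, and strip_padding is a made-up helper name. Page segmentation mode 7 (single text line) replaces mode 10 because the padded image now holds three characters.

```shell
#!/bin/sh
# Hypothetical helper: drop the first and last character of each line,
# i.e. the two padding characters added around the real one.
strip_padding() {
    sed -e 's/^.//' -e 's/.$//'
}

# Run only when the tools and both images are available.
if command -v convert >/dev/null 2>&1 \
   && command -v tesseract >/dev/null 2>&1 \
   && [ -f pad.png ] && [ -f 8.png ]; then
    # +append joins the images left to right: pad, target, pad.
    convert pad.png 8.png pad.png +append padded.png
    # -psm 7 treats the image as a single text line (Tesseract 3.x syntax).
    tesseract padded.png padded -psm 7 digits
    # Keep only the middle character of the recognized line.
    strip_padding < padded.txt
fi
```

This only works if Tesseract returns exactly three characters for the padded image, so the output is worth checking before the padding is stripped.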

The beginning of the story can be viewed here: http://code.google.com/p/tesseract-ocr/issues/detail?id=446
The problem has not been of major importance to me for a while and I'm not up to date on the progress, but seemingly the same is true of the Tesseract roadmap, and nothing has been done about it since early 2011. Therefore, AFAICT, there's generally no conventional way to work with single-character text, only custom Tesseract code changes and clumsy workarounds.

HTH

Best regards,
Dmitri Silaev
www.CustomOCR.com
