Tesseract does not recognize grey-colored fonts in images


Yogesh Sanchihar

Jul 28, 2018, 3:00:16 PM
to tesseract-ocr
If the text is light grey rather than black, Tesseract does not recognize it.

Is there any solution to this problem?

I have attached an image of a sample bill.

Suppose I want to extract the Base Fare line:

Base Fare  - Rs 500

Since "Base Fare" is printed in light grey, Tesseract does not recognize it at all.


sample Ola bill.jpg

Yogesh Sanchihar

Jul 31, 2018, 5:59:11 PM
to tesser...@googlegroups.com
Okay, James. Thank you for your response. I will try that.

On Tue, Jul 31, 2018 at 5:04 PM, James Q <james.qu...@taina.tech> wrote:
It could be that a threshold operation is taking place at a lower brightness than your grey text. Try binarizing the image with a high threshold value (e.g. 200) before sending it to Tesseract; this should make all the text black.
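As a rough illustration of the suggestion above, here is a minimal sketch of fixed-threshold binarization on 8-bit greyscale values. The `binarize` helper and the pixel values in `patch` are hypothetical and chosen only for the example; the cut-off of 200 is the value suggested above and may need tuning for a given image.

```python
def binarize(pixels, threshold=200):
    """Map every greyscale value below `threshold` to black (0) and the rest to white (255)."""
    return [[0 if value < threshold else 255 for value in row] for row in pixels]


# Hypothetical 8-bit greyscale patch:
# white background (255), light grey text (170), black text (20).
patch = [
    [255, 170, 255, 20],
    [255, 170, 170, 20],
]

# With a mid-range cut-off the grey text (170) is above the threshold and turns white.
low = binarize(patch, threshold=128)

# With a high cut-off the grey text (170) is below the threshold and turns black.
high = binarize(patch, threshold=200)
```

On a real image, the same mapping can be applied with Pillow, for example `Image.open(path).convert("L").point(lambda p: 0 if p < 200 else 255)`, and the result saved or passed on to Tesseract.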

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/f1c49f5b-27f8-4ed4-8d4d-8f01efe4a58f%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

chandra churh chatterjee

Aug 1, 2018, 7:35:08 AM
to tesser...@googlegroups.com
Binarize the image; that might give a good result.

Chandra Churh Chatterjee


Yogesh Sanchihar

Aug 2, 2018, 4:20:01 AM
to tesser...@googlegroups.com
Namaste! Okay, I will try this. Could you help me build an image preprocessing pipeline, or at least list the sequential steps I should use to build one?
