Products Expiration date recognition

Cristian

Jun 3, 2016, 10:09:04 AM
to tesseract-ocr
Hi guys,

I'm new to Tesseract. I'm working on an application that has to recognize the expiration date on products such as food. The input will be an image (very good resolution) with only the date on it.
Before I start on the code, I'd appreciate it if some of you with more experience could give me suggestions about the input image (format, dimensions, colors) and about possible Tesseract configurations, training data, etc.
The project is still in its starting phase, so I can set up good starting conditions in order to get the best out of Tesseract OCR.

Thanks in advance,
Cristian

Attached you can find an example of an actual image that will be the input for Tesseract OCR.

iu-2.jpeg

Tom Morris

Jun 3, 2016, 12:44:25 PM
to tesseract-ocr

Thanks for attaching an example image. That helps make the discussion more concrete and productive.

The first thing that I'll note is that the image does not have only the date on it, but also a textured carton with additional printing on top, a background, etc. We often forget how much noise our human visual system excludes automatically.

To compute a region of interest or crop box so that your image really does include only the date and related text, look at using a text detection algorithm such as the one included in OpenCV: http://docs.opencv.org/3.1.0/da/d56/group__text__detect.html
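
As a rough illustration of the crop-box idea, here is a minimal sketch in plain Python. It is not the OpenCV text detector linked above; it is a much cruder stand-in that simply finds the bounding box of all dark pixels, so it only helps once the carton and background are already mostly light. The function names, the `threshold` of 128, and the `margin` parameter are all assumptions made for illustration, and the image is assumed to be a 2D list of 8-bit gray values.

```python
def dark_pixel_bbox(gray, threshold=128, margin=2):
    """Return (top, left, bottom, right) enclosing every pixel darker than
    `threshold`, padded by `margin` pixels and clamped to the image.
    `bottom` and `right` are exclusive. Returns None if nothing is dark.
    """
    if not gray or not gray[0]:
        return None
    height, width = len(gray), len(gray[0])

    # Rows and columns that contain at least one dark pixel.
    rows = [y for y, row in enumerate(gray) if any(v < threshold for v in row)]
    cols = [x for x in range(width) if any(row[x] < threshold for row in gray)]
    if not rows or not cols:
        return None

    top = max(rows[0] - margin, 0)
    bottom = min(rows[-1] + 1 + margin, height)
    left = max(cols[0] - margin, 0)
    right = min(cols[-1] + 1 + margin, width)
    return top, left, bottom, right


def crop(gray, box):
    """Cut the (top, left, bottom, right) box out of a 2D list image."""
    top, left, bottom, right = box
    return [row[left:right] for row in gray[top:bottom]]


if __name__ == "__main__":
    # Tiny synthetic image: white background with two dark "ink" pixels.
    image = [[255] * 8 for _ in range(6)]
    image[2][3] = 20
    image[3][4] = 35
    box = dark_pixel_bbox(image, margin=1)
    print(box, len(crop(image, box)), "rows")
```

The margin is there so the crop keeps a small light border around the text instead of cutting right at the ink. On a real photo any dark crease or shadow would be swept into the box as well, which is why a proper text detector is the better tool.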

Tesseract is going to work on a bitonal (black-and-white) image. Since you know more about the conditions the image was made under, the subject, etc., you can probably do a better job of converting to bitonal yourself than a generic conversion would. For resolution, check the FAQ. There are some guidelines there about the height of characters, etc.
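
To make "converting to bitonal" concrete, here is a minimal sketch of Otsu's method, a generic global threshold that picks the gray level which best separates dark and light pixels. It is only one possible baseline, not a recommendation from this thread; with knowledge of your lighting and packaging, a tuned or locally adaptive threshold may well do better. The function names are illustrative, and the image is assumed to be a 2D list of 8-bit gray values.

```python
def otsu_threshold(gray):
    """Return the gray level t (0-255) that maximizes the between-class
    variance when pixels <= t form one class and pixels > t the other."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low = 0      # number of pixels with value <= t
    sum_low = 0    # sum of their gray values
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue            # no pixels in the low class yet
        w_high = total - w_low
        if w_high == 0:
            break               # every pixel is in the low class
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, t):
    """Map pixels <= t to 0 (black ink) and the rest to 255 (white)."""
    return [[0 if v <= t else 255 for v in row] for row in gray]


if __name__ == "__main__":
    # Light carton (200) with two darker printed pixels (30 and 40).
    image = [
        [200, 200, 200, 200],
        [200, 30, 40, 200],
        [200, 200, 200, 200],
    ]
    t = otsu_threshold(image)
    for row in binarize(image, t):
        print(row)
```

A single global threshold like this tends to struggle with uneven lighting or shadows across the carton, which is exactly the kind of situation where knowing your capture conditions lets you pick something better.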

Good luck!

Tom

Allistair C

Jun 3, 2016, 12:49:06 PM
to tesser...@googlegroups.com
Everything Tom said, and I would also stress that I have had a lot of trouble with text that borders noise. The grey carton may be easy enough to remove, but the dark crease/join where the box closes, the proximity of the text to it, and the angle (Tesseract breaks at angles > 10 degrees in my tests) will cause segmentation and recognition to fail. So yes, getting rid of these hard edges to allow the text to breathe is important.
