mix of small and big size text


MedCo

Mar 28, 2015, 1:31:12 PM
to tesser...@googlegroups.com

Hello,

I am trying to OCR the following image, and what I get is "V,365 M". It doesn't recognize the small "T", the small "m", or the "L" correctly.

I would like to get "VT 365 mL" from this image. What do I need to do to get the correct text from this image?

Thanks in advance!


[attached image] returns -> V,365 M


Art W Rhyno

Mar 28, 2015, 7:58:09 PM
to tesser...@googlegroups.com
> I would like to get "VT 365 mL" from this image. What do I need to get correct text from this image?

It takes more steps, but you could try extracting the bounding rectangle for each letter using the API and then running tesseract in single-character mode (-psm 10) on each one. Tesseract's training function will also give you the coordinates of each character, for example:

tesseract vt.png vt batch.nochop makebox

Cropping each character out of the image based on those coordinates, and then upscaling the small ones before recognition, would probably get everything.
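As a rough sketch of the coordinate step, the Python below turns lines from the box file into crop rectangles and picks an upscale factor for small characters. It assumes the usual box file layout of "<char> <left> <bottom> <right> <top> <page>" with the origin at the bottom-left of the image. The function names, the 2-pixel margin, and the 40-pixel minimum height are illustrative choices of mine, not anything tesseract prescribes.

```python
def parse_box_line(line):
    """Parse one box-file line: '<char> <left> <bottom> <right> <top> <page>'."""
    parts = line.split()
    char = parts[0]
    left, bottom, right, top = (int(v) for v in parts[1:5])
    return char, left, bottom, right, top


def box_to_crop(box, image_width, image_height, margin=2):
    """Convert a box (bottom-left origin) to a (left, upper, right, lower)
    rectangle with a top-left origin, padded by `margin` pixels and clamped
    to the image bounds. The margin value is an assumption."""
    _, left, bottom, right, top = box
    upper = image_height - top      # flip the y axis
    lower = image_height - bottom
    return (
        max(left - margin, 0),
        max(upper - margin, 0),
        min(right + margin, image_width),
        min(lower + margin, image_height),
    )


def scale_factor(crop, min_height=40):
    """Return how much to enlarge a crop so it is at least `min_height`
    pixels tall (a hypothetical threshold); never shrink."""
    height = crop[3] - crop[1]
    return max(1.0, min_height / height)


if __name__ == "__main__":
    # Made-up box lines for illustration, not real tesseract output.
    for line in ["V 10 20 30 60 0", "T 35 20 45 40 0"]:
        box = parse_box_line(line)
        crop = box_to_crop(box, image_width=200, image_height=100)
        print(box[0], crop, round(scale_factor(crop), 2))
```

The resulting rectangle is in the (left, upper, right, lower) order that an imaging library such as Pillow expects for cropping, so each character could be cropped, resized by the computed factor, saved, and then passed to tesseract with -psm 10.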

art