small image and OCR


alex kelly

Apr 13, 2019, 1:36:24 PM
to tesseract-ocr
Hello,

I'm trying to OCR a small greyscale image of an energy meter, but tesseract says the image is too small. The full response is below:

pi@OCRReader:~ $ tesseract test_images/cropped_image.png out
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Error in pixGenerateHalftoneMask: pix too small: w = 230, h = 50
Empty page!!
Error in pixGenerateHalftoneMask: pix too small: w = 230, h = 50
Empty page!!

How can I make tesseract-ocr read the value? 

I could send this to Google's computer vision API, but I would rather do the OCR on the device than send the image to the cloud, and I was recommended tesseract. If there is a better solution, please let me know.

Thanks 
cropped_image.jpg
cropped_image.png

Lorenzo Bolzani

Apr 14, 2019, 5:59:53 AM
to tesser...@googlegroups.com
Hi Alex,
you need to preprocess the image a little.

First negate it: tesseract expects dark text on a white background.
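
As a rough illustration of the negation step, here is a minimal sketch in plain Python. It assumes an 8-bit greyscale image held as rows of integers; the `invert_gray` helper and the sample `patch` values are hypothetical and are not taken from the attached image. In practice an imaging library would do the same thing in one call.

```python
def invert_gray(pixels):
    """Return the negative of an 8-bit greyscale image given as rows of ints."""
    return [[255 - v for v in row] for row in pixels]


# Hypothetical 2x3 patch: bright digit strokes (200+) on a dark background (~20).
patch = [
    [20, 220, 20],
    [25, 210, 30],
]

# After inversion the strokes are dark and the background is light.
inverted = invert_gray(patch)
```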

Then use --psm 6 to tell tesseract that this is a single block of text and not a complex page to split into paragraphs. Also try --psm 7 (single line).

tesseract --psm 6 cropped_image.jpg -
1.4 95500>0

Now, by stretching the contrast (or applying an Otsu/adaptive threshold) and straightening the image, I get (almost) correct results; see the attached image.

1 4 9 55 0 5
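
To make the contrast stretch and Otsu threshold mentioned above more concrete, here is a minimal sketch in plain Python. It is not the exact processing used for the attached image: the helper names and the tiny sample `patch` are hypothetical, the image is assumed to be 8-bit greyscale stored as rows of integers, and straightening (deskewing) is not covered.

```python
def stretch_contrast(pixels):
    """Linearly rescale so the darkest pixel maps to 0 and the brightest to 255."""
    flat = [v for row in pixels for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in pixels]
    return [[(v - lo) * 255 // (hi - lo) for v in row] for row in pixels]


def otsu_threshold(pixels):
    """Return the threshold t maximising between-class variance (v <= t vs v > t)."""
    hist = [0] * 256
    for row in pixels:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below t
    sum0 = 0  # sum of their grey values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mean0 - mean1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels, t):
    """Map pixels at or below t to black (0) and the rest to white (255)."""
    return [[0 if v <= t else 255 for v in row] for row in pixels]


# Hypothetical low-contrast patch: dark strokes (~60) on a mid-grey background (~140).
patch = [
    [60, 62, 140],
    [61, 138, 142],
]
stretched = stretch_contrast(patch)
t = otsu_threshold(stretched)
clean = binarize(stretched, t)
```

The result is a pure black-and-white image, which is the kind of simple input the advice here is aiming for.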

Ideally you want an image as simple as possible, black text on white background. You may also try to crop the black border out, if possible.

Have a look here on how to isolate blocks of text:



Bye

Lorenzo



--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/2314d522-ac6d-4abc-8d17-42a198503b7d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
cropped_image_ok.jpg

alex kelly

Apr 23, 2019, 10:59:52 AM
to tesseract-ocr
Thanks for getting back to me. When I run it I get an error. Any ideas why, and how to resolve it?

pi@ShopFloorOCRReader:~ $ tesseract --psm 6 "test_images/cropped_image.jpg"
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
read_params_file: parameter not found: ����

Zdenko Podobny

Apr 23, 2019, 11:07:54 AM
to tesser...@googlegroups.com
What about:
tesseract --help
;-)

Zdenko


On Tue, Apr 23, 2019 at 16:59, alex kelly <alex.k...@gmail.com> wrote:

Lorenzo Bolzani

Apr 23, 2019, 12:53:33 PM
to tesser...@googlegroups.com
Hi,
I suspect you cut and pasted the command or edited it, and now you have some non-printable characters in your command line (the question mark boxes). Write it again from scratch.

You are also missing one parameter on the command line: the output file. You can use "-" for standard output.

$ tesseract --psm 6 image.png -

You are also using version 3.x, you should probably upgrade to 4.x.
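
If upgrading is not possible right away, the spelling of the flag may matter. My understanding, which is an assumption worth confirming against the output of `tesseract --help` on your install, is that 3.x releases spell the option `-psm` while `--psm` arrived with 4.x. The sketch below is a hypothetical helper (the `tesseract_command` name and its parameters are made up for illustration) that builds the argument list either way, placing the image and output base before the options.

```python
def tesseract_command(image_path, psm, major_version, output_base="-"):
    """Build a tesseract argument list for the given major version.

    Assumption: 3.x expects "-psm", 4.x and later expect "--psm".
    output_base "-" means standard output, as suggested in this thread.
    """
    flag = "--psm" if major_version >= 4 else "-psm"
    return ["tesseract", image_path, output_base, flag, str(psm)]


# Argument lists for the same image on a 3.x and a 4.x install.
cmd_v3 = tesseract_command("image.png", 6, major_version=3)
cmd_v4 = tesseract_command("image.png", 6, major_version=4)
```

The list can be passed to `subprocess.run(cmd_v3, capture_output=True, text=True)` on a machine where tesseract is installed.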



Lorenzo
