Tesseract cannot read text on striped background, but Google AI can


Timo Richter

Jun 1, 2021, 2:43:08 PM
to tesseract-ocr
Hi everyone,

I have tried to OCR an identity card [1], and large parts of it were not recognised. I get nothing from the headline or the first few rows. From the middle onward, Tesseract finds some of the text correctly. There are lines and patterns in the background, as usual on such documents. In the monochrome picture I could not completely separate the letters from the background; some gray pixels remain. However, there is a website that does OCR and it works perfectly [2]. Why do I get bad results, and why does my Tesseract not read the text? What does the website do differently?


Thank you in advance,

Timo



black_white.png
DeepinScreenshot_Seleccionar área_20210601165523.png

Ajinkya Bobade

Jun 2, 2021, 11:23:44 AM
to tesser...@googlegroups.com
Hello,
I have created a web extension that solves this problem. Upload the image to https://imagescanner-online.com/ and it will remove the noise and segment the text at the pixel level, so that you get a very good quality input which you can feed to Tesseract to get good output.

Regards
Ajinkya


Timo Richter

Jun 3, 2021, 7:08:31 AM
to tesseract-ocr
Hi Ajinkya,

The result looks better than mine, but it appears to be very low resolution and the text is not readable. How did you do it?
The Google AI website is still a lot more accurate. How might they have done this?

Ajinkya Bobade

Jun 4, 2021, 3:08:51 PM
to tesser...@googlegroups.com
Hi Timo,

The results are low resolution because the image you uploaded appears to come from a sample set rather than from a real mobile phone camera.

I recommend uploading an image captured with a good-quality phone camera and retrying a few more times with different images captured that way. My software works poorly on sample images that are not from the real world; it works very well on real-world images.

Feel free to reach out to me if you have any questions or concerns. 

Regards
Ajinkya 






Ajinkya Bobade

Jun 4, 2021, 3:13:50 PM
to tesser...@googlegroups.com
Following up: try uploading images of real-world documents. Please avoid taking photos of photos (that is, photos of a computer screen displaying a document). Capture the real document and upload that.

Timo Richter

Jun 9, 2021, 4:21:43 AM
to tesseract-ocr
Does anyone have an idea why https://cloud.google.com/document-ai#section-2 is so good while I get bad results with plain Tesseract? What could cause this?
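
For reference, the monochrome step described in the first post (separating dark letters from a lighter striped background so that no gray pixels remain) can be sketched in plain Python. This is only a minimal illustration of Otsu's global thresholding under stated assumptions, not a description of what Document AI or any website does internally: the names `otsu_threshold` and `binarize` and the tiny sample image are hypothetical, and the image is assumed to be 8-bit grayscale given as a list of rows. On a real scan with uneven lighting a single global threshold may not be enough, and a local (adaptive) threshold might work better.

```python
def otsu_threshold(pixels):
    """Return Otsu's threshold for a flat list of 8-bit grayscale values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels with value <= t
    sum_dark = 0    # sum of their gray values
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue            # no dark class yet
        w_light = total - w_dark
        if w_light == 0:
            break               # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance (up to a constant factor); Otsu maximises it.
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Map every pixel to 0 (ink) or 255 (background) using Otsu's threshold."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 255 for p in row] for row in image]


# Hypothetical sample: dark text (30) on paper (210) with lighter stripes (180).
image = [
    [210, 210, 210, 210, 210, 210],
    [180,  30, 180, 180,  30, 180],
    [210,  30,  30,  30,  30, 210],
    [180,  30, 180, 180,  30, 180],
    [210, 210, 210, 210, 210, 210],
]
clean = binarize(image)
```

In this sample the stripes end up in the same class as the paper, so `clean` contains only pure black and pure white pixels. Whether that holds for a real identity card depends on how much contrast there is between the print and the background pattern.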