Re: [tesseract-ocr] Best filter/preprocess for these type of images?

Lorenzo Bolzani

Feb 24, 2020, 6:33:06 AM

to tesser...@googlegroups.com

Do a threshold (Otsu), then count the white and black pixels. This will tell you whether you have white text on a dark background or the opposite.

If necessary, negate the image so that you have dark text on a bright background.
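The threshold, count, and negate steps can be sketched roughly as follows. This is only an illustration in plain Python with no dependencies: the function names are made up for this example, the image is assumed to be a list of rows of 8-bit gray values, and it assumes the text covers fewer pixels than the background. In practice you would more likely use OpenCV (`cv2.threshold` with `cv2.THRESH_OTSU`, then `cv2.bitwise_not`).

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a flat list of 8-bit gray values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]            # pixels with value <= t
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue                 # no dark class yet
        w_bright = total - w_dark
        if w_bright == 0:
            break                    # no bright class left
        mean_dark = sum_dark / w_dark
        mean_bright = (sum_all - sum_dark) / w_bright
        # Between-class variance (up to a constant factor).
        var = w_dark * w_bright * (mean_dark - mean_bright) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def dark_text_on_bright(image):
    """Binarize with Otsu and negate if the result is mostly black.

    Assumption: the background is the majority of the pixels, so a
    mostly black result means light text on a dark background.
    Returns (binary_image, was_inverted).
    """
    t = otsu_threshold([p for row in image for p in row])
    binary = [[255 if p > t else 0 for p in row] for row in image]

    white = sum(row.count(255) for row in binary)
    black = sum(len(row) for row in binary) - white

    inverted = white < black
    if inverted:
        binary = [[255 - p for p in row] for row in binary]
    return binary, inverted
```

If the text fills most of the crop, the majority assumption fails and the polarity check will pick the wrong direction, so it is worth checking a few outputs by eye.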

The images are very small; you want at least 35 to 50 px. Try to get larger source images if possible; otherwise upscaling might help. Sharpening or other steps before the threshold might also help.
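As a rough sketch of the upscaling step, the snippet below enlarges an image by a whole-number factor until it reaches a minimum height. It reads the 35 to 50 px figure as a target height for a single-line crop, which is an assumption on my part, and the function name and `min_height` parameter are made up for illustration. It uses nearest-neighbour repetition only to stay dependency-free; a smoother interpolation such as `cv2.resize` with `cv2.INTER_CUBIC` is likely a better choice for real images.

```python
import math


def upscale_to_min_height(image, min_height=50):
    """Nearest-neighbour upscale of a list-of-rows grayscale image.

    The integer scale factor is the smallest one that makes the image
    at least `min_height` pixels tall; images already tall enough are
    returned as an unchanged copy.
    """
    height = len(image)
    factor = max(1, math.ceil(min_height / height))

    out = []
    for row in image:
        # Repeat every pixel horizontally, then the whole row vertically.
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out
```

For example, a 12 px tall crop with `min_height=50` gets a factor of 5 and comes out 60 px tall.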

Lorenzo

On Sun, Feb 23, 2020 at 12:29, Jonathan Dahan <jdah...@gmail.com> wrote:

Hi, I would love to know what kind of custom preprocessing these images should go through so that they can be read as reliably as possible. Note that the filter/preprocess needs to be generic, meaning it has to work across all four images.

https://i.stack.imgur.com/4N62x.png
https://i.stack.imgur.com/3upPM.png
https://i.stack.imgur.com/WcrGU.png
https://i.stack.imgur.com/ymhv6.png

Thanks.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/05373fc0-d380-4aaa-a2ef-6d9538601ab0%40googlegroups.com.

Jonathan Dahan

Feb 26, 2020, 7:58:49 AM

to tesseract-ocr

I tried doing the threshold method but it screws up the ice.png text.
