Problem with parsing bold white fonts

Евгений Захаров

Aug 4, 2020, 7:07:47 AM

to tesseract-ocr

Hi. It would be great if somebody could help with parsing these bold white fonts.

I have tried parsing these images with some preprocessing. For example, after "erode", "GaussianBlur" and "Canny", some parts of the text start to be recognised, but not all. With different preprocessing settings, different parts of the images are recognised.

Is there an image preprocessing algorithm that can detect such text? Or is the only option to retrain the model to detect this bold font?

Zdenko Podobny

Aug 4, 2020, 1:18:54 PM

to tesser...@googlegroups.com

First you need to remove the background.

Then follow the advice in https://github.com/tesseract-ocr/tessdoc/blob/master/ImproveQuality.md
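As a rough illustration of what "remove the background" can mean for white text, here is a minimal sketch that keeps only near-white pixels and writes them out as black text on a white background, which is the polarity the linked page recommends for Tesseract 4.x. The function name `isolate_white_text` and the thresholds `min_value=200` and `max_spread=40` are assumptions made for this example, not values from the thread; they would need tuning on the real images. It is written in plain Python on lists of (R, G, B) tuples so it runs without extra packages; on real images the same per-pixel test would normally be done with an image library.

```python
def isolate_white_text(pixels, min_value=200, max_spread=40):
    """Return a binary image: 0 (black) where the input is near-white, 255 elsewhere.

    pixels: list of rows, each row a list of (r, g, b) tuples in 0..255.
    min_value and max_spread are illustrative thresholds (assumptions).
    """
    result = []
    for row in pixels:
        out_row = []
        for r, g, b in row:
            lo, hi = min(r, g, b), max(r, g, b)
            bright = lo >= min_value           # every channel is high
            neutral = (hi - lo) <= max_spread  # channels are close, so no strong colour
            # Near-white pixels become black "ink"; everything else becomes white.
            out_row.append(0 if (bright and neutral) else 255)
        result.append(out_row)
    return result


# Tiny hypothetical 2x2 image: white text pixels on coloured background pixels.
image = [
    [(250, 250, 250), (30, 90, 200)],
    [(255, 220, 40), (240, 245, 238)],
]
binary = isolate_white_text(image)
print(binary)  # [[0, 255], [255, 0]]
```

If the white glyphs have a dark outline or shadow, a lower `min_value` may pull in too much background, so it is probably worth checking the binary image by eye before passing it to Tesseract.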

Zdenko

On Tue, Aug 4, 2020 at 13:07, Евгений Захаров <evgzak...@gmail.com> wrote:

evgzak...@gmail.com

Aug 5, 2020, 3:34:32 AM

to tesseract-ocr

Thanks, I will try it.

Evgeny

On Tuesday, August 4, 2020 at 20:18:54 UTC+3, zdenop wrote:
