Customize TIFF file with OpenCV filters.


André Castro

Jan 29, 2020, 3:01:16 PM
to tesseract-ocr
Dear all,

First, I'd like to thank you for keeping the Tesseract community alive. Second, I'd like to ask some questions about the training process I am using.

Following the steps in the tutorial, I was able to create the box/TIFF pairs and lstmf files from a TTF font file. The problem is that recognition was barely adequate for that font. I realized that the data I was testing on is degraded: many pixels are missing after the filters used for text segmentation are applied. Is there any way to use OpenCV, or any other filter, to degrade the training TIFF file in the same way? I can see that Tesseract adds extra pixels around the borders of the characters. Is there a parameter to remove pixels instead of adding them?

Thank you so much!

The training TIFF file contains:

Screenshot from 2020-01-29 16-48-30.png





Image after processing contains:




Thad Guidry

Jan 29, 2020, 5:31:38 PM
to tesser...@googlegroups.com
What about dilate and erode in OpenCV?

I mention my experiments on the wiki, which also links to a general explanation of the dilation and erosion algorithms used in a lot of image-processing software: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#dilation-and-erosion




André Castro

Jan 29, 2020, 5:40:17 PM
to tesser...@googlegroups.com
Thanks for the reply, Thad.

I don't think I made myself clear. My goal is to apply filters to the automatically generated TIFF file that is used for training Tesseract.

The code below generates this multi-page TIFF and the accompanying box files.

Screenshot from 2020-01-29 19-37-11.png

This generates the following files:

Screenshot from 2020-01-29 19-37-48.png

The goal is to erode and dilate por.crlv.exp0.tif.

Thanks again!


