Customize TIFF file with OpenCV filters.

André Castro

Jan 29, 2020, 3:01:16 PM

to tesseract-ocr

Dear all,

First, I'd like to thank you for keeping the Tesseract community alive. Second, I have some questions about the training process I am using.

Following the steps in the tutorial, I was able to create the box/tiff pairs and lstmf files from a ttf font file. The problem is that recognition with the resulting model was barely adequate for that font. I realized that my test images are degraded: they lose a lot of pixels after I apply the filters for text segmentation. Is there any way to use OpenCV, or any other filter, to degrade the training tiff file in the same way? I can also see that Tesseract adds extra pixels around the borders of the characters. Is there a parameter to remove pixels instead of adding them?

Thank you so much!

Training tiff File contains:

Image after processing contains:

Thad Guidry

Jan 29, 2020, 5:31:38 PM

to tesser...@googlegroups.com

What about dilate and erode in OpenCV?

https://docs.opencv.org/2.4/modules/imgproc/doc/filtering.html#dilate

I mention my experiments here on the Wiki (which includes a link about Dilation and Erosion algorithms in general used in lots of image processing software): https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#dilation-and-erosion

Thad

https://www.linkedin.com/in/thadguidry/

To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/b1962b86-4963-4020-9182-2d28e78162e6%40googlegroups.com.

André Castro

Jan 29, 2020, 5:40:17 PM

to tesser...@googlegroups.com

Thanks for the reply, Thad.

I don't think I made myself clear. My goal is to apply filters to the TIFF file (generated automatically) that is used for training Tesseract.

The code below generates this multi-page TIFF and the accompanying box files.

This generates the following files:

The goal is to erode and dilate the pages of por.crlv.exp0.tif.

Thanks again!
