I am trying to fine-tune Tesseract for dot-matrix fonts such as the one in the picture below. When the dots are closely spaced and touch, Tesseract can more or less handle the font after some fine-tuning and image processing. However, when the dots do not touch, as in the picture below, Tesseract struggles.
I read in "An Overview of the Tesseract OCR Engine" that the first step in Tesseract's processing pipeline is a connected component analysis (Section 2, second paragraph). Since each character in a dot-matrix font is made of separate dots rather than a single connected component, I am wondering whether Tesseract's connected component analysis is one reason it struggles on the image below.
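To make the connected-component concern concrete, here is a minimal pure-Python sketch. It is not Tesseract's code and makes no claim about how Tesseract labels components internally. It renders a hypothetical 5x7 dot-matrix "L" with one-pixel dots separated by one-pixel gaps, counts 8-connected components, and then shows that a small binary dilation merges the dots into a single component, which is the usual preprocessing workaround. The glyph, the grid pitch, and the helper names (`render_dots`, `label_components`, `dilate`) are all made up for illustration.

```python
from collections import deque


def render_dots(dots, rows=7, cols=5, pitch=2):
    """Render (row, col) dot positions of a hypothetical dot-matrix glyph
    onto a binary pixel grid; each dot is one pixel, spaced `pitch` apart."""
    h, w = rows * pitch + 1, cols * pitch + 1
    img = [[0] * w for _ in range(h)]
    for r, c in dots:
        img[r * pitch + 1][c * pitch + 1] = 1
    return img


def label_components(img):
    """Count 8-connected foreground components with a breadth-first flood fill."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                count += 1  # found an unvisited foreground pixel: new component
                seen[y][x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return count


def dilate(img, radius=1):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            out[ny][nx] = 1
    return out


# Hypothetical "L": left column of 7 dots plus 4 more dots along the bottom row.
L_GLYPH = [(r, 0) for r in range(7)] + [(6, c) for c in range(1, 5)]

img = render_dots(L_GLYPH)
print(label_components(img))          # one component per dot
print(label_components(dilate(img)))  # dots merged into a single component
```

With non-touching dots, every dot is its own component, so a component-based segmenter sees 11 "blobs" instead of one character. In practice a library routine such as OpenCV's `cv2.dilate` or `cv2.morphologyEx` would do the same job; the structuring element must be large enough to bridge the gap between dots but small enough not to merge neighbouring characters.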
Is there a command to see how Tesseract performs connected component analysis on this image?
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/fbbc3452-62f5-4c34-bd9c-72fa3a52c97c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
It's interesting that you tried replacing the top layer; I haven't tried that yet. How many iterations did you use?