I have attached an image along with the Tesseract OCR result for it. Some words are missing from the OCR output below. How can I improve the image quality so that the missing words are detected? The attached image has a horizontal resolution of 204 DPI and a vertical resolution of 98 DPI. Please help me improve the OCR accuracy.
Hi Guna,
I usually find that Tesseract has trouble with text that sits on the lines of a form. There is a horizontal line removal example included with Leptonica that might help you [1]. I tried it on the sample you provided, and doubled the size of the image, to start zeroing in on the results. You might also consider font training for characters that would be affected by removing the line, since removal can take away the bottom part of a letter if the text is typed right on the line.
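
To make the idea concrete, here is a minimal sketch in plain Python. It is not the Leptonica example itself, just a simplified run-length stand-in: it erases any horizontal run of ink pixels longer than a threshold, then enlarges the image by integer factors. The function names, the `min_run` threshold, and the list-of-rows image representation (1 = ink, 0 = background) are all assumptions made for illustration.

```python
def remove_horizontal_lines(img, min_run=20):
    """Return a copy of img with long horizontal ink runs erased.

    img is a list of rows; each row is a list of 0/1 values (1 = ink).
    min_run is a hypothetical threshold: runs at least this long are
    treated as form rules rather than text.
    """
    out = [row[:] for row in img]
    for y, row in enumerate(img):
        x = 0
        width = len(row)
        while x < width:
            if row[x] == 1:
                start = x
                # Walk to the end of this run of ink pixels.
                while x < width and row[x] == 1:
                    x += 1
                if x - start >= min_run:
                    for i in range(start, x):
                        out[y][i] = 0
            else:
                x += 1
    return out


def upscale(img, fx=2, fy=2):
    """Nearest-neighbour upscaling by integer factors fx (horizontal) and fy (vertical)."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(fx)]
        # Repeat each widened row fy times, copying so rows stay independent.
        out.extend(wide[:] for _ in range(fy))
    return out
```

For example, on a 5 x 12 image with a full-width rule on row 3 and a vertical stroke in column 2 that touches the rule, `remove_horizontal_lines(img, min_run=8)` clears the rule but also removes the stroke's bottom pixel, which is exactly the side effect mentioned above. Separate `fx` and `fy` factors are provided because a 204 x 98 DPI scan does not have square pixels, so a larger vertical factor may be worth trying; that is a suggestion to experiment with, not something I tested on your sample.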
art
---
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
tesseract-oc...@googlegroups.com.
To post to this group, send email to
tesser...@googlegroups.com.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/e725e8e6-dd6f-4c4c-9bb9-61f86c49053c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
I have also increased the DPI, but some words are still missing; see the attached output image. I have attached the image properties as well: the file compression type is CCITT and the bit depth is 1. Does the OCR process depend on the compression type and bit depth?