How can I define character size for Tesseract?

ali madad

Aug 21, 2018, 5:25:05 AM
to tesseract-ocr
Dear researchers,

When I use Tesseract for character recognition, I get many wrong results in which the character width or height is only about 4 pixels.

Is there any way to restrict Tesseract's recognition to a given range of character sizes?

Thank you very much for your time.
Best regards

vishnu thampan

Sep 18, 2018, 5:47:44 AM
to tesseract-ocr
From your description, it sounds like you are extracting text from a small image, one that contains very few characters or perhaps a single word. To improve accuracy, you can set the page segmentation mode (PSM). Tesseract 4 provides 14 PSM values, numbered 0 to 13 (older releases have fewer). With the pytesseract wrapper, the usage looks like this:
pytesseract.image_to_string(image, lang='eng', config='--psm 8')
psm 8: treat the image as a single word
psm 7: treat the image as a single text line
By default, Tesseract uses psm 3 (fully automatic page segmentation, without orientation and script detection).
For more information: Go here
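
The PSM setting changes how the page is segmented, but it does not filter results by character size. One way to act on size directly is to post-filter the per-character boxes that Tesseract reports. This is a rough sketch, not a built-in Tesseract feature. It assumes the pytesseract wrapper and its image_to_boxes call, which returns one "char left bottom right top page" line per character. The function names and the 5-pixel thresholds are hypothetical, chosen only because the question mentions boxes of about 4 pixels.

```python
def parse_boxes(box_text):
    """Parse box-file text: one 'char left bottom right top page' entry per line."""
    boxes = []
    for line in box_text.splitlines():
        parts = line.rsplit(" ", 5)  # split from the right so the char field stays intact
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        char, left, bottom, right, top, _page = parts
        boxes.append((char, int(left), int(bottom), int(right), int(top)))
    return boxes


def filter_by_size(boxes, min_width=5, min_height=5):
    """Keep only boxes at least min_width wide and min_height tall (hypothetical thresholds)."""
    kept = []
    for char, left, bottom, right, top in boxes:
        width = right - left
        height = top - bottom  # box coordinates have their origin at the bottom-left
        if width >= min_width and height >= min_height:
            kept.append((char, left, bottom, right, top))
    return kept


def recognize_filtered(image, psm=7, min_width=5, min_height=5):
    """Run Tesseract on an image and drop characters whose boxes are too small."""
    import pytesseract  # imported here so the helpers above work without it

    box_text = pytesseract.image_to_boxes(image, lang="eng", config=f"--psm {psm}")
    kept = filter_by_size(parse_boxes(box_text), min_width, min_height)
    # Box output carries no spaces between words, so this joins characters only.
    return "".join(box[0] for box in kept)
```

Calling recognize_filtered(image) with a PIL image would return the recognized characters minus the tiny ones. Note that a width threshold also removes legitimately narrow characters such as "i" or "l", so it may be safer to filter on height alone (set min_width=0) and tune the value for your images.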