Extracting text from digital display

Berend Berendsen

Nov 22, 2015, 8:01:14 AM

to tesseract-ocr

I am trying to extract text from a digital display (not a seven-segment one). The use case is a camera pointed at the display that takes a picture every X seconds, and each picture has to be processed. An example of a display is:

There are three segments I am interested in, which I cut out of the image before giving it to Tesseract:

1) The number after "No."

2) The number after "Total"

3) The number at the right side of the display
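
The cropping step could look like the minimal sketch below. It assumes the frame is already loaded as a list of pixel rows, and the region names and pixel boxes in `REGIONS` are hypothetical placeholders, not the values from my setup.

```python
# Hypothetical pixel boxes (left, top, right, bottom) for the three fields.
# The real values depend on where the camera is mounted.
REGIONS = {
    "no": (10, 5, 60, 25),
    "total": (10, 30, 90, 50),
    "right": (100, 5, 180, 50),
}


def crop(image, box):
    """Return the sub-image inside box; image is a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def crop_all(image, regions=REGIONS):
    """Cut every named region out of one camera frame."""
    return {name: crop(image, box) for name, box in regions.items()}
```

Because the camera and display do not move, the boxes only need to be measured once and can then be reused for every frame.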

Cropping the regions, preprocessing them (grayscale, invert, change contrast), and running Tesseract in page segmentation mode (psm) 6 with digits only works well for 1) and 3). However, 2) seems to be a challenge. I think the font causes Tesseract to see disjointed characters. I am also wondering whether I am overcomplicating the problem: the images will be of fixed size, and the areas I am interested in are at fixed locations. Would pattern matching work better?
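
For reference, the preprocessing and the Tesseract call described above can be sketched roughly as follows. This is an illustration under assumptions, not my exact pipeline: the linear contrast stretch stands in for whatever contrast change is actually used, the helper names are made up for this example, and the flag spelling differs between versions (older 3.x releases use `-psm` instead of `--psm`).

```python
import subprocess


def to_gray(rgb_rows):
    """Convert rows of (r, g, b) tuples to 0-255 gray values (BT.601 weights)."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
            for row in rgb_rows]


def invert(gray_rows):
    """Swap dark and light so the digits end up dark on a light background."""
    return [[255 - p for p in row] for row in gray_rows]


def stretch_contrast(gray_rows):
    """Assumed contrast step: rescale so the darkest pixel is 0, the brightest 255."""
    flat = [p for row in gray_rows for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [row[:] for row in gray_rows]
    scale = 255.0 / (hi - lo)
    return [[int(round((p - lo) * scale)) for p in row] for row in gray_rows]


def tesseract_command(image_path, psm=6):
    """Build the command line: given psm, digits-only whitelist, text to stdout."""
    return ["tesseract", image_path, "stdout", "--psm", str(psm),
            "-c", "tessedit_char_whitelist=0123456789"]


def run_ocr(image_path):
    """Run Tesseract on a saved crop; requires the tesseract binary on PATH."""
    result = subprocess.run(tesseract_command(image_path),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```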

I could train Tesseract on the font of 2), or does anyone have suggestions for the best plan of attack?
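
To make the pattern matching idea concrete, here is a minimal sketch of what I have in mind. It assumes every digit occupies a cell of the same width and that one reference bitmap per digit has been captured beforehand; the function names and the `templates` dictionary are hypothetical. Each cell is compared pixel by pixel against every template, so a glyph drawn with disjointed strokes is still scored as a whole.

```python
def binarize(gray_rows, threshold=128):
    """Turn gray values into 0/1 pixels; the threshold is an assumed value."""
    return [[1 if p >= threshold else 0 for p in row] for row in gray_rows]


def split_glyphs(bitmap, glyph_width):
    """Cut a fixed-layout bitmap into equal-width glyph cells, left to right."""
    width = len(bitmap[0])
    return [[row[x:x + glyph_width] for row in bitmap]
            for x in range(0, width - glyph_width + 1, glyph_width)]


def score(glyph, template):
    """Fraction of pixels on which the glyph and the template agree."""
    total = sum(len(row) for row in template)
    same = sum(g == t
               for glyph_row, template_row in zip(glyph, template)
               for g, t in zip(glyph_row, template_row))
    return same / total


def read_digits(bitmap, templates, glyph_width):
    """Label each cell with the best-scoring template character."""
    chars = []
    for glyph in split_glyphs(bitmap, glyph_width):
        chars.append(max(templates, key=lambda ch: score(glyph, templates[ch])))
    return "".join(chars)
```

Usage would be something like `read_digits(binarize(crop), templates, glyph_width)`, where `templates` maps each character to a bitmap of the same size as one cell.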

Cropped version of 2):

Thanks and regards!

berend
