Assistance with segmented display OCR

rob...@inkrh.com

Feb 7, 2017, 2:35:40 PM

to tesseract-ocr

Hello,

I am running into real difficulty getting Tesseract to work with a faux segmented display.

At the moment I am processing a video frame by frame, removing the background and replacing the characters' color to leave images like the attached.

I have spent a long time training, both with a set of the actual images produced by the above and with TIFFs of matching LCD fonts. I have also set the expected format and whitelisted the expected characters, but I still see no improvement in recognition: at best it is around 30% success, at worst 0%.

I have also tried using SSOCR (https://www.unix-ag.uni-kl.de/~auerswal/ssocr/) without any success (0 digits recognized from the above), and exploring all the different settings of tesseract and SSOCR.

Is there any advice for getting these characters recognized consistently? My target is to have the characters recognized with at least a 75% success rate.

360.png

1880.png

ShreeDevi Kumar

Feb 7, 2017, 10:27:29 PM

to tesser...@googlegroups.com

Take a look at

http://stackoverflow.com/questions/17672705/text-detection-on-seven-segment-display-via-tesseract-ocr

https://github.com/arturaugusto/display_ocr

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/73f12a10-45d4-4879-9d62-456dd5dd3abf%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Art Rhyno.

Feb 8, 2017, 8:16:58 AM

to tesser...@googlegroups.com

The gaps in some of the characters are probably too significant for tesseract to identify them properly. I'd be tempted to leverage the parts of the characters where the segments are connected and infer the numbers from their positioning. For example, train the top and bottom of the zero as separate characters, then identify a zero when one sits over the other. I wonder if something like OpenCV would be a better tool in this case.

art
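
One way to read the positional idea above, as a rough sketch only: assume some recognizer has already returned a label and a bounding box for each connected fragment, group the fragments whose horizontal extents overlap, and look up the top-to-bottom label sequence. The labels (`zero_top`, `zero_bottom`, `one`), the box layout and the lookup table are all hypothetical; none of this is Tesseract output or a Tesseract API.

```python
from typing import Dict, List, Tuple

# A recognised fragment: (label, left, top, right, bottom), in pixels.
Fragment = Tuple[str, int, int, int, int]

# Hypothetical lookup from top-to-bottom fragment labels to a character.
# A real table would need an entry for every glyph on the display.
GLYPHS: Dict[Tuple[str, ...], str] = {
    ("zero_top", "zero_bottom"): "0",
    ("one",): "1",
}


def x_overlap(a: Fragment, b: Fragment) -> int:
    """Width shared by the horizontal extents of two fragments (<= 0 if none)."""
    return min(a[3], b[3]) - max(a[1], b[1])


def combine(fragments: List[Fragment]) -> str:
    """Group horizontally overlapping fragments and read them left to right."""
    columns: List[List[Fragment]] = []
    for frag in sorted(fragments, key=lambda f: f[1]):
        if columns and x_overlap(columns[-1][-1], frag) > 0:
            columns[-1].append(frag)  # same character cell as the previous one
        else:
            columns.append([frag])    # start a new character cell
    text = ""
    for column in columns:
        # Order each cell's fragments top to bottom before the lookup.
        labels = tuple(f[0] for f in sorted(column, key=lambda f: f[2]))
        text += GLYPHS.get(labels, "?")
    return text


fragments = [
    ("zero_bottom", 10, 30, 30, 50),
    ("zero_top", 10, 5, 30, 25),
    ("one", 40, 5, 50, 50),
]
print(combine(fragments))
```

Unknown label combinations come back as `?` rather than a guess, which makes it easy to see which glyphs still need a table entry.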

rob...@inkrh.com

Feb 8, 2017, 11:28:51 AM

to tesseract-ocr

Thanks Shree,

I have already tried using the letsgodigital trained data, but am re-reading the SO post to see if I missed anything.

rob...@inkrh.com

Feb 8, 2017, 11:33:42 AM

to tesseract-ocr

So if I have this right, you're saying it could be an idea to train tesseract on the partial character segments individually, then have it recognize a combination of the segments? I had not even realized that partial characters were a possibility.

As for the OpenCV suggestion, would that be more a matter of shape recognition, again on the segments of the partial character, followed by a script that says "I have yada yada segments in yada yada position, so this must be a ..."?

Art Rhyno.

Feb 8, 2017, 1:00:11 PM

to tesser...@googlegroups.com

It’s worth trying. A “character” is typically a set of connected line segments; in fact, I think LED display tools typically try to close the gap between the segments. When you do font training, tesseract’s tools tell you how it interprets the characters. I would be tempted to take the “0” from the image and see whether the “makebox” step identifies one or two boxes. OpenCV would indeed be more of a shape recognition exercise, specifically template matching; there’s a nice example here [1]. I have found template matching worthwhile for a limited set of characters/symbols, which is why a number sequence might be a candidate, but a lot depends on the consistency of the display. I tried something like this for a handwritten set of diaries, but the variations in letters pushed me towards OpenIMAJ [2].

art

---

1. http://www.pyimagesearch.com/2015/01/26/multi-scale-template-matching-using-python-opencv/

2. http://openimaj.org/
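
To make the "one or two boxes" question concrete, here is a small pure-Python sketch of closing the gap between segments (a 3x3 dilation followed by a 3x3 erosion) and counting connected components before and after. It is an illustration under assumptions, not what Tesseract's makebox step does internally: the tiny grid, the 3x3 neighbourhood and 8-connectivity are all choices made for the example. On real frames, OpenCV's `cv2.morphologyEx` with `cv2.MORPH_CLOSE` would likely be the practical route.

```python
from typing import Callable, Iterable, List

Grid = List[List[int]]  # 1 = character pixel, 0 = background


def _morph(grid: Grid, pick: Callable[[Iterable[int]], int]) -> Grid:
    """Apply pick (max = dilate, min = erode) over each 3x3 neighbourhood.

    Pixels outside the grid count as background, so glyphs need a margin.
    """
    h, w = len(grid), len(grid[0])

    def at(y: int, x: int) -> int:
        return grid[y][x] if 0 <= y < h and 0 <= x < w else 0

    return [
        [pick(at(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1))
         for x in range(w)]
        for y in range(h)
    ]


def close_gaps(grid: Grid) -> Grid:
    """Morphological closing: dilate, then erode."""
    return _morph(_morph(grid, max), min)


def count_components(grid: Grid) -> int:
    """Count 8-connected groups of foreground pixels with a flood fill."""
    h, w = len(grid), len(grid[0])
    seen = set()
    count = 0
    for y in range(h):
        for x in range(w):
            if not grid[y][x] or (y, x) in seen:
                continue
            count += 1
            seen.add((y, x))
            stack = [(y, x)]
            while stack:
                cy, cx = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and grid[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return count


# A vertical stroke made of two segments with a one-pixel gap between them.
stroke = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(count_components(stroke), count_components(close_gaps(stroke)))
```

A 3x3 closing only bridges gaps of a pixel or two; wider gaps between segments would need a larger neighbourhood, at the risk of merging neighbouring characters.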

rob...@inkrh.com

Feb 8, 2017, 6:09:55 PM

to tesseract-ocr

Thanks Art,

Since the possible characters (0-9 with : and .) and the possible formats (n:nn.nn) are limited, it definitely makes sense to use OpenCV for shape recognition as an alternative. I just thought of another avenue too: since the position of each segment is constant, I could literally just pick a pixel in the centre of each expected segment and check whether it is black or white to flag that segment as on or off. That would reduce the problem to checking the on/off pattern against a list.
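
A minimal sketch of that pixel-sampling idea, under several assumptions: the image is a 2D list of grayscale values, each digit sits in a 7-row by 5-column cell, and the sample coordinates in `SAMPLE_POINTS` are made up for the example (real ones would be measured from the video frames). The lookup table uses the conventional seven-segment patterns; a faux display may draw some digits (6, 7 and 9 are the usual suspects) differently, and the ":" and "." would need their own sample points. The `plausible` check simply applies the n:nn.nn format mentioned above.

```python
import re
from typing import List, Tuple

# Segment order: a (top), b (upper right), c (lower right), d (bottom),
# e (lower left), f (upper left), g (middle).
SEGMENTS_TO_DIGIT = {
    (1, 1, 1, 1, 1, 1, 0): "0",
    (0, 1, 1, 0, 0, 0, 0): "1",
    (1, 1, 0, 1, 1, 0, 1): "2",
    (1, 1, 1, 1, 0, 0, 1): "3",
    (0, 1, 1, 0, 0, 1, 1): "4",
    (1, 0, 1, 1, 0, 1, 1): "5",
    (1, 0, 1, 1, 1, 1, 1): "6",
    (1, 1, 1, 0, 0, 0, 0): "7",
    (1, 1, 1, 1, 1, 1, 1): "8",
    (1, 1, 1, 1, 0, 1, 1): "9",
}

# Hypothetical (row, col) centre of each segment inside a 7x5 digit cell,
# listed in the same a..g order as the table above.
SAMPLE_POINTS: List[Tuple[int, int]] = [
    (0, 2), (1, 4), (5, 4), (6, 2), (5, 0), (1, 0), (3, 2),
]

# Expected overall format, n:nn.nn.
FORMAT = re.compile(r"\d:\d{2}\.\d{2}")


def read_digit(image: List[List[int]], top: int, left: int,
               threshold: int = 128, lit_is_dark: bool = True) -> str:
    """Sample one pixel per segment and look the on/off pattern up."""
    pattern = []
    for row, col in SAMPLE_POINTS:
        dark = image[top + row][left + col] < threshold
        pattern.append(1 if dark == lit_is_dark else 0)
    return SEGMENTS_TO_DIGIT.get(tuple(pattern), "?")


def plausible(text: str) -> bool:
    """True if the decoded string matches the expected n:nn.nn layout."""
    return FORMAT.fullmatch(text) is not None


# Build a white 7x5 cell and blacken the sample points of a "2".
cell = [[255] * 5 for _ in range(7)]
for (row, col), on in zip(SAMPLE_POINTS, (1, 1, 0, 1, 1, 0, 1)):
    if on:
        cell[row][col] = 0

print(read_digit(cell, 0, 0), plausible("1:23.45"))
```

Sampling a single pixel is fragile if the video is noisy or slightly shifted between frames; averaging a small patch around each sample point would be a natural refinement.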
