Training Tesseract to recognize technical drawings


Bagi Alexandru

Feb 10, 2017, 1:26:13 PM
to tesseract-ocr
Hello guys, I want to train Tesseract to recognize numbers and symbols on some technical drawings. The problem I face is that the numbers, characters, and symbols appear at various angles: upright, then at roughly -45, +45, +90, and -90 degrees. Those angles are approximations, because the inclination of the characters differs from drawing to drawing. I was thinking about rotating the image several times and running OCR at each angle, but the issue is that I'll get too many false positives. How should I approach this?

Here is a sample:
[Image: CAD drawing with dimensions]

I'm mostly looking for numbers, plus a few symbols (±, Ø, Φ, and 2-3 custom ones that I'm still trying to figure out how to implement), parentheses, and letters, both uppercase and lowercase. I've been thinking of drawing the custom symbols in FontLab. Can anyone point me in the right direction? Thank you very much!

Bagi Alexandru

Feb 11, 2017, 1:05:18 PM
to tesseract-ocr
Anyone any ideas? :D

Bagi Alexandru

Feb 22, 2017, 10:29:12 AM
to tesseract-ocr
Nothing yet?

v-room

Mar 9, 2017, 10:05:08 AM
to tesseract-ocr
sure. let it rotate itself

On Saturday, February 11, 2017 at 11:35:18 PM UTC+5:30, Bagi Alexandru wrote:
Anyone any ideas? :D

Bagi Alexandru

Mar 9, 2017, 10:16:04 AM
to tesseract-ocr
I got a lot of false positives, and most rotated characters aren't being recognized at all, or aren't recognized properly.
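
For reference, the rotate-and-scan idea from the original question can be sketched as below. This is only an illustration under assumptions: `ocr_at_angle` is a hypothetical callback (not a Tesseract API) that rotates one cropped region by the given angle, runs OCR on it, and returns the text with a 0-100 confidence score; the angle list and the cutoff of 60 are arbitrary placeholders. It keeps only the most confident reading per region and drops it if it falls below the cutoff, which is one simple way to limit false positives.

```python
from typing import Callable, NamedTuple, Optional, Sequence, Tuple


class Reading(NamedTuple):
    angle: float       # rotation applied before OCR, in degrees
    text: str          # text returned by the OCR callback
    confidence: float  # confidence reported by the callback, 0-100


# Hypothetical callback signature: (region, angle_degrees) -> (text, confidence).
OcrAtAngle = Callable[[object, float], Tuple[str, float]]


def best_reading(
    region: object,
    ocr_at_angle: OcrAtAngle,
    angles: Sequence[float] = (0, 45, 90, -45, -90),
    min_confidence: float = 60.0,  # arbitrary cutoff, tune on real drawings
) -> Optional[Reading]:
    """Try every candidate angle and keep the single most confident result."""
    best: Optional[Reading] = None
    for angle in angles:
        text, confidence = ocr_at_angle(region, angle)
        text = text.strip()
        if not text:
            continue  # nothing recognized at this angle
        if best is None or confidence > best.confidence:
            best = Reading(angle, text, confidence)
    # Reject the region entirely if even the best attempt is unconvincing.
    if best is None or best.confidence < min_confidence:
        return None
    return best
```

Running this per cropped region rather than on the whole sheet matters: on a full drawing, every angle produces its own set of spurious matches, and there is no single "best" angle to pick.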

Milan Troller

Mar 10, 2017, 6:47:31 AM
to tesseract-ocr
I would hazard a guess that this is a pretty Tough Problem and will need a more in-depth approach than Tesseract's current scope allows.

Something like a separate text detector, one that generalizes well to finding text in a local area and is robust against the angle variation: it finds the text, figures out its rotation, and only then passes it on to the actual OCR.

I really doubt you will have a lot of luck with naive approaches like rotating it and scanning repeatedly.
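
As a rough illustration of the "figure out rotation" step in the pipeline suggested above, here is a minimal sketch. It assumes a separate detector has already isolated the foreground pixel coordinates of one text region; the function names are hypothetical and the method (orientation from second-order moments) is just one common choice, not something the thread prescribes. Note that it recovers orientation only modulo 180 degrees, so it cannot tell upright text from upside-down text.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def principal_angle(points: List[Point]) -> float:
    """Angle of the long axis of a point cloud, in degrees, within (-90, 90].

    Uses the math convention (y grows upward); with image coordinates,
    where y grows downward, the sign of the result is flipped.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Central second-order moments of the region.
    mu20 = sum((x - cx) ** 2 for x, _ in points)
    mu02 = sum((y - cy) ** 2 for _, y in points)
    mu11 = sum((x - cx) * (y - cy) for x, y in points)
    return math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02))


def rotate_about_centroid(points: List[Point], degrees: float) -> List[Point]:
    """Rotate points counter-clockwise about their centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [
        (cx + (x - cx) * c - (y - cy) * s, cy + (x - cx) * s + (y - cy) * c)
        for x, y in points
    ]


def deskew(points: List[Point]) -> List[Point]:
    """Rotate the region so that its long axis becomes horizontal."""
    return rotate_about_centroid(points, -principal_angle(points))
```

The deskewed region would then be handed to the OCR engine once, instead of scanning the whole drawing at several fixed angles.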