Handling underlines / skew


viraf

Feb 9, 2016, 6:56:52 PM
to tesseract-ocr
I am starting to use Tesseract to OCR scanned documents. These documents consist of forms, letters, and diagrams.

I have noticed that underlined text does not appear to be recognized. From the earlier postings it is unclear whether underlined text is supported for all fonts (it may be limited to fixed-pitch fonts). If support is limited, what is the best way to handle underlines?

"An Overview of the Tesseract OCR Engine" states that "The line finding algorithm is designed so that a skewed page can be recognized without having to deskew, thus saving loss of image quality". However, most of the posts here identify deskewing as a prerequisite preprocessing step. Could someone please elaborate on the cases where deskewing is needed?

Thanks

-- viraf