Extract Rotated/Tilted Text from Scanned Image

Merv

Jul 31, 2015, 3:26:16 AM

to tesseract-ocr

I am planning to develop an OCR application using the Tesseract OCR engine, and I have only minimal experience with it.

The task I am facing is extracting rotated text (at any angle) from a scanned image. A sample PDF document that I need to OCR is available at http://1drv.ms/1OS8elW. The sample has a blue sticker on it, and I need your guidance on extracting the printed text from that sticker.

As you can see, the orientation of the sticker is not consistent, and I would be really grateful if you could suggest a way to extract the text.

Suggestions for alternative methods are also highly appreciated.

Tom Morris

Jul 31, 2015, 11:18:03 AM

to tesseract-ocr, mervind...@gmail.com

On Friday, July 31, 2015 at 3:26:16 AM UTC-4, Merv wrote:

The task I am facing is extracting rotated text (at any angle) from a scanned image. A sample PDF document that I need to OCR is available at http://1drv.ms/1OS8elW. The sample has a blue sticker on it, and I need your guidance on extracting the printed text from that sticker.

As you can see, the orientation of the sticker is not consistent, and I would be really grateful if you could suggest a way to extract the text.

Is the sticker always blue? If so, it should be pretty easy to use image processing to identify its location, extract it from the background document, square it up, and drop the blue background. You can then OCR the resulting image.
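As a rough illustration of the "identify its location" and "square it up" steps, here is a minimal sketch in plain Python. It assumes you already have a boolean mask marking the sticker pixels, and that the sticker is clearly longer than it is tall, so its long axis is well defined. The function name `sticker_center_and_angle` is made up for this example. In practice an image library such as OpenCV would usually do this work (for instance `cv2.inRange` for the mask, `cv2.minAreaRect` for the rotated box, and `cv2.getRotationMatrix2D` with `cv2.warpAffine` to rotate the crop), so treat this only as a way to see the idea.

```python
import math


def sticker_center_and_angle(mask):
    """Estimate the centroid and tilt of the True region in a boolean mask.

    mask is a list of rows; each row is a list of booleans (True = sticker pixel).
    Returns (center_x, center_y, angle_degrees), where the angle is the
    orientation of the region's long axis computed from second-order moments.
    Image rows grow downward, so a positive angle looks clockwise on screen.
    """
    points = [(x, y) for y, row in enumerate(mask) for x, on in enumerate(row) if on]
    if not points:
        raise ValueError("no sticker pixels found in mask")

    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    # Central second-order moments of the pixel coordinates.
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n

    # Orientation of the principal axis, in the range (-90, 90] degrees.
    angle = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    return cx, cy, math.degrees(angle)
```

Rotating the cropped sticker by the negative of this angle should bring the text roughly level. The estimate cannot tell a sticker from the same sticker turned by 180 degrees, so the result may still be upside down; Tesseract's orientation and script detection mode is one possible way to check that before the final OCR pass.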

Tom

Merv

Aug 1, 2015, 5:20:02 AM

to tesseract-ocr

Yes, the sticker will always be blue. The only difference will be in the shade of blue, so some color values may differ. Tom, could you kindly suggest how I should proceed?
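One possible way to cope with varying shades of blue is to classify pixels by hue rather than by exact RGB values, since lighter and darker blues share a similar hue. The sketch below uses only the Python standard library. The names `is_blue` and `blue_mask` and all threshold values are assumptions chosen for illustration, not values taken from the sample document; they would need tuning against real scans.

```python
import colorsys


def is_blue(r, g, b, hue_range=(0.55, 0.72), min_sat=0.35, min_val=0.25):
    """Return True if an 8-bit RGB pixel falls inside an assumed 'blue' band.

    Hue, saturation and value are on a 0..1 scale, as returned by colorsys.
    The saturation and value floors reject white paper and dark ink.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return hue_range[0] <= h <= hue_range[1] and s >= min_sat and v >= min_val


def blue_mask(pixels):
    """Turn rows of (r, g, b) tuples into rows of booleans (True = blue)."""
    return [[is_blue(*px) for px in row] for row in pixels]
```

The resulting mask marks where the sticker probably is. Small specks of blue elsewhere on the page would still need filtering, for example by keeping only the largest connected region.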
