How to check page orientation

Gunasekaran Velu

Feb 29, 2016, 5:46:22 AM
to tesseract-ocr
Hi

I have a multi-page document: some pages are in normal orientation, and some are rotated by 90 degrees.

How can I detect the 90-degree rotation, or get the orientation value for a particular page, so that I can rotate that page before the OCR process?

Looking forward to your reply.


Regards
Guna

Tom Morris

Mar 1, 2016, 2:11:52 PM
to tesseract-ocr
On Monday, February 29, 2016 at 5:46:22 AM UTC-5, Gunasekaran Velu wrote:

> I have a multi-page document: some pages are in normal orientation, and some are rotated by 90 degrees.
>
> How can I detect the 90-degree rotation, or get the orientation value for a particular page, so that I can rotate that page before the OCR process?

If you use -psm 1 (automatic page segmentation with orientation and script detection, OSD), Tesseract will attempt to detect the page orientation itself. If you use -psm 0 (OSD only), it will output just the orientation information for you. But if Tesseract detects the orientation correctly, it should be able to use it on its own with -psm 1, without you having to rotate the pages yourself.
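A minimal shell sketch of the -psm 0 route, for when you want the angle yourself. It assumes `tesseract` (with the 3.x option spelling used in this thread) is on the PATH with the OSD language data installed, that `stdout` is accepted as the output base, and that the OSD report contains a line of the form `Orientation in degrees: N`. The report format and the stream it is written to have changed between Tesseract versions, so both stdout and stderr are scanned. `osd_angle` and `page_orientation` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: the OSD report format is an assumption and varies by version.

# Read an OSD report on stdin; print the number from the first
# "Orientation in degrees: N" line (prints nothing if there is none).
osd_angle() {
    sed -n 's/^Orientation in degrees: *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Run orientation and script detection only (-psm 0) on one image and
# print its detected angle. stderr is merged because some 3.x releases
# print the OSD report there. Tesseract 4.x and later spell the flag "--psm".
page_orientation() {
    tesseract "$1" stdout -psm 0 2>&1 | osd_angle
}
```

For example, `page_orientation 20160226132734282-4.png` prints the angle OSD detected for that page (one of 0, 90, 180 or 270), which you can use to decide whether the page needs rotating.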

Tom

Gunasekaran Velu

Mar 3, 2016, 6:30:17 AM
to tesseract-ocr

Hi Tom

Thanks for the information.

Now I am able to OCR the image with 90-degree orientation:

> tesseract.exe 20160226132734282-4.png 20160226132734282-4 -l eng -psm 1 hocr
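Because the document has many pages, the same command can be run once per page image. A minimal sketch, assuming one PNG file per page, a POSIX shell, and the 3.x `-psm` spelling used above; `ocr_pages_to_hocr` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: OCR each page image to hOCR with automatic orientation detection.
# Assumptions: one PNG per page; Tesseract 3.x flag "-psm" ("--psm" in 4.x+).
ocr_pages_to_hocr() {
    for img in "$@"; do
        # "page-4.png" -> output base "page-4"; Tesseract appends the
        # hOCR file extension itself.
        tesseract "$img" "${img%.png}" -l eng -psm 1 hocr || return 1
    done
}
```

Usage: `ocr_pages_to_hocr 20160226132734282-*.png`. The loop stops at the first page Tesseract fails on, so a bad page is not silently skipped.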

But when I overlay 20160226132734282-4.html onto the original PDF file to make a searchable PDF, the words are not recognized.

I have attached the image 20160226132734282-4.png, the corresponding HTML file, the searchable (output) PDF, and the original PDF file.

How can I create the searchable PDF from this PNG and HTML file? It works for normal pages (orientation of 0 degrees).

I can search for words on the normal (0-degree) pages, but not on the 90-degree pages.
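One alternative worth testing, sketched under assumptions: instead of overlaying hOCR onto the original PDF, Tesseract can render the searchable PDF itself through its `pdf` config (available since 3.03), placing the text layer from its own recognition results. Whether that renderer copes with rotated pages is exactly what is in doubt later in this thread, so check the output. The sketch also assumes a Tesseract version that accepts, as input, a text file listing one image path per line; `ocr_to_searchable_pdf` and the `.list.txt` file name are hypothetical.

```shell
#!/bin/sh
# Sketch: let Tesseract render one searchable PDF for all page images,
# instead of overlaying hOCR onto the original PDF afterwards.
# Assumptions: this Tesseract accepts a text file listing one image per
# line as input, has the "pdf" config, and uses the 3.x flag "-psm".
ocr_to_searchable_pdf() {
    base="$1"                  # output base name; Tesseract appends ".pdf"
    shift
    list="${base}.list.txt"    # hypothetical name for the temporary image list
    printf '%s\n' "$@" > "$list"
    tesseract "$list" "$base" -l eng -psm 1 pdf
    status=$?
    rm -f "$list"              # remove the temporary list, keep Tesseract's exit status
    return "$status"
}
```

Usage: `ocr_to_searchable_pdf 20160226132734282-ocr 20160226132734282-*.png` should write 20160226132734282-ocr.pdf, with the pages in the order the shell expands the file names.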

Please do the needful.


Regards
Guna
20160226132734282.pdf
20160226132734282-4.hocr.html
20160226132734282-4.png
20160226132734282-Original.pdf

Tom Morris

Mar 5, 2016, 10:49:02 AM
to tesseract-ocr
It wouldn't surprise me if the PDF renderer can't handle documents with mixed page orientations.

Please create an issue in the issue tracker so that it gets looked at (and include your example files):


Tom

Gunasekaran Velu

Mar 6, 2016, 7:39:35 PM
to tesseract-ocr
Thanks, Tom.

I have fixed the issue.


Regards
Guna