Determine Page Orientation

Lincolin

Dec 3, 2008, 1:25:15 PM
to tesseract-ocr
Guys,

Is there any way in Tesseract to determine the page orientation? Users
sometimes scan a page upside down and save it, so I need to know the
page orientation before calling the Recognize function in order to get
correct results.

Any help will be greatly appreciated,
Lincolin

Graham Chiu

Dec 3, 2008, 1:32:01 PM
to tesser...@googlegroups.com
What I used to do was check that the OCR gave me back words that
occurred in a dictionary of the most common words.
If not, I'd rotate the image in 90-degree steps until I got a good result.
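
A minimal sketch of that rotate-and-retry loop, for illustration only. The names GrayImage, rotate90Clockwise and findOrientation are hypothetical, and the score callback is an assumption: it stands in for "run OCR and return the fraction of words found in the common-words dictionary". The image is assumed to be 8-bit, single-channel and tightly packed.

```cpp
#include <functional>
#include <vector>

// Hypothetical 8-bit, single-channel, row-major image with no row padding.
struct GrayImage {
    int width;
    int height;
    std::vector<unsigned char> pixels;
};

// Rotates the image a quarter turn clockwise: source pixel (x, y) lands at
// (height - 1 - y, x) in the result, whose width and height are swapped.
GrayImage rotate90Clockwise(const GrayImage& src) {
    GrayImage dst{src.height, src.width,
                  std::vector<unsigned char>(src.pixels.size())};
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            int dx = src.height - 1 - y;
            int dy = x;
            dst.pixels[dy * dst.width + dx] = src.pixels[y * src.width + x];
        }
    }
    return dst;
}

// Tries the image as given, then after 1, 2 and 3 clockwise quarter turns.
// Returns the number of turns at which score() first reaches the threshold,
// or -1 if no orientation scores well enough.
int findOrientation(GrayImage image,
                    const std::function<double(const GrayImage&)>& score,
                    double threshold) {
    for (int turns = 0; turns < 4; ++turns) {
        if (score(image) >= threshold) {
            return turns;
        }
        image = rotate90Clockwise(image);
    }
    return -1;
}
```

Stopping at the first orientation that passes the threshold avoids running OCR four times on pages that were scanned the right way up, at the cost of having to choose a threshold.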
--
Graham Chiu
http://www.synapsedirect.com
Synapse - the use from anywhere EMR.

Lincolin

Dec 14, 2008, 6:19:31 AM
to tesseract-ocr
Thanks a lot for your answer, Graham! But what if the user-words
dictionary is empty? How would you check the returned words to make
sure they are correct?

lohith

Dec 15, 2008, 7:56:39 AM
to tesseract-ocr
You can check whether the words match the dictionary internal to
tesseract.
You can do this with the following API in the tesseract library:

bool TessBaseAPI::isValidWord( const char* str );
(This function returns true if the given word "str" is in any of the
tesseract dictionaries.)

Call this API for each word you got back from tesseract, then count
the number of words that matched the dictionary.
(It works best if you only compare words that have more than 3
characters in them.)

The orientation that gives you the most dictionary words is the
correct orientation.
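
A rough sketch of the counting step, under stated assumptions. The dictionary lookup is passed in as a callback (the hypothetical WordCheck type) so the sketch does not depend on the exact signature in your Tesseract build. countDictionaryWords and pickBestOrientation are illustrative names, not part of the library.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the dictionary lookup; in a real program this
// would wrap whatever validity check your Tesseract build exposes.
using WordCheck = std::function<bool(const std::string&)>;

// Counts the words of at least minLength characters that the lookup accepts.
// The default of 4 implements the "more than 3 characters" rule of thumb.
int countDictionaryWords(const std::vector<std::string>& words,
                         const WordCheck& isValid,
                         std::size_t minLength = 4) {
    int hits = 0;
    for (const std::string& word : words) {
        if (word.size() >= minLength && isValid(word)) {
            ++hits;
        }
    }
    return hits;
}

// wordsPerOrientation[i] holds the OCR output for the i-th orientation tried.
// Returns the index with the most dictionary hits (the first one wins a tie),
// or -1 if no orientations were supplied.
int pickBestOrientation(
        const std::vector<std::vector<std::string>>& wordsPerOrientation,
        const WordCheck& isValid) {
    int best = -1;
    int bestHits = -1;
    for (std::size_t i = 0; i < wordsPerOrientation.size(); ++i) {
        int hits = countDictionaryWords(wordsPerOrientation[i], isValid);
        if (hits > bestHits) {
            bestHits = hits;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Skipping short words matters because garbage output from a wrongly oriented page can easily contain two- or three-letter strings that happen to be real words.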

--lohith

Lincolin

Dec 17, 2008, 5:56:59 AM
to tesseract-ocr
Thanks a lot, lohith. I was trying to find a fast way to determine the
page orientation. Until now I recognized the whole page, rotated it
180º, recognized it again, and compared the rejected counts to
determine the correct orientation. But this takes time, and if the
user needs to do this for many pages it will take a lot of time.
Thanks anyway for your suggestion. Checking for valid words in
Tesseract's internal dictionary was new information for me.
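
For reference, the two-pass comparison described above might look roughly like this. It is only a sketch: rotate180InPlace and isUpsideDown are hypothetical helper names, the pixel buffer is assumed to be 8-bit, single-channel and tightly packed, and the two reject counts are assumed to come from recognizing the page before and after the rotation.

```cpp
#include <algorithm>
#include <vector>

// For a tightly packed, single-channel, row-major buffer, a 180-degree
// rotation is the same as reversing the whole pixel sequence.
void rotate180InPlace(std::vector<unsigned char>& pixels) {
    std::reverse(pixels.begin(), pixels.end());
}

// The page is treated as upside down only when the rotated pass rejected
// strictly fewer results than the original pass; a tie keeps the original.
bool isUpsideDown(int rejectsOriginal, int rejectsRotated) {
    return rejectsRotated < rejectsOriginal;
}
```

The rotation itself is cheap; the cost is in the two recognition passes, which is why this approach is slow across many pages.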