Can ocropus give coordinates of bounding rectangle of scanned page?

jimfunderburk

Oct 30, 2008, 9:54:26 PM
to ocropus
I am a potential ocropus user. Based on a lecture by Breuel at a
Sanskrit symposium in May 2008, and from what I've seen in ocropus
wiki, I suspect that ocropus can solve the problem described below.
But for me it is a non-trivial task to get an Ubuntu computer, install
ocropus, and so on, so I am hoping that the experts of this group will
be able to say "Sure, ocropus can do that!" before I proceed further.

The project is to look up a word in scans of the pages of the Wilson
Sanskrit dictionary, and highlight on the scanned image of the
relevant page the part pertaining to the word.

You can see the current state of this for the Wilson dictionary at
http://www.sanskrit-lexicon.uni-koeln.de/scans/WILScan/web/index.php
If you enter 'azva', the page for this word is retrieved, and the
part of the page containing the word is emphasized.
For this word, 'azva', the process is quite satisfactory.
However, if you try the word 'rAma' or 'sItA', for instance, you see
that the region highlighted is not quite right.
The main problem is that the position of the page within the whole
scanned image varies, due in part to vagaries of
the scanning process.

Here is where I thought OCROPUS might be useful: to
determine the pixel coordinates of the 'bounding rectangle' of
the text. A table of such information for each page could be fed
into some other program, possibly ImageMagick,
to automate the 'normalization' of the image within the page.
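
To illustrate the kind of normalization I have in mind, here is a minimal Python sketch. It assumes the bounding rectangle of each page is already known as (left, top, right, bottom) pixel coordinates, and that highlight regions were measured against one reference page. The names `offset_from_reference` and `shift_region` and all the numbers are made up for illustration; nothing here is an ocropus or ImageMagick call.

```python
from typing import Tuple

# (left, top, right, bottom) in pixels; an assumed convention for this sketch.
Box = Tuple[int, int, int, int]


def offset_from_reference(page_box: Box, reference_box: Box) -> Tuple[int, int]:
    """Return (dx, dy) that moves the reference page's top-left corner
    onto the top-left corner of this page's bounding rectangle."""
    return (page_box[0] - reference_box[0], page_box[1] - reference_box[1])


def shift_region(region: Box, offset: Tuple[int, int]) -> Box:
    """Translate a highlight region by (dx, dy) without resizing it."""
    dx, dy = offset
    left, top, right, bottom = region
    return (left + dx, top + dy, right + dx, bottom + dy)


# Hypothetical values: where the text block sits on the reference scan,
# where it sits on the current scan, and a highlight measured on the reference.
reference_box = (100, 150, 1300, 2000)
page_box = (130, 120, 1330, 1970)
highlight = (100, 400, 700, 520)

shifted = shift_region(highlight, offset_from_reference(page_box, reference_box))
print(shifted)
```

This only corrects for translation; a scan that is also rotated or scaled would need more than a simple offset.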

Thanks for any suggestions.

Thomas Breuel

Nov 4, 2008, 3:48:24 PM
to ocr...@googlegroups.com
You can compute the page bounding box as the bounding box of all the text lines. That will probably give you a fairly reasonable page bounding box for most pages.
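
As a rough sketch of that idea in Python, assuming the text-line boxes have already been extracted as (left, top, right, bottom) pixel tuples by whatever tool is at hand: the helper name `union_box` and the sample coordinates are hypothetical, and this is not an OCRopus API.

```python
from typing import Iterable, Tuple

# (left, top, right, bottom) in pixels; an assumed convention for this sketch.
Box = Tuple[int, int, int, int]


def union_box(line_boxes: Iterable[Box]) -> Box:
    """Return the smallest rectangle that contains every text-line box."""
    boxes = list(line_boxes)
    if not boxes:
        raise ValueError("need at least one text-line box")
    return (
        min(b[0] for b in boxes),  # leftmost edge
        min(b[1] for b in boxes),  # topmost edge
        max(b[2] for b in boxes),  # rightmost edge
        max(b[3] for b in boxes),  # bottommost edge
    )


# Hypothetical line boxes for one page.
lines = [(120, 200, 980, 240), (118, 250, 1010, 290), (125, 300, 640, 338)]
print(union_box(lines))
```

A stray mark detected as a "line" far from the text block would inflate the result, so filtering out very small or isolated boxes first may be worthwhile.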

We have separately developed another page bounding box detector that we will be incorporating into OCRopus over the next few months; that detector detects the page boundary directly.

Tom

Natraj Kumar

Dec 11, 2013, 6:01:57 AM
to ocr...@googlegroups.com
Tom

Has the bounding box detector been added to OCRopus? I downloaded the latest version and couldn't get the word coordinates, so I would appreciate your help.

Thanks in advance

Natraj