Document Structure Analysis Only using Python

phmulin

Jul 6, 2011, 11:53:58 PM
to ocropus
Hi!

I am new to OCRopus and have been trying to work my way through it, but
I am still having quite a bit of difficulty making sense of how I could
implement a system for document structure analysis & text line
recognition (without actually OCR-ing the content) using OCRopus with
Python.

What I would eventually like to have is a system to which I provide a
picture and from which I receive the coordinates, bounding boxes, or
actual cropped pictures of the regions that contain text.

I have figured out that I should probably use the RAST algorithms, but
I have no idea how to tackle the problem. I would appreciate it if you
could point me to some example code or a tutorial.

Thanks

Tom

Jul 12, 2011, 1:32:44 AM
to ocr...@googlegroups.com
You run the command line tools up to ocropus-pseg, and then you look in the *.pseg.png files.

The directory layout is described here:


The file formats are described here:


You can easily get the line bounding boxes from the *.pseg.png file.
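
To make the last step concrete, here is a minimal sketch of pulling line bounding boxes out of a segmentation image. It is not taken from OCRopus itself. It assumes that every text line in the *.pseg.png file is painted in its own RGB colour and that the background is pure white (or pure black); please check this against the file format description before relying on it. The names line_boxes and load_rows and the path in the usage note are made up for illustration, and load_rows needs the third-party Pillow package.

```python
def line_boxes(rows, background=((255, 255, 255), (0, 0, 0))):
    """Return {label: (x0, y0, x1, y1)} for every non-background colour.

    rows is a sequence of pixel rows, each a sequence of (r, g, b) tuples.
    The label is the colour packed as 0xRRGGBB. Box coordinates are
    inclusive pixel positions.
    Assumption: one distinct colour per text line, white/black background.
    """
    boxes = {}
    for y, row in enumerate(rows):
        for x, (r, g, b) in enumerate(row):
            if (r, g, b) in background:
                continue
            label = (r << 16) | (g << 8) | b
            if label in boxes:
                x0, y0, x1, y1 = boxes[label]
                boxes[label] = (min(x0, x), min(y0, y),
                                max(x1, x), max(y1, y))
            else:
                boxes[label] = (x, y, x, y)
    return boxes


def load_rows(path):
    """Read an image file into a list of rows of (r, g, b) tuples."""
    from PIL import Image  # Pillow; imported here so line_boxes has no dependencies

    img = Image.open(path).convert("RGB")
    width, height = img.size
    px = img.load()
    return [[px[x, y] for x in range(width)] for y in range(height)]
```

With a hypothetical path, usage would look like boxes = line_boxes(load_rows("book/0001.pseg.png")). Each box can then be passed to Pillow's Image.crop on the original page image (adding 1 to x1 and y1, since crop treats the right and lower edges as exclusive) to get the cropped line pictures.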