Document Structure Analysis Only using Python

phmulin

Jul 6, 2011, 11:53:58 PM

to ocropus

Hi!

I am new to OCRopus and have been trying to work my way through it, but
I still have quite a bit of difficulty working out how I could
implement a system for document structure analysis & text line
detection (without actually OCR-ing the content) using OCRopus with
Python.

What I would eventually like to have is a system to which I provide a
picture and from which I receive the coordinates, boxes, or actual cropped
pictures of the regions that contain text.

I have figured out that I should probably use the RAST algorithms, but
I have no idea how to tackle the problem. I would appreciate it if you
could point me to some example code or a tutorial.

Thanks

Tom

Jul 12, 2011, 1:32:44 AM

to ocr...@googlegroups.com

You run the command line tools up to ocropus-pseg, and then you look at the *.pseg.png files.

The directory layout is described here:

OCRopus Storage Layout

The file formats are described here:

OCRopus File Formats - Google Docs

You can easily get the line bounding boxes from the *.pseg.png file.
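For anyone who wants a starting point, here is a rough sketch of that last step, not an official OCRopus routine. It assumes that each text line in the *.pseg.png file is painted in its own RGB color and that white (or black) marks non-text pixels; please check that against the file formats document. The names line_boxes, read_pseg and Box are made up for this example, and read_pseg assumes Pillow is installed.

```python
from collections import namedtuple

# Hypothetical record type for this sketch; coordinates are inclusive pixel indices.
Box = namedtuple("Box", "label x0 y0 x1 y1")

# Assumption: white (and black) pixels are background, every other color is a line label.
BACKGROUND = {0x000000, 0xFFFFFF}


def line_boxes(rows):
    """Return one bounding box per distinct non-background color.

    `rows` is a list of pixel rows, each pixel an (r, g, b) tuple.
    """
    extents = {}
    for y, row in enumerate(rows):
        for x, (r, g, b) in enumerate(row):
            # Pack the three channels into a single integer label.
            label = (r << 16) | (g << 8) | b
            if label in BACKGROUND:
                continue
            box = extents.get(label)
            if box is None:
                extents[label] = [x, y, x, y]
            else:
                box[0] = min(box[0], x)
                box[1] = min(box[1], y)
                box[2] = max(box[2], x)
                box[3] = max(box[3], y)
    return [Box(label, *extents[label]) for label in sorted(extents)]


def read_pseg(path):
    """Load a segmentation image as rows of (r, g, b) tuples (requires Pillow)."""
    from PIL import Image  # imported lazily so line_boxes works without Pillow

    img = Image.open(path).convert("RGB")
    width, height = img.size
    px = img.load()
    return [[px[x, y] for x in range(width)] for y in range(height)]
```

Calling line_boxes(read_pseg(path)) on one of the *.pseg.png files should then give one box per labelled line. Since the coordinates are inclusive, cropping the original page image with Pillow would use img.crop((x0, y0, x1 + 1, y1 + 1)). The pure-Python loop is slow on full-page scans; a NumPy version would be faster but follows the same idea.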
