Introduction

LeoHeska

Oct 8, 2009, 8:53:12 AM
to ocropus
Hi.

"Please use your first message to introduce yourself and your interest
in the project and be patient if it doesn't get posted right away."

My name is Leo Heska. I'm interested in OCRopus primarily for one
part of its functionality: layout analysis.

I have a large body of images (primarily in JPEG2000) that I want to
break up into their component parts. That is, take a 3-column page
(think dictionary or phone book listing), and output both individual
columns, and individual lines. I want to do what the ReCaptcha folks
do - break a large image of text (no graphics) into individual lines,
or maybe even words.

The actual OCR part will be interesting to me if it works well, but
that is not my primary interest.

Also, I'm interested in compiling on Windows. I've used Linux and can
use it again, but 90% of the non-academic world runs on Windows, and
that's the world I live and work in. Potentially I could work as part
of a build team, testing releases and providing Windows executables.

Leo

Tom Breuel

Oct 19, 2009, 6:14:09 PM
to ocropus

> I have a large body of images (primarily in JPEG2000) that I want to
> break up into their component parts.  That is, take a 3-column page
> (think dictionary or phone book listing), and output both individual
> columns, and individual lines.

That's pretty easy to do with OCRopus, both from the command line and
programmatically. Look at the ISegmentPage interface and the "ocropus
pageseg" command.

The output format is documented on ocropus.org under "File Formats"
(file formats used by OCRopus):

https://docs.google.com/View?id=dfxcv4vc_92c8xxp7
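
For anyone who wants to see the basic idea behind splitting a text-only
page into columns and lines, here is a minimal sketch in plain Python.
It is not the ISegmentPage interface and not what "ocropus pageseg" does
internally; it uses simple projection profiles as a stand-in. The names
ink_runs, split_columns and split_lines are hypothetical, and the page is
assumed to be already binarized into rows of 0 (background) and 1 (ink).

```python
# Illustrative only: projection-profile splitting, NOT the OCRopus algorithm.
# Assumes a binarized page given as a list of rows, 1 = ink, 0 = background.

def ink_runs(profile, min_ink=1):
    """Return (start, end) pairs for consecutive entries with >= min_ink ink."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i                      # a run of inked rows/columns begins
        elif value < min_ink and start is not None:
            runs.append((start, i))        # run ends at the first blank entry
            start = None
    if start is not None:                  # run reaches the edge of the page
        runs.append((start, len(profile)))
    return runs

def split_columns(page, min_ink=1):
    """Find text columns as x-ranges separated by blank vertical gaps."""
    column_profile = [sum(col) for col in zip(*page)]
    return ink_runs(column_profile, min_ink)

def split_lines(page, x0, x1, min_ink=1):
    """Find text lines inside the column x0..x1 as y-ranges."""
    row_profile = [sum(row[x0:x1]) for row in page]
    return ink_runs(row_profile, min_ink)

# Tiny synthetic "page": two columns separated by a two-pixel blank gap.
page = [
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
]

for x0, x1 in split_columns(page):
    print("column", (x0, x1), "lines", split_lines(page, x0, x1))
```

On real scans, specks of noise inside a gap will break this, so min_ink
usually has to be raised above 1, and skewed pages need deskewing first.
That is the kind of thing a real layout analysis component handles for you.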

Tom