Split up page into words


OCRopus newbie

Jan 14, 2013, 8:21:03 PM
to ocr...@googlegroups.com
Hi there,

I would like to split the pages of handwritten books into word images. No OCR should be attempted.

The idea is as follows:

1) Split each scanned page first into lines, then into word images, keeping track of which page each word came from.

2) Possibly discard some of the word images based on some criteria.

3) Use some algorithm to sort the word images by "similarity". Ideally, similar words would end up close to each other.
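
As a rough illustration of the word-splitting half of step 1, here is a minimal sketch that cuts an already-extracted, binarized line image wherever a run of blank columns is wide enough. It is not part of OCRopus; the function name, the list-of-rows image format, and the `min_gap` threshold are all assumptions made for the example, and handwriting with uneven spacing would likely need a more careful threshold.

```python
def split_line_into_words(line, min_gap=3):
    """Return (start, end) column ranges of words in a binarized text line.

    line:    list of rows, each a list of 0/1 pixels (1 = ink).
    min_gap: hypothetical threshold; a run of at least this many blank
             columns is treated as a gap between words.
    The end column of each range is exclusive.
    """
    if not line:
        return []
    width = len(line[0])
    # A column counts as inked if any pixel in it is set.
    ink = [any(row[x] for row in line) for x in range(width)]

    words = []
    start = None     # first column of the word being built
    last_ink = None  # most recent inked column
    for x, has_ink in enumerate(ink):
        if not has_ink:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_gap:
            # The blank run before this column is wide enough: close the word.
            words.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        words.append((start, last_ink + 1))
    return words


# One-row toy line: two strokes one column apart, then a four-column gap.
toy_line = [[int(c) for c in "11011000011"]]
toy_words = split_line_into_words(toy_line, min_gap=3)  # [(0, 5), (9, 11)]
```

Each returned range can be used to crop the line image into a word image; storing the page number next to each crop keeps the word <-> page relationship.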

Use all of this to create an index of the book.
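
To make step 3 concrete, here is a minimal sketch that orders word images so that each one is followed by its nearest not-yet-placed neighbour, while every word keeps its page number. The `WordImage` class and the descriptor (width, height, ink fraction) are placeholders invented for the example; a real similarity measure for handwriting would need something much richer, so treat this only as an outline of the data flow.

```python
from dataclasses import dataclass


@dataclass
class WordImage:
    page: int     # page the word was cut from
    pixels: list  # rows of 0/1 values, 1 = ink


def features(word):
    """Hypothetical descriptor: (width, height, fraction of inked pixels)."""
    height = len(word.pixels)
    width = len(word.pixels[0]) if height else 0
    area = width * height
    ink = sum(sum(row) for row in word.pixels)
    return (width, height, ink / area if area else 0.0)


def distance(a, b):
    """Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def order_by_similarity(words):
    """Greedy chain: start with the first word, then repeatedly append
    the remaining word whose descriptor is closest to the last one placed."""
    if not words:
        return []
    feats = [features(w) for w in words]
    remaining = list(range(1, len(words)))
    order = [0]
    while remaining:
        last = feats[order[-1]]
        nearest = min(remaining, key=lambda i: distance(last, feats[i]))
        remaining.remove(nearest)
        order.append(nearest)
    return [words[i] for i in order]


# Toy input: four solid one-row "words" of widths 2, 10, 3 and 11.
toy = [WordImage(page=p, pixels=[[1] * w])
       for p, w in [(1, 2), (2, 10), (3, 3), (4, 11)]]
ordered_pages = [w.page for w in order_by_similarity(toy)]  # [1, 3, 2, 4]
```

Because each `WordImage` carries its page number, the ordered list can be turned directly into index entries that point back to the pages.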

Is this something OCRopus can be useful for?

I've tried OCRopus to do the first part. It works well for splitting pages into lines, but then it goes directly to characters; there is no word-level step.

Thanks for any input!

Cheers,
Gerhard



Tom

Apr 10, 2013, 1:42:00 AM
to ocr...@googlegroups.com
Generally, splitting text into words without OCR is hard in English and impossible in many other languages.

Your best bet is just to run the recognizer and look for the spaces.
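
A minimal sketch of what "look for the spaces" could mean in practice, assuming the recognizer's output for a line can be turned into a list of `(character, x_start, x_end)` tuples. That tuple format and the function name are assumptions made for the example, not OCRopus's actual output format; the recognized characters themselves are thrown away and only the word boundaries are kept.

```python
def words_from_recognized_line(chars):
    """Group recognized characters into word column ranges.

    chars: list of (character, x_start, x_end) tuples in reading order.
           This format is hypothetical, chosen only for illustration.
    Returns a list of (x_start, x_end) ranges, one per word.
    """
    words = []
    current = []  # characters of the word being built
    for ch, x0, x1 in chars:
        if ch.isspace():
            # A recognized space closes the current word, if any.
            if current:
                words.append((current[0][1], current[-1][2]))
                current = []
        else:
            current.append((ch, x0, x1))
    if current:
        words.append((current[0][1], current[-1][2]))
    return words


recognized = [("a", 0, 5), ("b", 6, 10), (" ", 10, 18), ("c", 18, 24)]
word_ranges = words_from_recognized_line(recognized)  # [(0, 10), (18, 24)]
```

The resulting ranges can then be used to crop word images out of the line image, even if the recognized text itself is unreliable for handwriting.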

Tom