OCRopus newbie
Jan 14, 2013, 8:21:03 PM
to ocr...@googlegroups.com
Hi there,
I would like to split scanned books of handwritten pages into individual word images. No OCR should be attempted.
The idea is as follows:
1) Split each scanned page first into lines, then into word images, maintaining the word <-> page relationship.
2) Possibly discard some of the word images based on some criteria.
3) Use some algorithm to sort the word images by "similarity". Ideally, similar words would end up close to each other.
Use all of this to create an index of the book.
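To make step 3 more concrete, here is a minimal sketch of one possible way to order word images by "similarity". It is only an illustration under my own assumptions, not something OCRopus provides: the function names `to_feature` and `order_by_similarity`, the fixed 16x48 feature size, and the choice of Euclidean distance with a greedy nearest-neighbour ordering are all hypothetical. Real handwriting would likely need a more robust feature.

```python
import numpy as np

def to_feature(word_img, shape=(16, 48)):
    """Resample a 2-D grayscale word image to a fixed size (nearest-neighbour
    sampling) and flatten it into a unit-length vector.
    The 16x48 size is an arbitrary assumption."""
    img = np.asarray(word_img, dtype=float)
    rows = np.linspace(0, img.shape[0] - 1, shape[0]).round().astype(int)
    cols = np.linspace(0, img.shape[1] - 1, shape[1]).round().astype(int)
    vec = img[np.ix_(rows, cols)].ravel()
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def order_by_similarity(word_imgs):
    """Greedy ordering: start with image 0, then repeatedly append the
    not-yet-used image whose feature is closest to the last one chosen.
    Returns a list of indices into word_imgs."""
    if not word_imgs:
        return []
    feats = np.stack([to_feature(w) for w in word_imgs])
    order = [0]
    remaining = set(range(1, len(feats)))
    while remaining:
        last = feats[order[-1]]
        nxt = min(remaining, key=lambda i: np.linalg.norm(feats[i] - last))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

Because the function returns indices rather than images, each index can still be looked up in a separate table that records which page the word came from.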
Is this something OCRopus can be useful for?
I've tried OCRopus for the first part. It works well for splitting pages into lines, but then it goes directly to characters; there is no word-level step.
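As a rough idea of the missing word step, a line image could be cut wherever there is a wide enough run of columns without ink. The sketch below assumes a binarised line image as a 2-D NumPy array with ink as high values; `split_line_into_words`, `min_gap`, and `ink_threshold` are hypothetical names and values of my own, not part of OCRopus, and the gap width would need tuning for each handwriting.

```python
import numpy as np

def split_line_into_words(line_img, min_gap=8, ink_threshold=0.5):
    """Return a list of (start, end) column ranges, one per word.

    line_img      : 2-D array of one text line, ink > ink_threshold (assumed)
    min_gap       : minimum number of empty columns that separates two words
                    (hypothetical default, needs tuning)
    """
    # True for every column that contains at least one ink pixel.
    ink_cols = (np.asarray(line_img) > ink_threshold).any(axis=0)
    words = []
    start = None      # first ink column of the current word
    last_ink = None   # most recent ink column seen
    for x, has_ink in enumerate(ink_cols):
        if not has_ink:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_gap:
            # The empty run before x is wide enough: close the current word.
            words.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        words.append((start, last_ink + 1))
    return words
```

Each word image would then be `line_img[:, start:end]`, and storing the page number, line number, and column range alongside it would keep the word <-> page relationship.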
Thanks for any input!
Cheers,
Gerhard