Output cropped words

Nathan Cain

Sep 27, 2015, 12:38:58 PM
to tesseract-ocr
I have a project similar to reCAPTCHA, where humans type in the words instead of a computer running OCR on them. Is there a way to have Tesseract split an image into words and output each word as a separate image file?

Ryan Baumann

Sep 28, 2015, 2:04:39 PM
to tesseract-ocr
Hi Nathan,

I adapted the Tesseract API examples (https://code.google.com/p/tesseract-ocr/wiki/APIExample) to save line images for feeding into another OCR program: https://github.com/ryanfb/tesslinesplit
If you change RIL_TEXTLINE to RIL_WORD in tesslinesplit.cpp and recompile, it should split an image into words instead.

Best,
-Ryan