Tesseract extremely slow when processing this particular image

Yeska

Apr 27, 2016, 7:29:53 AM
to tesseract-ocr

Hi,


Tesseract takes up to 20 seconds to process this image. I know that colorful images with more than one column can be slower to process, but 20 seconds is too much.


Can I do something to make processing this image faster?


I'm sending the raw image to Tesseract. I'd prefer not to preprocess the images because they are uploaded by users, so I can't be sure my preprocessing will suit every case. But if you have some "general" preprocessing ideas that would help in most cases, I would be grateful.


Thank you,

fr_1_russe.jpg

Tom Morris

Apr 27, 2016, 1:57:58 PM
to tesseract-ocr
The more "stuff" (<-- technical term) there is in an image, the more time it's going to take to process. You could do some simple manual testing with a photo editor, whiting out various parts of the page to see what is causing the increased processing time, or you could profile Tesseract to see where it's spending its time. You could also ask Tess to dump the thresholded image so that you can see what it's actually working with.
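A minimal sketch of the whiting-out experiment, done in code instead of a photo editor. It assumes the page is already loaded as a grayscale image (a list of rows, 0 = black, 255 = white) and that `ocr_fn` is a hypothetical callable you supply that runs Tesseract on such an image; neither is an API from this thread. It times the full page, then one copy per named region with that region painted white, so a big drop in time points at the culprit.

```python
import time
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def white_out(img: Image, box: Box) -> Image:
    """Return a copy of img with the given rectangle set to white (255)."""
    left, top, right, bottom = box
    out = [row[:] for row in img]  # copy so the original stays untouched
    for y in range(max(0, top), min(len(out), bottom)):
        row = out[y]
        for x in range(max(0, left), min(len(row), right)):
            row[x] = 255
    return out


def time_ablations(img: Image, boxes: Dict[str, Box],
                   ocr_fn: Callable[[Image], str]) -> Dict[str, float]:
    """Time ocr_fn (hypothetical OCR wrapper) on the full image and on each masked copy."""
    timings = {}
    start = time.perf_counter()
    ocr_fn(img)
    timings["full image"] = time.perf_counter() - start
    for name, box in boxes.items():
        masked = white_out(img, box)
        start = time.perf_counter()
        ocr_fn(masked)
        timings[name] = time.perf_counter() - start
    return timings
```

The region names and coordinates (e.g. "clip art on the left") are whatever you measure off the page; comparing each entry against "full image" is the whole point.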

My first suspicion would be the banknote engraving with all its high-frequency noise. My second guess would be the clip art on the left or the gradients at the top and bottom, but those are just guesses.
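If high-frequency noise like engraving is the problem, one common general-purpose trick (not something prescribed in this thread) is a small median filter before OCR, which removes isolated specks while keeping solid strokes. The sketch below assumes the same grayscale list-of-rows representation as above; note that it can also erode very thin strokes of small text, so it is worth checking on your own samples.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def median_filter3(img: Image) -> Image:
    """3x3 median filter. At the borders only the neighbours that exist are used;
    for an even-sized window the upper of the two middle values is taken."""
    h = len(img)
    w = len(img[0]) if h else 0
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            )
            out[y][x] = window[len(window) // 2]
    return out
```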

If you don't have any additional domain knowledge that you can apply to the image pre-processing for your particular application, you may need to live with Tesseract's pre-processing (or attempt to improve its general algorithms for your case without degrading other, more common cases).
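To get a feel for what Tesseract's own pre-processing produces, here is a simplified global Otsu threshold on a grayscale image. As far as I know, Tesseract's default binarization is based on Otsu's method, but its real implementation differs (for example it works on the input channels), so treat this as an approximation for experimenting, not a reproduction of Tesseract's code.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, values 0..255


def otsu_threshold(img: Image) -> int:
    """Return the threshold t maximising between-class variance (Otsu's method).
    Pixels <= t count as dark foreground, pixels > t as background."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below t
    sum0 = 0  # sum of intensities at or below t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(img: Image, t: int) -> Image:
    """Map pixels <= t to black (0) and the rest to white (255)."""
    return [[0 if v <= t else 255 for v in row] for row in img]
```

Viewing `binarize(img, otsu_threshold(img))` for a problem page shows whether gradients or background art survive thresholding as large dark regions that Tesseract then has to analyze.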

Tom