Sharing some tips on Scene text OCR

Apr 27, 2016, 11:40:06 PM

to tesseract-ocr

Here are some tips for total beginners who want to do OCR on natural scene text.

First: Scene text detection

Crop the detected text area and pass only that to the OCR engine rather than feeding it the whole picture. It saves a lot of time and resources.
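To make the cropping step concrete, here is a minimal sketch. It assumes the detector returns boxes as (x, y, w, h) tuples and that the image is a NumPy array, which is how OpenCV's Python bindings represent images. The name `crop_text_region`, the `pad` margin, and the sample boxes are only illustrative, not part of any library.

```python
import numpy as np

def crop_text_region(image, box, pad=4):
    """Return the sub-image for box=(x, y, w, h), padded and clamped to the image bounds."""
    height, width = image.shape[:2]
    x, y, w, h = box
    x0 = max(x - pad, 0)
    y0 = max(y - pad, 0)
    x1 = min(x + w + pad, width)
    y1 = min(y + h + pad, height)
    return image[y0:y1, x0:x1]

# Stand-in for a real photo: a blank 480x640 colour image.
scene = np.zeros((480, 640, 3), dtype=np.uint8)

# Hypothetical detector output; the second box runs past the image border.
detected_boxes = [(100, 200, 150, 40), (600, 460, 80, 40)]

crops = [crop_text_region(scene, box) for box in detected_boxes]
```

Each entry in `crops` is then what you would hand to the OCR engine instead of the full `scene`. A small margin around the box tends to help because detectors often clip the edges of characters, but the right value is something to tune on your own images.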

If you have OpenCV plus Contrib modules, "Class-specific Extremal Regions" in the "text" module will come in handy.

If you want something really quick and don't want to compile the contrib modules yourself, this method will do the job.

Second: Image binarization before OCR

Turning the text area image into a black-and-white image will improve the OCR output, especially on natural scenes.

The Tesseract engine does have its own binarization function, but I found it to be too vulnerable to the noise in natural scenes.

You can try:

01 adaptive thresholding

02 a special binarization method from here.
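For option 01, here is a rough sketch of mean adaptive thresholding written with plain NumPy so the idea is visible: each pixel is compared against the average of its own neighbourhood instead of one global cutoff. The function name and the `block_size` and `offset` defaults are my own assumptions and need tuning per image. If you already have OpenCV installed, `cv2.adaptiveThreshold` with `cv2.ADAPTIVE_THRESH_MEAN_C` does a comparable computation in a single call.

```python
import numpy as np

def adaptive_threshold_mean(gray, block_size=15, offset=10):
    """Set a pixel to 255 if it is brighter than (local mean - offset), else 0."""
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd number >= 3")
    k = block_size
    h, w = gray.shape
    # Replicate the border so every pixel has a full k x k neighbourhood.
    padded = np.pad(gray.astype(np.float64), k // 2, mode="edge")
    # Integral image with a leading row and column of zeros.
    integral = np.pad(padded, ((1, 0), (1, 0)), mode="constant")
    integral = integral.cumsum(axis=0).cumsum(axis=1)
    # Sum of each k x k window from four integral-image lookups.
    window_sum = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
                  - integral[k:k + h, :w] + integral[:h, :w])
    local_mean = window_sum / (k * k)
    return np.where(gray > local_mean - offset, 255, 0).astype(np.uint8)

# Synthetic test image: uneven lighting (dark on the left, bright on the right)
# with two dark "strokes" standing in for text.
gray = np.tile(np.linspace(60, 220, 200), (40, 1)).astype(np.uint8)
gray[15:25, 20:30] -= 50
gray[15:25, 170:180] -= 50

binary = adaptive_threshold_mean(gray)
```

In this synthetic image a single global threshold would either swallow the left stroke into the dark background or miss the right one, while the local comparison picks out both. Real scene photos are noisier, so a light blur before thresholding may be worth trying.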

-----------------

If you have something better and smarter than the above methods, please do post it.