Sharing some tips on Scene text OCR

Cid Chang

Apr 27, 2016, 11:40:06 PM
to tesseract-ocr
Here are some tips for total beginners who want to do OCR on natural scene text.

First: Scene text detection
        Pass only the detected text area to the OCR engine rather than the whole picture; it saves a lot of time and resources.
        If you have OpenCV plus the contrib modules, "Class-specific Extremal Regions" in the "text" module will come in handy.
        If you want something really quick and don't want to compile the contrib modules yourself, this method will do the job.
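
To make the cropping step concrete, here is a minimal sketch in plain Python. It assumes the detector returns boxes as (x, y, w, h), the format OpenCV's cv2.boundingRect uses. The function names and the small padding margin are my own additions for illustration, not part of any detector's API.

```python
def padded_crop_box(box, image_width, image_height, pad=4):
    """Grow a detected (x, y, w, h) box by `pad` pixels and clamp it to the image.

    Returns (x0, y0, x1, y1) with x1 and y1 exclusive, ready for slicing.
    The padding is an assumption: a small margin around the glyphs.
    """
    x, y, w, h = box
    x0 = max(0, x - pad)
    y0 = max(0, y - pad)
    x1 = min(image_width, x + w + pad)
    y1 = min(image_height, y + h + pad)
    return x0, y0, x1, y1


def crop_regions(image, boxes, pad=4):
    """Cut every detected box out of `image`, given as a list of pixel rows."""
    height, width = len(image), len(image[0])
    crops = []
    for box in boxes:
        x0, y0, x1, y1 = padded_crop_box(box, width, height, pad)
        # Keep only the rows and columns that fall inside the padded box.
        crops.append([row[x0:x1] for row in image[y0:y1]])
    return crops
```

With a NumPy image (what OpenCV gives you) the same crop is just image[y0:y1, x0:x1]. Each crop is then sent to the OCR engine on its own instead of the full frame.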

Second: Image binarization before OCR
       Turning the text area image into a black-and-white pattern improves the OCR output, especially on natural scenes.
       The tesseract engine does have its own binarization function, but I found it too vulnerable to the noise in natural scenes.
       You can try:
              01 adaptive thresholding
              02 a special binarization method from here
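
For option 01, here is a rough sketch of mean adaptive thresholding in plain Python, so you can see what it does. Each pixel is compared with the average of its own neighborhood instead of one global cutoff, which is why it copes better with uneven lighting. The function name, the defaults for block_size and c, and the border handling (the window is simply clipped at the image edge) are my assumptions. In practice you would probably call OpenCV's cv2.adaptiveThreshold, which treats borders differently.

```python
def adaptive_threshold_mean(gray, block_size=15, c=10):
    """Binarize a grayscale image given as a list of rows of 0-255 ints.

    A pixel becomes 255 when it is brighter than (local mean - c),
    otherwise 0. The local mean is taken over a block_size x block_size
    window centered on the pixel and clipped at the image border.
    """
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be odd and >= 3")
    h, w = len(gray), len(gray[0])
    r = block_size // 2

    # Integral image with an extra zero row and column, so any window sum
    # can be read with four lookups.
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            area = (y1 - y0) * (x1 - x0)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            # Same test as gray > mean - c, multiplied by area to stay in integers.
            if gray[y][x] * area > total - c * area:
                out[y][x] = 255
    return out
```

Dark text on a lighter background comes out as 0 (black) on 255 (white). A larger block_size follows slower lighting changes, and a larger c suppresses more background noise at the risk of thinning strokes.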
-----------------
If you have something better and smarter than the above methods, please do post it.
I think it will benefit more people who are working in this field.
