Here are some tips for total beginners who want to do OCR on natural scene text.
First: scene text detection
Crop the detected text areas and pass only those to the OCR engine rather than feeding it the whole picture; this saves a lot of time and resources.
If you have OpenCV plus Contrib modules, "Class-specific Extremal Regions" in the "text" module will come in handy.
If you want something really quick and don't want to compile the contrib modules yourself, this method will do the job.
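Whichever detector you use, the cropping step can be sketched roughly like this. It is a minimal sketch, assuming the detector returns boxes as (x, y, w, h) tuples in pixel coordinates; the helper name `crop_text_region`, the `pad` margin, and the example boxes are made up for illustration and are not part of OpenCV or Tesseract.

```python
import numpy as np

def crop_text_region(image, box, pad=4):
    """Return the sub-image for box = (x, y, w, h), padded and clamped to the image."""
    img_h, img_w = image.shape[:2]
    x, y, w, h = box
    # Grow the box by a small margin so character edges are not clipped,
    # then clamp it so the slice never leaves the image.
    x0 = max(x - pad, 0)
    y0 = max(y - pad, 0)
    x1 = min(x + w + pad, img_w)
    y1 = min(y + h + pad, img_h)
    return image[y0:y1, x0:x1]

# Stand-in for a real photo: 100 rows by 200 columns, grayscale.
image = np.zeros((100, 200), dtype=np.uint8)

# Hypothetical detector output; the second box runs past the image border.
boxes = [(10, 20, 50, 15), (180, 90, 40, 20)]

crops = [crop_text_region(image, b) for b in boxes]
for crop in crops:
    print(crop.shape)  # each crop would be sent to the OCR engine on its own
```

The small margin is a judgment call: boxes that are too tight tend to cut off strokes, while very loose ones bring the background noise back in.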
Second: image binarization before OCR
Turning the text area image into a black-and-white image will improve the OCR output, especially on natural scenes.
The Tesseract engine does have its own binarization step, but I found it too vulnerable to the noise in natural scenes.
You can try:
01 adaptive thresholding
02 a special binarization method from here.
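To show what option 01 does, here is a small sketch of mean adaptive thresholding written in plain NumPy. In practice OpenCV's `cv2.adaptiveThreshold` does this job for you; the function name `adaptive_mean_threshold` and the defaults `block_size=15` and `c=10` below are my own illustrative choices, not recommended values.

```python
import numpy as np

def adaptive_mean_threshold(gray, block_size=15, c=10):
    """Set a pixel to 255 if it is brighter than (local mean - c), else to 0."""
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd number >= 3")
    h, w = gray.shape
    k = block_size
    r = k // 2
    # Replicate the border so every pixel has a full k x k neighbourhood.
    padded = np.pad(gray.astype(np.float64), r, mode="edge")
    # Integral image with a leading row and column of zeros.
    integral = np.pad(padded, ((1, 0), (1, 0)), mode="constant")
    integral = integral.cumsum(axis=0).cumsum(axis=1)
    # Sum over each k x k window from four corner lookups.
    window_sum = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
                  - integral[k:k + h, :w] + integral[:h, :w])
    local_mean = window_sum / (k * k)
    return np.where(gray > local_mean - c, 255, 0).astype(np.uint8)

# Synthetic test image: background brightens from left to right (uneven
# lighting), with one dark horizontal stroke standing in for text.
gray = np.tile(np.linspace(100, 220, 60), (40, 1)).astype(np.uint8)
gray[18:22, 10:50] -= 60

binary = adaptive_mean_threshold(gray)
```

In this synthetic image the right end of the stroke is brighter than the background on the left, so no single global threshold can separate them; comparing each pixel with its own neighbourhood is what gets around that.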
If you have something better or smarter than the methods above, please do post it.
I think it will benefit more people who are working in this field.