Help with blurred OCR but "simple text"

Javier Abascal

Apr 6, 2017, 9:19:45 AM

to tesseract-ocr

Hi everyone! :)

I am having trouble getting Tesseract to correctly recognize the text in the attached images. In my opinion they are quite clear, but I am not sure how to help Tesseract read them. I have tried some online OCR services and they seem to recognize the text correctly (without any configuration), so I believe Tesseract should be able to handle these images too. I need Tesseract specifically because the machine that will run this task won't have Internet access.

So far I have tried several of the "top" Tesseract tuning parameters (PSM, dictionary, language, increasing the DPI, etc.), but I haven't been successful yet. Could you please help me with this?

Thank you very much in advance. I would really appreciate any comments :)

example_ocr_1.jpg

example_ocr_2.jpg

example_ocr_3.jpg

Allistair C

Apr 6, 2017, 12:35:36 PM

to tesser...@googlegroups.com

You might want to try preprocessing with a threshold filter (Otsu threshold) to harden the edges.
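For readers unfamiliar with the idea, here is a minimal sketch of what an Otsu threshold computes: the gray level that best separates the pixels into a dark class and a bright class by maximizing the between-class variance. It is written in plain Python on a flat list of 8-bit gray values so it runs without any libraries; the names `otsu_level` and `binarize` are made up for this illustration, and a real pipeline would more likely rely on a library routine.

```python
def otsu_level(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the bright class.
    `pixels` is a flat list of integers in the range [0, levels).
    (Hypothetical helper for illustration only.)
    """
    # Histogram of gray values
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels in the dark class so far
    sum_dark = 0    # sum of gray values in the dark class so far
    for t in range(levels):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue            # no dark pixels yet, nothing to split
        w_bright = total - w_dark
        if w_bright == 0:
            break               # all pixels are dark, no further splits
        mean_dark = sum_dark / w_dark
        mean_bright = (sum_all - sum_dark) / w_bright
        # Between-class variance (up to a constant factor of total**2)
        var_between = w_dark * w_bright * (mean_dark - mean_bright) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels, t):
    """Map pixels <= t to black (0) and all others to white (255)."""
    return [0 if p <= t else 255 for p in pixels]


# Toy "image": two dark gray levels (ink) and two bright ones (paper)
sample = [50] * 10 + [60] * 10 + [200] * 10 + [210] * 10
level = otsu_level(sample)
binary = binarize(sample, level)
```

On the toy data the chosen level falls between the dark and bright groups, so the ink values become 0 and the paper values become 255. On a blurred scan the two groups overlap, which is why a single global threshold may not be enough on its own.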

Javier Abascal

Apr 11, 2017, 3:16:10 AM

to tesseract-ocr

Hi,

I have tried an Otsu threshold and it didn't work very well. I am still not able to recognize the word Carolline, for example. Here is the code I used.

Any other ideas, people? :)

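import numpy as np
from PIL import Image
from skimage import filters  # assumed source of threshold_otsu (scikit-image)

# Load the image and convert it to 8-bit grayscale
img = Image.open("example_ocr_1.jpg").convert('L')
img_array = np.asarray(img)
print(img_array)

# Compute the global Otsu threshold for the whole image
otsu_threshold = filters.threshold_otsu(img_array)
print(otsu_threshold)

def otsu_filter(x):
    # Pixels below the threshold become black, all others white
    if x < otsu_threshold:
        return 0
    else:
        return 255

# Apply the filter element-wise and save the binarized image
otsu_filter = np.vectorize(otsu_filter)
img_otsu = otsu_filter(img_array)
img_otsu = Image.fromarray(np.uint8(img_otsu))
img_otsu.show()
img_otsu.save("example_ocr_1_otsu.jpg")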

example_ocr_1_otsu.jpg
