tesseract unable to detect characters in a simple two-word image


Rory MacQueen

Jan 5, 2020, 1:29:42 AM
to tesseract-ocr

I'm having trouble getting tesseract to recognize any characters in the following image:


[Image: tessinput]


When I run tesseract from the command line on this image, I get "Empty page!!", i.e. no results. Based on my reading of the Improving Quality section of the wiki, I thought the problem might be that the words in this image are not dictionary words. With that in mind, I tried both disabling the Tesseract dictionaries altogether (using the load_system_dawg and load_freq_dawg config variables) and adding the words in the image (LAO and CAUD) to the existing dictionary. Neither approach worked. I have tried Tesseract versions 3 and 4, and have also built version 5 from source on a Mac. All give the same result.
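For reference, the two dictionary experiments described above can be reproduced from the command line roughly as follows. This is a sketch, assuming Tesseract 4.x or later is on the PATH and the image is saved as lao.jpg (the name used in the reply below); words.txt is a hypothetical file name. Per the post, neither variant changed the result for this image.

```shell
#!/bin/sh
# Sketch of the two dictionary experiments described above.
# Assumes Tesseract 4.x+ on PATH; lao.jpg is the input image and
# words.txt is a hypothetical name for the extra-words file.
img=lao.jpg

# Extra words, one per line, for the --user-words option.
printf '%s\n' LAO CAUD > words.txt

if command -v tesseract >/dev/null 2>&1 && [ -f "$img" ]; then
  # 1. Disable the system and frequent-word dictionaries (DAWG files)
  #    by setting config variables with -c.
  tesseract "$img" - -c load_system_dawg=0 -c load_freq_dawg=0

  # 2. Keep the dictionaries but add the two words from the image.
  tesseract "$img" - --user-words words.txt
else
  echo "tesseract or $img not found; skipping OCR" >&2
fi
```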


Curiously, if I type the exact words from that image into a word processor and take a screenshot, it works: Tesseract reads the resulting image and correctly parses each character. Here is that image:

[Image: Screen Shot 2020-01-04 at 7 01 11 PM]

The only difference between the two images is that the first one is of slightly lower resolution/quality. Am I to believe that Tesseract cannot recognize characters in a slightly lower-quality image like that? Is there anything I can do to improve the image quality? Is there something else I'm missing?
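One common way to try improving a small, low-resolution image is to enlarge and binarize it before OCR. The sketch below assumes ImageMagick 6's convert command (ImageMagick 7 uses magick instead) and Tesseract on the PATH; the 300% scale factor, the 50% threshold, and the output name lao_prepped.png are illustrative guesses, not values tested on this image.

```shell
#!/bin/sh
# Sketch: enlarge and binarize the image before running OCR on it.
# Assumes ImageMagick 6 ("convert") and Tesseract are installed; the
# scale factor, threshold, and output file name are illustrative guesses.
src=lao.jpg
dst=lao_prepped.png

if command -v convert >/dev/null 2>&1 && [ -f "$src" ]; then
  # Upscale 3x, convert to grayscale, then threshold to pure black/white.
  # -units/-density record 300 dpi in the output so Tesseract need not guess.
  convert "$src" -resize 300% -colorspace Gray -threshold 50% \
    -units PixelsPerInch -density 300 "$dst"

  if command -v tesseract >/dev/null 2>&1; then
    tesseract "$dst" -
  fi
else
  echo "convert or $src not found; skipping preprocessing" >&2
fi
```

If the text is light on a dark background, the threshold output will be inverted; adding -negate after -threshold flips it back to dark text on white.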


Thanks in advance.


-Rory

Shree Devi Kumar

Jan 5, 2020, 1:52:53 AM
to tesseract-ocr
Try --psm 6 (page segmentation mode 6: assume a single uniform block of text):

ubuntu@tesseract-ocr:~/TEST$ tesseract lao.jpg -
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 197
Empty page!!
Estimating resolution as 197
Empty page!!
ubuntu@tesseract-ocr:~/TEST$ tesseract lao.jpg - --dpi 300
Empty page!!
Empty page!!
ubuntu@tesseract-ocr:~/TEST$ tesseract lao.jpg - --dpi 300 --psm 6
LAO 7° f CAUD 8°
ubuntu@tesseract-ocr:~/TEST$ tesseract lao.jpg - --psm 6
Warning: Invalid resolution 0 dpi. Using 70 instead.
LAO 7° f CAUD 8°
ubuntu@tesseract-ocr:~/TEST$
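To see how much the page segmentation mode matters, a small loop can run the same image through several modes. This is a sketch assuming Tesseract 4.x or later (where --psm and --dpi are accepted on the command line) and the lao.jpg file from the transcript above; the choice of modes is only a suggestion. As the transcript shows, passing --dpi 300 also suppresses the "Invalid resolution 0 dpi" warning.

```shell
#!/bin/sh
# Sketch: run one image through several page segmentation modes.
# Assumes Tesseract 4.x+ on PATH and lao.jpg in the current directory.
img=lao.jpg
# 3 = fully automatic (the default), 6 = single uniform block of text,
# 7 = single text line, 11 = sparse text in no particular order.
modes="3 6 7 11"

have_ocr=no
if command -v tesseract >/dev/null 2>&1 && [ -f "$img" ]; then
  have_ocr=yes
else
  echo "tesseract or $img not found; printing mode headers only" >&2
fi

for psm in $modes; do
  printf '=== --psm %s ===\n' "$psm"
  if [ "$have_ocr" = yes ]; then
    # --dpi 300 supplies a resolution, since the JPEG reports 0 dpi.
    tesseract "$img" - --dpi 300 --psm "$psm"
  fi
done
```

If one mode (6 for a block, or 7 for a single line) works consistently for images like this, it can simply be passed on every call.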
