Newbie: wondering why a fairly crisp document has such low accuracy

Stephen Boesch

Aug 12, 2017, 2:54:22 PM
to tesseract-ocr
I printed out the "Welcome" page on my HP LaserJet printer and scanned it in as a .png. The scan quality is quite good, so I had been anticipating maybe 85%+ accuracy from Tesseract OCR. I did not bother to tally carefully, but by eyeballing it the accuracy seems to be about 50%. I used all default settings.

Some of the consistent errors:

W -> H
in -> m
li -> h
b -> t)
ll -> H
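
For anyone who wants a real number instead of an eyeball estimate, here is a minimal Python sketch, assuming you have a hand-typed ground-truth copy of the page to compare against. The function names `char_accuracy` and `confusions` are made up for illustration, and the alignment comes from the standard library's `difflib`, which is only an approximation of the edit-distance metrics OCR evaluations normally use.

```python
import difflib
from collections import Counter


def char_accuracy(truth: str, ocr: str) -> float:
    """Fraction of ground-truth characters that difflib aligns to the OCR text."""
    if not truth:
        return 1.0
    matcher = difflib.SequenceMatcher(None, truth, ocr, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(truth)


def confusions(truth: str, ocr: str) -> Counter:
    """Count (expected, recognized) substring pairs where the OCR text differs."""
    matcher = difflib.SequenceMatcher(None, truth, ocr, autojunk=False)
    counts = Counter()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # A substitution such as "W" read as "H", or "in" read as "m".
            counts[(truth[i1:i2], ocr[j1:j2])] += 1
    return counts
```

Calling `confusions(truth_text, ocr_text).most_common(10)` on the two strings should surface recurring substitutions like the ones listed above.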

So is this just "the way things are" in OCR land? Or am I missing some fundamental setting needed to get reasonably useful results?

thanks

stephenb

ShreeDevi Kumar

Aug 12, 2017, 3:00:31 PM
to tesser...@googlegroups.com
With English you should probably get close to 99% accuracy.

Is your png at 300 dpi?

Which version of tesseract did you use?
Which traineddata?
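
To answer the DPI question without opening an image editor, a rough sketch like the following reads the resolution a scanner may have recorded in the PNG's optional `pHYs` chunk. It uses only the Python standard library; `png_dpi` is a hypothetical helper name, and a result of `None` only means the file does not state a resolution in metres, not that the scan is low resolution. The installed version can be checked separately with `tesseract --version`.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
METRES_PER_INCH = 0.0254


def png_dpi(data: bytes):
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None if not recorded."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    # Each chunk is: 4-byte length, 4-byte type, data, 4-byte CRC.
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"pHYs" and len(body) == 9:
            x_ppu, y_ppu, unit = struct.unpack(">IIB", body)
            if unit != 1:
                return None  # unit 0 gives only an aspect ratio, not a real size
            return (x_ppu * METRES_PER_INCH, y_ppu * METRES_PER_INCH)
        if chunk_type in (b"IDAT", b"IEND"):
            break  # pHYs, when present, comes before the image data
        pos += 12 + length
    return None
```

For a scan saved as, say, `welcome.png` (an assumed filename), `png_dpi(open("welcome.png", "rb").read())` would return values near 300 for a 300 dpi scan.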

ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com
