Newbie: wondering why a fairly crisp document has such low accuracy

Stephen Boesch

Aug 12, 2017, 2:54:22 PM

to tesseract-ocr

I printed the "Welcome" page on my HP LaserJet printer and scanned it to a .png file. The scan quality is quite good, so I had expected perhaps 85%+ accuracy from Tesseract OCR. I did not tally the errors carefully, but by eyeballing it the accuracy seems to be about 50%. I used all default settings.

Some of the consistent errors:

W -> H

in -> m

li -> h

b -> t)

ll -> H

So is this just "the way things are" in OCR land? Or am I missing some fundamental settings needed to get reasonably useful results?

thanks

stephenb
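The 50% figure above is an eyeball estimate. Here is a minimal sketch of how it could be tallied instead, assuming a hand-typed ground-truth transcription of the page is available. The function names `char_accuracy` and `substitutions` are hypothetical, and the matching uses Python's standard `difflib`, not anything built into Tesseract.

```python
from collections import Counter
from difflib import SequenceMatcher


def char_accuracy(truth: str, ocr: str) -> float:
    """Return the fraction of ground-truth characters matched in the OCR text."""
    if not truth:
        return 1.0 if not ocr else 0.0
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(truth)


def substitutions(truth: str, ocr: str) -> Counter:
    """Count (expected, got) pairs for every span the aligner marks as replaced."""
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    pairs = Counter()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs[(truth[i1:i2], ocr[j1:j2])] += 1
    return pairs
```

Calling `substitutions(truth, ocr).most_common(10)` would list the most frequent confusions, which is one way to confirm patterns such as W -> H. Note that `difflib` alignment is a heuristic, so the score approximates an edit-distance based accuracy and need not equal it.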

ShreeDevi Kumar

Aug 12, 2017, 3:00:31 PM

to tesser...@googlegroups.com

With English you should probably get close to 99% accuracy.

Is your png at 300 dpi?

Which version of tesseract did you use?

Which traineddata?

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
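The first question in the reply, whether the PNG is at 300 dpi, can be checked without an image library. Below is a rough sketch that reads the resolution from the PNG's optional `pHYs` chunk using only the standard library. The function name `png_dpi` is hypothetical. The sketch does not verify chunk CRCs, and it returns `None` when the scanner did not record a resolution.

```python
import struct
from typing import Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dpi(data: bytes) -> Optional[Tuple[float, float]]:
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None if not recorded."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte length, 4-byte type, body, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"pHYs" and len(body) == 9:
            x_ppu, y_ppu, unit = struct.unpack(">IIB", body)
            if unit != 1:  # 1 = pixels per metre; 0 = aspect ratio only
                return None
            return (x_ppu * 0.0254, y_ppu * 0.0254)  # metres to inches
        if chunk_type in (b"IDAT", b"IEND"):
            break  # pHYs, when present, precedes the image data
        pos += 8 + length + 4
    return None
```

For a scan saved as, say, `scan.png` (a placeholder name), `png_dpi(open("scan.png", "rb").read())` would report the stored resolution. The other two questions can be answered from the command line: `tesseract --version` prints the version, and `tesseract --list-langs` lists the installed traineddata languages.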

