I printed out the "Welcome" page on my HP LaserJet printer and scanned it in, saving the scan as a .png file. The scan quality is quite good, so I had been anticipating maybe 85%+ accuracy from tesseract-OCR. I did not bother to tally carefully, but from eyeballing the output, accuracy seems to be about 50%. I used all default settings.
Some of the consistent errors (original -> OCR output):
W -> H
in -> m
li -> h
b -> t)
ll -> H
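
For a more careful tally than eyeballing, a minimal sketch along these lines could compare the OCR output against the known text. It assumes the original "Welcome" text is available as a plain string; the function name tally_errors and the sample strings are hypothetical, and the alignment from Python's difflib only approximates a proper character error rate.

```python
import difflib
from collections import Counter


def tally_errors(truth, ocr):
    """Return (character accuracy, Counter of (expected, got) substitutions).

    Accuracy here is the fraction of ground-truth characters that difflib
    aligns as unchanged in the OCR output (an approximation, not a formal
    character error rate).
    """
    matcher = difflib.SequenceMatcher(None, truth, ocr, autojunk=False)
    matched = 0
    confusions = Counter()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            matched += i2 - i1
        elif tag == "replace":
            # Record which stretch of the original was misread as what.
            confusions[(truth[i1:i2], ocr[j1:j2])] += 1
    accuracy = matched / len(truth) if truth else 1.0
    return accuracy, confusions


if __name__ == "__main__":
    # Hypothetical sample strings, not taken from the actual scan.
    accuracy, confusions = tally_errors("Welcome to the ball", "Helcome to the baH")
    print("approximate character accuracy:", round(accuracy, 3))
    for (expected, got), count in confusions.most_common():
        print(expected, "->", got, "x", count)
```

Listing the most common substitutions this way would show whether the errors above really dominate or whether the misreads are spread more evenly.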
So is this just "the way things are" in OCR land? Or am I missing some fundamental setting that is needed to get reasonably useful results?
Thanks,
stephenb