Z. Jay
Jun 21, 2022, 1:25:33 PM
to tesseract-ocr
We have been using a competing OCR tool and are now evaluating a switch to Tesseract. However, when converting a PNG, Tesseract occasionally returns characters where there is only white space. For example, it will return a comma or an equal sign where the image is blank. Scrutinizing the PNG, I do not see anything, such as dirt or a speck, that looks like anything other than white space. The errors are rare and seemingly random, but they happen often enough to be a problem. Note that this does not occur with our current OCR tool. I suspect someone has encountered this issue before and has already posted the solution on this list or elsewhere.
For reference, here is a comparison of the actual text and the text returned by Tesseract:
Actual:
10/17 10/17, 0000 PAYMENT THANK YOU $64.79CR
Returned:
10/17, 10/17, 0000 =PAYMENT THANK YOU $64.79CR
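To make the difference easier to spot, here is a minimal Python sketch that diffs the two lines above with the standard-library difflib module and lists the characters present only in the returned text. The helper name spurious_chars is hypothetical and is not part of Tesseract or of our pipeline; it only illustrates the comparison.

```python
import difflib


def spurious_chars(actual, returned):
    """List (index_in_returned, text) for text that appears only in `returned`.

    Hypothetical helper for illustration; it compares two strings and
    knows nothing about Tesseract itself.
    """
    # autojunk=False turns off the popularity heuristic so every
    # character is considered when aligning the two strings.
    matcher = difflib.SequenceMatcher(None, actual, returned, autojunk=False)
    extras = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both mean `returned` holds text
        # that does not line up with anything in `actual`.
        if tag in ("insert", "replace"):
            extras.append((j1, returned[j1:j2]))
    return extras


# The two lines quoted in this message.
actual = "10/17 10/17, 0000 PAYMENT THANK YOU $64.79CR"
returned = "10/17, 10/17, 0000 =PAYMENT THANK YOU $64.79CR"

for position, text in spurious_chars(actual, returned):
    print(position, repr(text))
```

On this pair it should flag the extra comma after the first date and the equal sign in front of PAYMENT.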
Any pointers appreciated.
Thanks,
--zj