image_to_string and image_to_data results are not the same

Alan Kong

Feb 11, 2018, 7:25:58 AM
to tesseract-ocr
Hi everyone, 

I am new to tesseract-ocr and have been using it from Python through the pytesseract wrapper.

With pytesseract, I can call two functions: 1) image_to_string, which returns the recognized text as a Python string, and 2) image_to_data, which returns the recognized text plus verbose information, including the bounding-box coordinates and the confidence of each prediction.

I used these two functions and expected them to return the same text, but the results differ a lot. My guess is that image_to_data uses -psm 0 as a hard-coded default that cannot be changed, whereas with image_to_string I can set -psm 6, which returns fairly reasonable results.

Cheers,

Alan
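
One way to check the guess above is to pass the same page segmentation setting to both calls and compare the text line by line. The following is only a minimal sketch. It assumes the installed pytesseract accepts a config argument on image_to_data just as it does on image_to_string, and that image_to_data returns Tesseract's TSV output as a string by default. The names words_by_line and compare, and the image path, are made up for illustration.

```python
import csv
import io

# The flag from the post. Assumption: newer Tesseract releases spell it "--psm 6".
PSM_CONFIG = "-psm 6"


def words_by_line(tsv_text):
    """Rebuild text lines from the TSV string returned by image_to_data."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    lines = {}
    for row in reader:
        word = (row.get("text") or "").strip()
        # Level 5 rows are individual words; other levels describe layout boxes.
        if row["level"] != "5" or not word:
            continue
        key = (
            int(row["page_num"]),
            int(row["block_num"]),
            int(row["par_num"]),
            int(row["line_num"]),
        )
        lines.setdefault(key, []).append(word)
    return [" ".join(lines[key]) for key in sorted(lines)]


def compare(image_path, config=PSM_CONFIG):
    """Run both pytesseract calls with the same config (hypothetical helper)."""
    # Imported here so the TSV helper above can be used without pytesseract.
    import pytesseract
    from PIL import Image

    image = Image.open(image_path)
    plain = pytesseract.image_to_string(image, config=config)
    tsv = pytesseract.image_to_data(image, config=config)
    plain_lines = [line for line in plain.splitlines() if line.strip()]
    return plain_lines, words_by_line(tsv)
```

Calling compare("page.png") with a real image path returns the two line lists side by side. If they still disagree when both calls receive the same config, the difference is probably not caused by the page segmentation mode.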