image_to_string and image_to_data results are not the same

Alan Kong

Feb 11, 2018, 7:25:58 AM

to tesseract-ocr

Hi everyone,

I am new to tesseract-ocr and have been using it from Python through the pytesseract wrapper.

With pytesseract, I can call two functions: 1) image_to_string, which returns the recognized characters as a text string, and 2) image_to_data, which returns the recognized text plus verbose information, including the bounding-box coordinates and the confidence of each prediction.

I have used these two functions and expected them to return the same text, but the results differ a lot. My guess is that image_to_data uses -psm 0 as a hard-coded default and that this parameter cannot be changed, whereas in image_to_string I can set -psm 6, which returns fairly reasonable results.
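To make the comparison concrete, here is a minimal sketch of how the two calls could be run with the same page segmentation mode and their text compared line by line. It assumes a pytesseract version in which image_to_data accepts the same config argument as image_to_string and can return a dict via output_type. The helper names lines_from_data and compare are hypothetical, and the "--psm 6" spelling is the one used by newer Tesseract releases (older 3.x builds use "-psm 6" as written above).

```python
from collections import OrderedDict


def lines_from_data(data, min_conf=0.0):
    """Rebuild text lines from an image_to_data result in dict form.

    Assumes a single-page image, so page_num is ignored.
    """
    lines = OrderedDict()
    for i, word in enumerate(data["text"]):
        # Rows for pages, blocks, paragraphs and lines have empty text
        # and a confidence of -1, so only real words are kept.
        if not word.strip() or float(data["conf"][i]) < min_conf:
            continue
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines.setdefault(key, []).append(word)
    return [" ".join(words) for words in lines.values()]


def compare(image_path, config="--psm 6"):
    """Run both calls with the same config; return (string_text, data_lines)."""
    # Imported lazily so lines_from_data works without Tesseract installed.
    import pytesseract
    from PIL import Image
    from pytesseract import Output

    image = Image.open(image_path)
    text = pytesseract.image_to_string(image, config=config)
    data = pytesseract.image_to_data(image, config=config, output_type=Output.DICT)
    return text, lines_from_data(data)
```

Calling compare on the image in question (for example compare("scan.png"), where the file name is a placeholder) would show whether the two outputs still disagree once both calls receive the same config string.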

Cheers,

Alan
