I am trying to recognize digits only.
B.jpg is a picture taken with a mobile phone camera.
Then I process the image with ImageMagick:
convert B.jpg -threshold 30% B_thres_30.tiff
convert B.jpg -threshold 40% B_thres_40.tiff
For OCR, I use the commands:
tesseract B.tiff B -psm 8 nobatch digits
tesseract B_thres_30.tiff B_thres_30 -psm 8 nobatch digits
tesseract B_thres_40.tiff B_thres_40 -psm 8 nobatch digits
Without any manual preprocessing, tesseract outputs an empty file. I find this strange, because tesseract should perform some kind of thresholding itself.
The only explanation I can think of is that tesseract's built-in thresholding does not cope well with this image.
With the 30% threshold, tesseract outputs: 5-50 (so this threshold is clearly suboptimal)
With the 40% threshold, tesseract outputs: 5.50 (the correct result, so this threshold works)
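To compare more threshold values than these two, the convert and tesseract steps above could be wrapped in a small shell function. This is only a sketch: sweep_thresholds is a made-up helper name, and it assumes that convert and tesseract are on the PATH and that tesseract writes its result to the output base name plus ".txt". The tesseract options are copied from the commands above; newer tesseract releases spell the first one --psm.

```shell
# Sketch: repeat the threshold-then-OCR steps for several threshold values.
# sweep_thresholds is a hypothetical helper, not part of ImageMagick or tesseract.
# Usage: sweep_thresholds IMAGE PERCENT [PERCENT ...]
sweep_thresholds() {
    image=$1
    shift
    base=${image%.*}                       # "B.jpg" -> "B"
    for pct in "$@"; do
        out="${base}_thres_${pct}"
        # Binarize with a fixed global threshold, as in the convert commands above.
        convert "$image" -threshold "${pct}%" "${out}.tiff" || return 1
        # Same OCR options as above; the recognized text ends up in ${out}.txt.
        tesseract "${out}.tiff" "$out" -psm 8 nobatch digits >/dev/null 2>&1 || return 1
        # Print one line per threshold so the results can be compared side by side.
        printf '%s%%: %s\n' "$pct" "$(cat "${out}.txt")"
    done
}
```

Calling sweep_thresholds B.jpg 20 30 40 50 60 would then print one line per threshold, which makes it easier to see where the output changes from wrong to correct.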
Any idea why tesseract's own thresholding does not work well here?
Thanks,