I am having trouble getting Tesseract to recognise a column of numbers, in what I naively assumed would be a straightforward problem. Most of the errors come from misrecognition of the decimal point: it is either skipped or mistaken for a digit. I call Tesseract 4.1.1 with the options "-c tessedit_char_whitelist=-.0123456789 --psm 4 -l eng --oem 2", and I want to get the column of numbers out in tabular form. After pre-processing my image, I have something like this:
which is then recognised as:
2.565
2597
2.614
2528
2.441
2564
2.530
24479
2.601
2.601
2.569
24555
2.437
2.531
2.592
2.385
2.618
2.738
2.766
24473
2.624
2.611
2.749
2.730
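To make the failures easier to spot, here is a minimal Python sketch that flags the recognised values containing no decimal point. The names `ocr_lines` and `missing_decimal` are purely illustrative, and the check rests on the assumption that every correct value in this column contains a ".", which may not hold for other data.

```python
# OCR output copied from the listing above, one value per line.
ocr_lines = """
2.565 2597 2.614 2528 2.441 2564 2.530 24479
2.601 2.601 2.569 24555 2.437 2.531 2.592 2.385
2.618 2.738 2.766 24473 2.624 2.611 2.749 2.730
""".split()


def missing_decimal(values):
    """Return (index, value) pairs for values that contain no '.'.

    Assumption: every correct value in this column has a decimal point,
    so a value without one is treated as a recognition error.
    """
    return [(i, v) for i, v in enumerate(values) if "." not in v]


suspects = missing_decimal(ocr_lines)
for index, value in suspects:
    # Report 1-based positions so they match the listing above.
    print(f"line {index + 1}: {value}")
print(f"{len(suspects)} of {len(ocr_lines)} values have no decimal point")
```

On the output above this flags six of the 24 values: three four-digit ones where the point seems to have been dropped (2597, 2528, 2564) and three five-digit ones where it seems to have been read as a digit (24479, 24555, 24473).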
I can't afford to lose decimal points, and there is no fixed pattern to where they fall, so I can't drop "." or "-" from the list of allowed characters. Can someone advise whether this is a pre-processing issue or a Tesseract issue, and how I could improve the OCR here?
Thanks