strange hack that improves OCR on numbers


simon mackenzie

Aug 27, 2017, 1:17:16 PM
to tesseract-ocr
I have columns like this:
     356  Smith  23            123 Jones    12
     123  Jacks  19            124 Barnes  10

On some pages, word boxes are correctly identified for all names and numbers. On other pages, however, many boxes are missing for the columns of numbers, especially the last column, even though the print is crystal clear. There is no obvious reason why it works on some pages and not on others. Is there any way to improve this?
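To pin down which rows lose boxes, one option is to count the recognised words per text line. The sketch below assumes the page was run with Tesseract's `tsv` output config (for example `tesseract page.png out tsv`) and that each row should yield 6 words, as in the sample above. The function names and the `expected` parameter are hypothetical, and the right count depends on how page segmentation groups the two column sets.

```python
import csv
import io
from collections import defaultdict


def words_per_line(tsv_text):
    """Group recognised words by (page, block, paragraph, line) from Tesseract TSV text."""
    lines = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        text = (row.get("text") or "").strip()
        # Level 5 rows are individual words; lower levels are page/block/line containers.
        if row["level"] == "5" and text:
            key = (int(row["page_num"]), int(row["block_num"]),
                   int(row["par_num"]), int(row["line_num"]))
            lines[key].append(text)
    return dict(lines)


def lines_with_missing_boxes(tsv_text, expected=6):
    """Return the lines that produced fewer word boxes than `expected` (an assumed count)."""
    return {key: words for key, words in words_per_line(tsv_text).items()
            if len(words) < expected}
```

Running this over a good page and a bad page should show whether the missing boxes are always in the last column or scattered.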

One strange hack that seems to work on some pages: I paste a column of "99\n"*10 into the image at position 0,0. Tesseract then correctly puts boxes around the final column of numbers! Does anyone have an explanation for this, and can it be extended to work on all pages?
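For anyone who wants to reproduce the hack, here is a minimal sketch using Pillow. It assumes the dummy column is drawn as text directly onto a grayscale copy of the page. The post does not say how the pasting was actually done, so the function name, the default font, and the grayscale conversion are all assumptions.

```python
from PIL import Image, ImageDraw, ImageFont


def add_dummy_number_column(page, text="99\n" * 10, origin=(0, 0), font=None):
    """Return a grayscale copy of `page` with a column of dummy numbers drawn at `origin`."""
    out = page.convert("L")  # convert() returns a new image, so `page` is left untouched
    draw = ImageDraw.Draw(out)
    # Assumption: Pillow's default font is used here; it is small, so a font that
    # matches the size of the printed digits is probably a better choice on real scans.
    font = font or ImageFont.load_default()
    draw.multiline_text(origin, text, fill=0, font=font)  # fill=0 is black in mode "L"
    return out
```

The result can be saved with `out.save(...)` and run through Tesseract in the same way as the original page. The boxes for the dummy "99" entries near 0,0 then need to be discarded from the output.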

