Hi! I read through the training guide
(http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract)
but wanted to see if there's an easier option than
creating specific bounding boxes for each letter (which, as I
understand it, is what the tutorial requires). Is there any option
where one would simply point to a TIF file and a TXT file, the TXT
file containing the correct text, and train Tesseract from that pair?
For instance, I'm currently getting a result like this one on an
image:
------------
Aprll 15 1953
Foober
------------
So I would like to change the text to
------------
April 15 1953
Foobar
------------
... for training purposes (I'm guessing that Tesseract could work out
the bounding boxes itself, as it did for the first, incorrect
run?).
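
To make the idea concrete, here is a rough Python sketch of the
comparison I have in mind: line up the OCR output with the correct
text and list the characters that differ. It only uses Python's
standard difflib module and is not something Tesseract provides; the
function name find_substitutions is made up for illustration.

```python
import difflib


def find_substitutions(ocr_text, correct_text):
    """Return (position, ocr_chars, correct_chars) for each differing span.

    Hypothetical helper: it only compares two strings and knows nothing
    about Tesseract or bounding boxes.
    """
    matcher = difflib.SequenceMatcher(None, ocr_text, correct_text, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # i1 is the offset of the differing span within the OCR output.
            changes.append((i1, ocr_text[i1:i2], correct_text[j1:j2]))
    return changes


# The two texts from the example above.
ocr_text = "Aprll 15 1953\nFoober"
correct_text = "April 15 1953\nFoobar"

for position, wrong, right in find_substitutions(ocr_text, correct_text):
    print(position, repr(wrong), "->", repr(right))
```

For the sample above this reports two substitutions ("l" should be
"i", and "e" should be "a"). Since both texts have the same number of
characters, each correction would map onto a box Tesseract already
found in the first run, which is what I'm hoping could be automated.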
Thanks!