Training from actual user feedback

Fred Beer

Apr 30, 2014, 11:26:37 AM4/30/14
to tesser...@googlegroups.com
I have a problem where I have forms with many fields (all in the same place on each form) that need to be scanned and interpreted by Tesseract. My idea is to draw a rectangle around each field, grab that image, and feed it to Tesseract. If the confidence is high, use the result as the transcription; if it is low, send it to a human for manual correction. My question is: can I then feed the corrected result (the image of the text plus the actual text) back to Tesseract to train it to improve in the future? That way the system is trained through actual use rather than through a separate training exercise.
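To make the idea concrete, here is a minimal sketch of the confidence-gating step described above. The function names, the 0-100 confidence scale (Tesseract reports word confidences on that scale), and the threshold of 80 are all assumptions to be tuned on real data; how you obtain the confidence value (e.g. from Tesseract's API) is left out.

```python
def route_result(text, confidence, threshold=80.0):
    """Accept high-confidence OCR output; flag the rest for human review.

    `confidence` is assumed to be on Tesseract's 0-100 scale; the
    threshold of 80 is an arbitrary starting point, not a recommendation.
    """
    if confidence >= threshold:
        return {"text": text, "source": "ocr"}
    return {"text": None, "source": "needs_human_review"}


def collect_training_pair(field_image_path, corrected_text, store):
    """Record an (image, ground-truth text) pair produced by human
    correction, so it can later be used as training material."""
    store.append((field_image_path, corrected_text))
```

Each human correction then both fixes the current form and grows a corpus of labeled field images for future training.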

We are having an internal disagreement about how to do this. I read about making box files, which seems promising, but one of our team says that won't work. Could someone point us in the right direction and confirm whether this is possible? Any help is appreciated. Thanks.
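For reference, a Tesseract .box file is just a text file with one line per character: the symbol, then the left/bottom/right/top pixel coordinates (origin at the image's bottom-left), then a page number. A sketch of emitting that format, assuming you can get per-character bounding boxes (human correction alone gives you the text but not the boxes; Tesseract can emit its own guessed boxes with `tesseract image outputbase batch.nochop makebox`, which you would then align with the corrected text):

```python
def make_box_line(char, left, bottom, right, top, page=0):
    """Format one line of a Tesseract .box file:
    symbol, left, bottom, right, top (pixel coords, origin at the
    bottom-left of the image), and page number."""
    return f"{char} {left} {bottom} {right} {top} {page}"


def make_box_file(char_boxes, page=0):
    """char_boxes: iterable of (char, (left, bottom, right, top))
    pairs, one per character in the training image."""
    return "\n".join(
        make_box_line(c, *bbox, page=page) for c, bbox in char_boxes
    )
```

The hard part in practice is not writing the file but getting accurate per-character boxes for the corrected text, which is presumably what the skeptical team member is worried about.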