I am training Tesseract to recognize the CMC7 font, following this and this tutorial.
I made a .tif file with 2621 characters and created the .box file, checking every character to make sure the X and Y positions of the rectangle around it are correct.
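For reference, the manual check of the box coordinates can be backed up by a mechanical one. This is only a minimal sketch, assuming the usual Tesseract 3.x box format of one box per line, `<symbol> <left> <bottom> <right> <top> <page>`, with the origin at the bottom-left corner of the image. `check_boxes` is a hypothetical helper name, not part of Tesseract.

```shell
# check_boxes FILE: print every box line that looks malformed.
# Assumption: each line is "<symbol> <left> <bottom> <right> <top> <page>",
# with coordinates measured from the bottom-left corner of the image.
check_boxes() {
    awk '
        # A well-formed line has exactly six whitespace-separated fields.
        NF != 6 {
            printf "line %d: expected 6 fields, got %d\n", NR, NF
            next
        }
        # left must be smaller than right, and bottom smaller than top.
        ($2 + 0) >= ($4 + 0) || ($3 + 0) >= ($5 + 0) {
            printf "line %d: empty or inverted box for \"%s\"\n", NR, $1
        }
    ' "$1"
}
```

Calling `check_boxes` on the .box file prints nothing when every line passes. It only catches structural mistakes (swapped or equal coordinates, missing fields), not boxes that are well-formed but sit in the wrong place on the image.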
After that, I ran the command to train Tesseract:
tesseract por.cmc7.exp0.tif por.cmc7.box nobatch box.train .stderr
I've made a shell script that calls this command in a loop, so the training will be repeated a number of times. However, after many messages like:
APPLY_BOXES: Unlabelled word at :Bounding box=(762,2763)->(783,2776)
APPLY_BOXES: Unlabelled word at :Bounding box=(774,2269)->(783,2277)
APPLY_BOXES: Unlabelled word at :Bounding box=(787,2269)->(789,2277) ...
The result is always:
Found 420 good blobs.
2129 remaining unlabelled words deleted.
Generated training data for 420 words
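To compare runs without scrolling through the messages, the output can be summarized. This is a minimal sketch, assuming the messages from one training run were saved to a file first. `summarize_log` is a hypothetical helper name, and it only matches the message formats quoted above.

```shell
# summarize_log FILE: summarize a saved training log.
# Assumption: FILE contains the APPLY_BOXES messages and the
# "Found N good blobs." line exactly as quoted in this post.
summarize_log() {
    # Count the "Unlabelled word" warnings.
    unlabelled=$(grep -c 'APPLY_BOXES: Unlabelled word' "$1")
    # Extract N from the last "Found N good blobs." line, if any.
    good=$(sed -n 's/.*Found \([0-9][0-9]*\) good blobs.*/\1/p' "$1" | tail -n 1)
    echo "unlabelled warnings: $unlabelled"
    echo "good blobs: ${good:-unknown}"
}
```

If the good-blob count is identical on every pass of the loop, repeating the same command is not changing the result.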
It has been running for several hours, and it still generates training data for only 420 words. When I then run Tesseract on a check image to test whether it recognizes the characters, it doesn't work: it fails to recognize the characters and returns random letters and symbols.
How can I make it recognize all the characters in the .tif image?
Thank you.
I have attached the .box and .tif files in a zip file.