I have trained Tesseract on an Urdu image (C.tif, a multipage TIFF with 20 pages, zipped in a .rar archive) and its box file (C.box).
After training, I gave it the image Urdu4.tif for recognition. The output is in outputC4.txt.
In this file, not all of the characters are recognized correctly. For example, at position 2 the recognized ID should be 665664 instead of 665663.
How can I find out which characters Tesseract failed to recognize?
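
To make the question concrete, here is a rough sketch of the kind of check I have in mind. It assumes a correct transcription of the page is available, and that both it and the recognized output can be read as whitespace-separated IDs. The function name `find_mismatches` and the IDs other than 665664 and 665663 are made up for illustration, and none of this is part of Tesseract itself.

```python
from difflib import SequenceMatcher


def find_mismatches(expected, recognized):
    """Return (position, expected_tokens, recognized_tokens) for each differing span.

    Positions are 1-based and refer to the expected sequence.
    """
    matcher = SequenceMatcher(a=expected, b=recognized, autojunk=False)
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A substituted, missing, or extra token was found here.
            mismatches.append((i1 + 1, expected[i1:i2], recognized[j1:j2]))
    return mismatches


# Illustrative data only: the first and third IDs are placeholders.
expected = "665660 665664 665670".split()
recognized = "665660 665663 665670".split()

for position, want, got in find_mismatches(expected, recognized):
    print(f"position {position}: expected {want}, got {got}")
```

In practice the two lists would be read from the ground-truth text and from outputC4.txt instead of being written inline.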