Hi all
I am working on Urdu OCR and came across Tesseract. I tried to train Tesseract on Urdu characters. The training instructions say that it does not support right-to-left writing. I tried training it on the basic Urdu alphabet anyway, as follows:
1. I created a UTF-8 encoded text file of the characters, named UrduCharacters.txt.
2. From it I produced a TIF image and saved it as UrduCharacters.tif.
3. I ran the tesseract command to make the box file, in two variants:
   a. tesseract UrduCharacters.tif UrduCharacters batch.nochop makebox
   b. tesseract UrduCharacters.tif UrduCharacters -l urd batch.nochop makebox
I have tried both commands. With the second command (the one using -l urd), an error occurs with the message "Unable to locate Urdunichaset file".
With the first command (no -l option), the box file is generated with only four characters: ~, 7, 7 and !. If anyone has any idea about this, please let me know.
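For anyone who wants to reproduce this, the steps above can be sketched roughly as the Python script below. It is only a sketch under some assumptions: Python 3 is available, tesseract is on the PATH, and UrduCharacters.tif has already been rendered from the text file by hand (the script does not create the image). The four sample letters and the helper names (write_characters_file, makebox_command, run_makebox) are made up for illustration; only the two tesseract command lines come from the steps above.

```python
import shutil
import subprocess
from pathlib import Path

# Illustrative sample only: a few isolated Urdu letters, not the full alphabet.
SAMPLE_CHARS = ["ا", "ب", "پ", "ت"]


def write_characters_file(path, chars):
    """Step 1: save the characters, one per line, as UTF-8 text."""
    Path(path).write_text("\n".join(chars) + "\n", encoding="utf-8")


def makebox_command(base, lang=None):
    """Step 3: build the makebox command line as quoted in the steps above."""
    cmd = ["tesseract", base + ".tif", base]
    if lang is not None:
        cmd += ["-l", lang]  # variant b; variant a omits the -l option
    cmd += ["batch.nochop", "makebox"]
    return cmd


def run_makebox(base, lang=None):
    """Run the command only if tesseract and the TIF image are both present."""
    if shutil.which("tesseract") is None or not Path(base + ".tif").exists():
        return None
    return subprocess.run(
        makebox_command(base, lang), capture_output=True, text=True
    )


if __name__ == "__main__":
    write_characters_file("UrduCharacters.txt", SAMPLE_CHARS)
    for lang in (None, "urd"):
        print(" ".join(makebox_command("UrduCharacters", lang)))
        result = run_makebox("UrduCharacters", lang)
        if result is not None:
            # tesseract reports its messages on stderr
            print(result.stderr.strip())
```

Running it prints each command line and, if tesseract and the image are found, whatever tesseract writes to stderr, which makes it easier to compare the two variants side by side.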
Regards
Ainie