Fine-tuning Arabic trained data

Wolf Assi

Mar 10, 2022, 3:35:00 AM
to tesseract-ocr
I have noticed that the "ara-Scheherazade" traineddata was trained on the "Traditional Arabic" font. I tried it: it works, but its accuracy is low, and it has a problem with Arabic numerals, which come out inverted. I want to fix this. I tried to fine-tune it to better suit my data, but the fine-tuning does not work. The documentation also says that fine-tuning requires the traineddata from the tessdata_best repo.
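For reference, here is a minimal sketch of the fine-tuning steps described in the Tesseract 4.x training documentation, assuming you already have an `ara.traineddata` from tessdata_best and a list file of `.lstmf` files generated from your own images. The function name, paths, and iteration count are hypothetical; the helper only builds the command lines (it does not run them), so you can inspect them before executing.

```python
from pathlib import Path


def finetune_commands(
    best_traineddata: Path,
    train_listfile: Path,
    output_dir: Path,
    max_iterations: int = 400,
) -> list[list[str]]:
    """Build (but do not run) the commands for fine-tuning an LSTM model.

    Assumes best_traineddata comes from tessdata_best (float model) and that
    train_listfile lists .lstmf files already generated from your own data.
    """
    lang = best_traineddata.stem  # e.g. "ara" for "ara.traineddata"
    lstm_file = output_dir / f"{lang}.lstm"
    base = output_dir / f"{lang}_finetuned"

    # Step 1: extract the LSTM network from the "best" traineddata.
    extract = ["combine_tessdata", "-e", str(best_traineddata), str(lstm_file)]

    # Step 2: continue training from the extracted network on your own data.
    train = [
        "lstmtraining",
        "--continue_from", str(lstm_file),
        "--traineddata", str(best_traineddata),
        "--train_listfile", str(train_listfile),
        "--model_output", str(base),
        "--max_iterations", str(max_iterations),
    ]

    # Step 3: convert the checkpoint lstmtraining writes (<model_output>_checkpoint)
    # into a traineddata file that tesseract can load.
    finalize = [
        "lstmtraining",
        "--stop_training",
        "--continue_from", f"{base}_checkpoint",
        "--traineddata", str(best_traineddata),
        "--model_output", f"{base}.traineddata",
    ]
    return [extract, train, finalize]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Note that the "fast" or integer models (such as those in the default tessdata repo) cannot be used as the starting point, which is why the sketch starts from tessdata_best.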
My main goal is to recognize both Arabic letters and numbers. I know Tesseract has some trouble with text that mixes Arabic letters and numbers, but since "ara-Scheherazade" recognizes both, even with low accuracy, it should be possible, and I want to try to improve it. Does anyone know what I can do?
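While the model is being improved, the inverted numerals could be patched as a post-processing step. This is a hypothetical workaround, not a Tesseract feature: it assumes the digits really are stored in reversed logical order in the OCR output (check this first, since a viewer without bidi support can make correctly stored numbers merely look reversed). It reverses each run of ASCII, Arabic-Indic, or Extended Arabic-Indic digits and leaves every other character untouched.

```python
import re

# Runs of ASCII digits, Arabic-Indic digits (U+0660-U+0669),
# and Extended Arabic-Indic digits (U+06F0-U+06F9).
DIGIT_RUN = re.compile(r"[0-9\u0660-\u0669\u06F0-\u06F9]+")


def fix_reversed_digits(text: str) -> str:
    """Reverse every run of consecutive digits; other characters are unchanged."""
    return DIGIT_RUN.sub(lambda m: m.group(0)[::-1], text)


print(fix_reversed_digits("سنة ٢٢٠٢"))
```

This prints `سنة ٢٠٢٢`. Numbers containing separators (for example `12.5` or `1,000`) are split into separate digit runs by this pattern, so they would need a broader regular expression.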