Avestan OCR


Seyedsoroush Hashemi

Jul 25, 2024, 9:55:00 AM
to tesseract-ocr
Hey all,
We're considering training a model for Avestan OCR (and probably a model for Pahlavi later). Both are ancient Iranian languages with a limited surviving corpus, which a few academic projects are digitizing. An OCR model could significantly speed up those projects and enable further analysis (e.g., author recognition).

We couldn't find any mention of Avestan in this Google group or in the Tesseract documentation. So, could you please answer the following questions:
1. Have there been any attempts at, or progress towards, adding Avestan/Pahlavi OCR to Tesseract? If so, could you please share the results?
2. Is there anyone who wants to join us in this project?

Dariush Mazlumi

Aug 6, 2024, 2:53:18 PM
to tesseract-ocr
Hello Mr. Hashemi,
As another Iranian here, I'd like to help. However, I don't know what needs to be done to build an OCR model, or how others (like me) can participate.
Thanks
