Adding the Ge'ez, Amharic, Tigre and Tigrinya Languages


Daniel Worku

Aug 3, 2013, 10:14:02 PM
to tesser...@googlegroups.com
Hello Dev team,

I'm a newcomer to Tesseract.

I am developing trained data files for the Ge'ez script and some of its child writing systems. Amharic and Tigrinya are the national languages of Ethiopia and Eritrea, respectively. I use these files in my own proprietary work, but the language data will be released under GPLv3 for open-source use by others.

There is already one Amharic file available here: http://code.google.com/p/tesseract-ocr/issues/detail?id=859
However, it was trained on only a limited number of fonts and does not include punctuation or character ambiguities (unicharambigs).

I need your advice: jTessBoxEditor lags on my Mac when I open multipage TIFFs. What is the fastest box editor that supports multipage TIFFs and can delete, merge, and insert boxes?
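As a stopgap for small fixes, the three box operations asked about can also be done directly on the box file. The sketch below assumes the Tesseract 3.x box format (one box per line: `symbol left bottom right top page`, where the last field is the page index in the multipage TIFF). The helpers `parse_box_file`, `merge_boxes`, `delete_box`, and `insert_box` are hypothetical illustrations, not part of Tesseract or jTessBoxEditor, and they do not display the image.

```python
from dataclasses import dataclass


@dataclass
class Box:
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int  # page index within a multipage TIFF (assumed box-file layout)


def parse_box_file(text):
    """Parse box-file text into Box objects, skipping blank lines."""
    boxes = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split from the right so the symbol field stays whole.
        parts = line.rsplit(" ", 5)
        left, bottom, right, top, page = map(int, parts[1:])
        boxes.append(Box(parts[0], left, bottom, right, top, page))
    return boxes


def format_box_file(boxes):
    """Serialize boxes back to box-file text, one line per box."""
    return "".join(
        f"{b.symbol} {b.left} {b.bottom} {b.right} {b.top} {b.page}\n" for b in boxes
    )


def merge_boxes(boxes, i, symbol):
    """Replace boxes[i] and boxes[i + 1] with one bounding box labelled `symbol`."""
    a, b = boxes[i], boxes[i + 1]
    if a.page != b.page:
        raise ValueError("cannot merge boxes on different pages")
    merged = Box(symbol, min(a.left, b.left), min(a.bottom, b.bottom),
                 max(a.right, b.right), max(a.top, b.top), a.page)
    return boxes[:i] + [merged] + boxes[i + 2:]


def delete_box(boxes, i):
    """Return a new list without boxes[i]."""
    return boxes[:i] + boxes[i + 1:]


def insert_box(boxes, i, box):
    """Return a new list with `box` inserted at position i."""
    return boxes[:i] + [box] + boxes[i:]


if __name__ == "__main__":
    # A glyph that was wrongly split into two boxes on page 0.
    sample = "ሰ 10 20 30 50 0\nላ 32 20 36 50 0\nላ 37 20 40 50 0\n"
    fixed = merge_boxes(parse_box_file(sample), 1, "ላ")
    print(format_box_file(fixed), end="")
```

Each helper returns a new list rather than mutating in place, so a sequence of edits can be undone by keeping the earlier lists.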

Your advice will be invaluable in helping to expand Tesseract's multilingual support.

Thanks,