Amharic language OCR

Bel_Ethio

Dec 20, 2013, 3:31:02 AM
to tesser...@googlegroups.com
Hello there,
I want to develop an OCR system for the Amharic language, and I need some help from the group to get me through the basics!

Merhawi Fissehaye

Feb 19, 2014, 5:47:56 AM
to tesser...@googlegroups.com
There are no comments from the experts yet, so it wouldn't hurt to share ideas among ourselves. I am working on the same topic for my mini-project, and I came across the Tesseract-OCR engine. Tesseract is an open-source OCR engine that began as a PhD research project at HP Labs, Bristol. Its accuracy outperformed that of other, similar commercial engines, and in 2005 it was released as open source. It is trainable, and I have found training data for Amharic on the internet, contributed by Mr. Sirak to Google Code: http://code.google.com/p/tesseract-ocr/issues/detail?id=859. I don't think it has been made available for download from the official code repository yet. You can download it from the link I gave you, but I suggest you first read the training procedure at http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3, which explains how the training data is organized. I am using a Python binding for the Tesseract engine called pytesser, which I have attached here.
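As far as I know, pytesser works by running the tesseract executable behind the scenes, so you can also drive the engine directly from Python. Below is a minimal sketch of that approach, not pytesser's own API. It assumes Tesseract 3.x is installed and on your PATH, and that the Amharic training data is installed in Tesseract's tessdata directory as `amh.traineddata` (the file name and the `amh` language code are assumptions; use whatever name the downloaded data actually has). `build_tesseract_command` and `ocr_image` are hypothetical helper names made up for this example.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="amh"):
    """Build the argument list for the tesseract command-line tool.

    Tesseract 3.x is invoked as: tesseract <image> <output base> -l <lang>
    and writes the recognized text to "<output base>.txt".
    The default "amh" code is an assumption about the training file name.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, lang="amh"):
    """Run tesseract on one image and return the recognized text.

    Assumes tesseract is on PATH and "<lang>.traineddata" is installed
    in its tessdata directory.
    """
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    # Write the output into a throwaway directory so no files are left behind.
    with tempfile.TemporaryDirectory() as tmp:
        output_base = Path(tmp) / "result"
        subprocess.run(
            build_tesseract_command(image_path, output_base, lang),
            check=True,  # raise CalledProcessError if tesseract fails
        )
        # Tesseract appends ".txt" to the output base name; Amharic
        # (Ge'ez script) text is read back as UTF-8.
        return (Path(tmp) / "result.txt").read_text(encoding="utf-8")
```

Usage would be something like `print(ocr_image("page.png"))` for a scanned page image. Writing to a temporary directory is just one design choice; it keeps repeated runs from overwriting each other's output files.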

I hope this helps.
pytesser_v0.0.1.zip

Sree

Feb 20, 2014, 1:10:24 PM
to tesser...@googlegroups.com
Hi Merhawi,

How do I use pytesser with the Tesseract OCR engine? What tools do I need to download, and what has to be done to make it work?

Nick White

Feb 21, 2014, 4:37:08 AM
to tesser...@googlegroups.com
Can you all move this discussion to the tesseract-ocr list (rather
than the tesseract-dev list) please? tesseract-dev is only for the
discussion of development of Tesseract itself. You'll find more
people able and willing to help on tesseract-ocr.

Thanks,

Nick