How to use Tesseract to extract regional Indian text such as Marathi or Hindi?

vaibhav kurhe

Oct 26, 2015, 6:51:33 AM
to tesseract-ocr
Hello everyone!
Can anyone tell me whether Tesseract can be used for regional Indian languages?
How do I use it on Ubuntu to extract Marathi or Hindi text from an image?

Bhushan Patil

Oct 28, 2015, 4:46:20 AM
to tesseract-ocr
Hi Vaibhav, as far as I know, the Marathi language is not available in Tesseract. For Hindi:
  1. Download the language data:
    sudo apt-get install tesseract-ocr-hin
  2. Run Tesseract on the image and print the recognized text to standard output:
    tesseract path/to/image stdout -l hin
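
For scripted use, the command in step 2 can be wrapped in a few lines of Python. This is only a minimal sketch, assuming the `tesseract` binary is on the PATH and the `hin` language data is installed; the function names `build_ocr_command` and `ocr_image` and the file name `page.png` are made up for illustration and are not part of Tesseract.

```python
import subprocess


def build_ocr_command(image_path, lang="hin"):
    """Return the argument list for: tesseract <image> stdout -l <lang>."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="hin"):
    """Run tesseract on one image and return the recognized text.

    Assumes the tesseract binary is on PATH and the language data
    for `lang` is installed; raises CalledProcessError otherwise.
    """
    result = subprocess.run(
        build_ocr_command(image_path, lang),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    # Devanagari text is assumed to come back UTF-8 encoded.
    return result.stdout.decode("utf-8")


# Example call (hypothetical file name):
#   text = ocr_image("page.png", lang="hin")
```

Passing the arguments as a list avoids shell quoting problems when the image path contains spaces.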

ShreeDevi Kumar

Oct 28, 2015, 8:04:15 AM
to tesser...@googlegroups.com

There is Marathi traineddata. However, it is not trained with the cube engine and hence may not be as accurate.

http://packages.ubuntu.com/wily/tesseract-ocr-mar

You can test with both hin and mar and report your experience.
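
Before comparing the two, it may help to confirm that both language packs are actually installed. The sketch below parses text captured from `tesseract --list-langs`; the helper names are hypothetical, and the expected format (a "List of available languages" header followed by one language code per line) is an assumption that may not hold for every Tesseract version.

```python
def parse_installed_langs(list_langs_output):
    """Extract language codes from the text printed by `tesseract --list-langs`.

    Assumes a header line starting with "List of" followed by one
    language code per line; blank lines are ignored.
    """
    lines = [line.strip() for line in list_langs_output.splitlines()]
    return {
        line
        for line in lines
        if line and not line.lower().startswith("list of")
    }


def missing_langs(list_langs_output, wanted=("hin", "mar")):
    """Return the wanted language codes that are not installed, in order."""
    installed = parse_installed_langs(list_langs_output)
    return [lang for lang in wanted if lang not in installed]
```

If `missing_langs` reports `mar`, the package linked above would need to be installed first. When capturing the command output, it is safest to collect both stdout and stderr, since the stream used for the list may depend on the version.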

Thanks!
- sent from my phone. excuse the brevity.


ShreeDevi Kumar

Oct 28, 2015, 8:07:14 AM
to tesser...@googlegroups.com

For Indian languages, also check out the OCR feature in Google Drive/Docs.

- sent from my phone. excuse the brevity.
