How to use Tesseract to extract regional Indian text such as Marathi or Hindi?

vaibhav kurhe


Oct 26, 2015, 6:51:33 AM

to tesseract-ocr

Hello everyone!
Can anyone tell me whether Tesseract can be used for regional languages in India?
How can I use it on Ubuntu to extract Marathi or Hindi text from an image?

Bhushan Patil


Oct 28, 2015, 4:46:20 AM

to tesseract-ocr

Hi Vaibhav, as far as I know, Marathi is not available in Tesseract. For Hindi:

Download the language data:
sudo apt-get install tesseract-ocr-hin

Then run:
tesseract path/to/image stdout -l hin
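A minimal sketch extending the command above to a whole directory of scanned pages. It assumes the images are PNG files and that tesseract-ocr-hin is installed; the function names has_lang and ocr_dir are hypothetical. It writes one NAME.txt next to each NAME.png, since "tesseract IMAGE OUTPUTBASE -l LANG" writes its result to OUTPUTBASE.txt.

```shell
#!/bin/sh
# Sketch: OCR every PNG in a directory with a given traineddata (e.g. hin).
# Assumes tesseract-ocr plus the language package are installed on Ubuntu.
# The .png extension and the function names are hypothetical examples.

# has_lang LANG: succeed if tesseract lists LANG among its installed languages.
# Some older tesseract versions print --list-langs to stderr, so merge both streams.
has_lang() {
    tesseract --list-langs 2>&1 | grep -qx "$1"
}

# ocr_dir DIR LANG: write DIR/NAME.txt for each DIR/NAME.png.
ocr_dir() {
    dir=$1
    lang=$2
    if ! has_lang "$lang"; then
        echo "language '$lang' not installed; try: sudo apt-get install tesseract-ocr-$lang" >&2
        return 1
    fi
    for img in "$dir"/*.png; do
        [ -e "$img" ] || continue   # the glob matched nothing
        # tesseract IMAGE OUTPUTBASE -l LANG writes OUTPUTBASE.txt
        tesseract "$img" "${img%.png}" -l "$lang"
    done
}
```

After sourcing the file, "ocr_dir scans hin" would OCR every PNG under a (hypothetical) scans directory with the Hindi data, and it refuses to run if that language is missing.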

ShreeDevi Kumar


Oct 28, 2015, 8:04:15 AM

to tesser...@googlegroups.com

There is Marathi traineddata. However, it is not trained with the Cube engine and hence may not be as accurate.

http://packages.ubuntu.com/wily/tesseract-ocr-mar

You can test with both hin and mar and report your experience.
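To run that comparison, here is a small sketch that OCRs one image once per language and diffs the two results. It assumes both tesseract-ocr-hin and tesseract-ocr-mar are installed (the latter is the package linked above); the function name compare_langs is hypothetical.

```shell
#!/bin/sh
# Sketch: OCR one image with both the hin and mar traineddata and compare.
# Assumes tesseract-ocr-hin and tesseract-ocr-mar are installed.
# compare_langs is a hypothetical helper name.

# compare_langs IMAGE: writes BASE.hin.txt and BASE.mar.txt next to IMAGE,
# then prints a unified diff of the two outputs.
compare_langs() {
    img=$1
    base=${img%.*}   # strip the file extension, e.g. page.png -> page
    for lang in hin mar; do
        # tesseract IMAGE OUTPUTBASE -l LANG writes OUTPUTBASE.txt
        tesseract "$img" "$base.$lang" -l "$lang"
    done
    diff -u "$base.hin.txt" "$base.mar.txt"
}
```

diff exits with status 1 when the outputs differ, so the function's exit status reflects that. Reading the two .txt files side by side is the practical way to judge which data does better on a given Marathi page.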

Thanks!
- sent from my phone. excuse the brevity.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/faf89b79-c849-46bb-bff8-8ae9159e3fcc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ShreeDevi Kumar


Oct 28, 2015, 8:07:14 AM

to tesser...@googlegroups.com

For Indian languages, also check out the OCR feature in Google Drive/Docs.

