Re: sanskrit ocr


विश्वासो वासुकेयः (Vishvas Vasuki)

Apr 10, 2013, 1:09:33 PM
to Shree Devi Kumar, Sanskrit Team member, Sai Susarla, questions sanskrit, Ajit Krishnan, Sunder Hattangadi, sanskrit-p...@googlegroups.com
+ sanskrit-programmers list

On Wed, Apr 10, 2013 at 9:07 AM, Shree Devi Kumar <shree...@gmail.com> wrote:
Update:

I have installed tesseract-ocr 3.02 on my PC and have also started training it for Sanskrit (and Hindi) using the sanskrit2003 font. Recognition seems to be font-specific, so it will need training on additional fonts to work well on scanned pages.

Some questions:

Which Unicode Devanagari font or fonts do you think most closely resemble the typeface used in old scanned Sanskrit/Hindi/Marathi books? If I train for that font, it should improve accuracy on the scanned pages.

Are there any sample pages or a small booklet in .tif format from DLI or other sites that I can use for testing? Any volunteers for proofreading the output?

The software also uses a wordlist (dictionary) to improve recognition. Is it possible to get a list of the more common words, or perhaps just the nouns, adjectives, and verb roots, from the Sanskrit dictionaries? The list should be in Unicode Devanagari, one word per line.
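A minimal sketch of preparing such a wordlist, assuming the candidate words have already been pulled out of a dictionary as plain strings. The function names (`build_wordlist`, `is_devanagari_word`, `write_wordlist`), the NFC normalization, and the rule of keeping only characters from the Devanagari Unicode block (U+0900 to U+097F) are illustrative assumptions, not part of Tesseract itself.

```python
import unicodedata

# Assumption: only characters from the Devanagari Unicode block (U+0900..U+097F)
# are allowed, so Latin transliterations and joiners such as U+200D are dropped.
DEVANAGARI_START = "\u0900"
DEVANAGARI_END = "\u097F"


def is_devanagari_word(word: str) -> bool:
    """Return True if the word is non-empty and made only of Devanagari characters."""
    return bool(word) and all(DEVANAGARI_START <= ch <= DEVANAGARI_END for ch in word)


def build_wordlist(candidates):
    """Strip, NFC-normalize, filter and de-duplicate candidate words.

    NFC is an assumption: the wordlist should use the same normalization
    as the training text. Returns the words sorted by code point.
    """
    unique = set()
    for raw in candidates:
        word = unicodedata.normalize("NFC", raw.strip())
        if is_devanagari_word(word):
            unique.add(word)
    return sorted(unique)


def write_wordlist(words, path):
    """Write one word per line as UTF-8, the format requested in the message."""
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for word in words:
            out.write(word + "\n")


if __name__ == "__main__":
    sample = ["राम", " राम ", "rama", "धर्म", ""]
    print(build_wordlist(sample))  # ['धर्म', 'राम']
```

The resulting list can be saved with something like `write_wordlist(words, "san.wordlist")` (a hypothetical file name). Tesseract's training tools include `wordlist2dawg` for turning such a plain list into its dictionary format; check its documentation for the exact arguments.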

Regards,
Shree



For 

Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


On Fri, Mar 15, 2013 at 2:08 PM, Shree Devi Kumar <shree...@gmail.com> wrote:
I tried Tesseract OCR with the VietOCR frontend.

Please see the following link for the samples I used and their output. I have raised an issue; let's see if they respond.

http://code.google.com/p/tesseract-ocr/issues/detail?id=871

Shree Devi Kumar


On Fri, Mar 15, 2013 at 3:29 AM, विश्वासो वासुकेयः (Vishvas Vasuki) <vishvas...@gmail.com> wrote:
https://code.google.com/p/tesseract-ocr/ claims: "New Languages, Arabic, Hindi, Thai."

I have never used it, though. I would be interested to know how it goes if anyone tries it.


On Thu, Mar 14, 2013 at 2:41 PM, Sanskrit Team member <sans...@cheerful.com> wrote:
I have made some link updates in the FAQ. There is a reliable OCR from Hellwig, but at a relatively hefty cost. Search for OCR in the FAQ.

Sai's friends are supposed to be working on it, but there has been no word from him or anyone else for a long time. I guess it is staying on their wishlist.
I have CC'd Sai; let us see what he has found so far. I am also CCing Vishvas Vasuki and Arun, who are somewhat tracking its progress.

Nandu
 

 

----- Original Message -----

From: Shree Devi Kumar

Sent: 03/14/13 10:27 AM

To: Sanskrit Team_member

Subject: sanskrit ocr

 
Ram ram, Nandu,
 
Any updates on any open source sanskrit OCR?
 
I saw a link for ocrlib at https://code.google.com/p/ocrlib/
but the last update is from 2010, and it seems limited to Mac OS.
 
Tesseract seems to have made a start on Hindi. I installed the Hindi files with vietocr.net and am just trying it out right now.
 
Do you have any feedback regarding these?
 
Thanks,
Shree

 




Learn Sanskrit! Love Sanskrit!! Live Sanskrit!!!
http://www.sanskritdocuments.org



--
--
Vishvas /विश्वासः







vishnu dutt

Jul 9, 2013, 4:29:48 AM
to sanskrit-p...@googlegroups.com, Shree Devi Kumar, Sanskrit Team member
Hello विश्वासो वासुकेय,

Actually, I don't know Sanskrit, but I am a web developer. How can I help you? I program in PHP and a little bit of Java.

विश्वासो वासुकेयः (Vishvas Vasuki)

Jul 10, 2013, 2:40:25 AM
to sanskrit-p...@googlegroups.com, Shree Devi Kumar, Sanskrit Team member
Thanks for reaching out, Vishnu. May you soon learn the simple and beautiful language of the ancients.

I don't have any Java/PHP project in mind right now; I will ping you if I do (perhaps Shree Devi has suggestions).

If you are open to learning about dictionary formats (StarDict, http://en.wikipedia.org/wiki/Wordnet) and mobile dictionary programs (especially GoldenDict), I have a suggestion. Let me know if you are interested.


--
You received this message because you are subscribed to the Google Groups "sanskrit-programmers" group.

Mārcis Gasūns

Aug 22, 2013, 11:38:33 AM
to sanskrit-p...@googlegroups.com, Shree Devi Kumar, Sanskrit Team member, Sai Susarla, questions sanskrit, Ajit Krishnan, Sunder Hattangadi