Update: I have installed tesseract-ocr 3.02 on my PC and have also started training it for Sanskrit (and Hindi) using the Sanskrit2003 font. Recognition seems to be font-specific, so it will need training on additional fonts to work well on scanned pages.

Some questions:
- Which Unicode Devanagari font or fonts do you think most resemble the typeface used in old scanned Sanskrit/Hindi/Marathi books? If I train for that font, it should improve accuracy on the scanned pages.
- Are there any sample pages or a small booklet in .tif format from DLI or other sites that I can use to test? Any volunteers for proofreading the output?
- The software also uses a wordlist (dictionary) to improve recognition. Is it possible to get a list of the more common words, or maybe just nouns, adjectives and verb roots, from the Sanskrit dictionaries? The list should be in Unicode Devanagari, one word per line.

Regards,
Shree

Shree Devi Kumar
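For anyone who wants to help with the wordlist, here is a minimal Python sketch of one way to produce a file in the requested shape (Unicode Devanagari, one word per line) from any UTF-8 text. It is only an illustration: the function names `build_wordlist` and `write_wordlist`, the `min_count` parameter and the frequency ordering are assumptions of this sketch, not anything Tesseract requires, and the character ranges simply take the Devanagari block and leave out the dandas and digits.

```python
import re
from collections import Counter

# Devanagari block is U+0900-U+097F. This pattern leaves out the danda and
# double danda (U+0964, U+0965) and the Devanagari digits (U+0966-U+096F),
# so punctuation and numbers do not end up inside words.
DEVANAGARI_WORD = re.compile(r"[\u0900-\u0963\u0970-\u097F]+")


def build_wordlist(text, min_count=1):
    """Return the Devanagari words found in text, most frequent first."""
    counts = Counter(DEVANAGARI_WORD.findall(text))
    # Sort by descending frequency, then by code point order for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, n in ranked if n >= min_count]


def write_wordlist(words, path):
    """Write one word per line, encoded as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        for word in words:
            f.write(word + "\n")
```

Calling `write_wordlist(build_wordlist(text), "words.txt")` on a Sanskrit e-text would give a starting list. Inflected forms are counted as separate words here, so pulling out only nouns, adjectives and verb roots would still need a dictionary source.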
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Mar 15, 2013 at 2:08 PM, Shree Devi Kumar <shree...@gmail.com> wrote:
I tried Tesseract OCR with the VietOCR frontend.
Please see the following link for the samples used and their output. I have tried to raise an issue and will see if they respond.
http://code.google.com/p/tesseract-ocr/issues/detail?id=871
Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Mar 15, 2013 at 3:29 AM, विश्वासो वासुकेयः (Vishvas Vasuki) <vishvas...@gmail.com> wrote:
https://code.google.com/p/tesseract-ocr/ claims:
- New Languages, Arabic, Hindi, Thai.
I have never used it, though. I would be interested to know how it goes if anyone tries it.

On Thu, Mar 14, 2013 at 2:41 PM, Sanskrit Team member <sans...@cheerful.com> wrote:
I made some link updates in the FAQ. There is a reliable OCR from Hellwig, but at a relatively hefty cost. Search for OCR in the FAQ.
Sai's friends are supposed to be working on it, but there has been no word about it from him or anyone else for a long time. I guess it is staying on their wishlist.
I have CCed Sai; let us see what he has found so far. I am also CCing Vishvas Vasuki and Arun, who are somewhat tracking its progress as well.
Nandu
----- Original Message -----
From: Shree Devi Kumar
Sent: 03/14/13 10:27 AM
To: Sanskrit Team_member
Subject: sanskrit ocr
Ram ram, Nandu,

Any updates on any open source Sanskrit OCR?

I saw a link for ocrlib at https://code.google.com/p/ocrlib/ but the last update is from 2010 and it seems limited to Mac OS.

Tesseract seems to have made a start for Hindi - I installed the Hindi files with vietocr.net and am just trying it out right now.

Do you have any feedback regarding these?

Thanks,
Shree

Learn Sanskrit! Love Sanskrit!! Live Sanskrit!!!
http://www.sanskritdocuments.org
--
Vishvas /विश्वासः
--
You received this message because you are subscribed to the Google Groups "sanskrit-programmers" group.