Telugu Spellcheck/Spelling corrector


Hariharan Ramamurthy

Sep 11, 2013, 10:24:21 AM
to telugu-c...@googlegroups.com

Today, while writing an e-mail, I noticed that the Google spell check program is really bad. I'm surprised that Google, which makes such wonderful spelling corrections and suggestions in the search box, fails so miserably when correcting text written in Gmail.

I have looked into this topic before, researching how to create a spellchecker or spelling corrector (what do we call it in Telugu?).

How difficult is it going to be?

This is from an article about Python code for a simple spell checker:

Now let's look at the problem of enumerating the possible corrections c of a given word w. It is common to talk of the edit distance between two words: the number of edits it would take to turn one into the other. An edit can be a deletion (remove one letter), a transposition (swap adjacent letters), an alteration (change one letter to another) or an insertion (add a letter). Here's a function that returns a set of all words c that are one edit away from w:

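# The article's code assumes a lowercase English alphabet
alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    # Every way to split the word into a (left, right) pair
    splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # Drop the first letter of the right part
    deletes    = [a + b[1:] for a, b in splits if b]
    # Swap the first two letters of the right part
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    # Change the first letter of the right part to each letter of the alphabet
    replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    # Add each letter of the alphabet between the two parts
    inserts    = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)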

This can be a big set. For a word of length n, there will be n deletions, n-1 transpositions, 26n alterations, and 26(n+1) insertions, for a total of 54n+25 (of which a few are typically duplicates). For example, len(edits1('something')) -- that is, the number of elements in the result of edits1('something') -- is 494.
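To check the arithmetic in the quoted passage, here is a small sketch that counts the candidates both before and after duplicates are removed. The helper name edit_counts is made up for this illustration and is not from the article; it repeats the same four list comprehensions as edits1.

```python
import string

# Assumption: the same lowercase English alphabet the article uses.
ALPHABET = string.ascii_lowercase


def edit_counts(word, alphabet=ALPHABET):
    """Return (raw, unique) counts of the strings one edit away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    candidates = deletes + transposes + replaces + inserts
    # raw counts every generated string; unique collapses duplicates
    return len(candidates), len(set(candidates))


raw, unique = edit_counts('something')
print(raw, unique)  # 511 494
```

For 'something' (n = 9) the raw count is 54*9 + 25 = 511. The 17 duplicates come from replacing a letter with itself (nine copies of the original word collapse into one) and from inserting a letter next to an identical letter (nine pairs collapse), which leaves 494.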

We have to think of the Telugu language in computational-linguistic terms.

Instead of simply stating that the permutations and combinations of letters can be infinite, we need to come up with some statistics about language usage and about which grapheme combinations actually occur.

My guess is that for Telugu this number, 494, will be around 6000.
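One way to reason about that guess is to redo the article's count with the alphabet size as a parameter. The sketch below is only back-of-the-envelope arithmetic: the function names are made up for illustration, the figure 6000 is the guess above rather than a measurement, and what counts as one "symbol" in Telugu (a Unicode code point, or a whole consonant-vowel cluster) is an open assumption.

```python
def raw_edit_count(n, k):
    """Upper bound on one-edit candidates for a word of n symbols
    over an alphabet of k symbols (duplicates not removed)."""
    deletes = n
    transposes = max(n - 1, 0)
    replaces = k * n
    inserts = k * (n + 1)
    return deletes + transposes + replaces + inserts


def alphabet_size_for(target, n):
    """Solve (2n - 1) + k * (2n + 1) = target for k, assuming n >= 1."""
    return (target - (2 * n - 1)) / (2 * n + 1)


# English check: 9 symbols, 26 letters gives the article's 54n + 25.
print(raw_edit_count(9, 26))              # 511

# Alphabet size implied by about 6000 candidates for a 9-symbol word.
print(round(alphabet_size_for(6000, 9)))  # 315
```

Under these assumptions, about 6000 candidates for a nine-symbol word corresponds to an inventory of roughly 315 distinct symbols. Whether that is realistic depends on the statistics mentioned above, that is, on how many grapheme combinations actually occur in Telugu text.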

Dr. Hariharan Ramamurthy, M.D.
Doctor Iyer's Wellness Center

Functional Holistic Integrative Medical Care
Affordable and Quality Healthcare

Treat the Person not the Part or the Chart




Arjuna Rao Chavala

Sep 30, 2013, 1:47:23 AM
to telugu-c...@googlegroups.com



2013/9/11 Hariharan Ramamurthy <harid...@gmail.com>

Today while writing an e-mail I noticed that the Google spell check program is really bad.  I'm surprised that Google will make such wonderful spelling corrections and suggestions in the search box fails so miserably when it is correcting text written in the Gmail program. On this topic in the past I had researched about creating a spellchecker or spelling corrector(what do we call it in Telugu?). How difficult it is going to be?

Have you tried the Firefox Telugu spell checker? It is not of very high quality, but it is freely available. Uma Maheshwara Rao's team at the University of Hyderabad (the central university) has built a modern spell checker. It is not yet available to everyone. Their research paper (Google search result) may be useful to you.

Best wishes,
Arjuna

