Thomas Raffill  
Newsgroups: comp.ai, comp.ai.nat-lang
From: r...@cse.ucsc.edu (Thomas Raffill)
Date: 20 Mar 2003 13:52:46 +1100
Local: Wed, Mar 19 2003 9:52 pm
Subject: Re: Spelling Error analysis

>I would like to find a pattern once and then tot up the occurrences of
>the pattern. I can do this for inflectional and derivational errors but
>am having problems when it comes to errors in a root word. Without a
>rule, I can say what the difference is between the dictionary form and
>the error form, but some rules require information about surrounding
>graphemes, and here I am stumped.

I have worked on this kind of issue before. Here are some rough
overall categories of errors. If anyone knows more categories,
please follow up and post them.

1. Typographical: a missing letter, an extra letter, a transposition of
two adjacent letters, or the substitution of one letter for another
(for example, the adjacent key on a typewriter). There is a lot of
published literature on these kinds of "string edits." Given a pair of
strings, the minimum number of edits needed to transform one into the
other is called the "edit distance." Most automatic spelling correction
programs today can handle a misspelled word that is at an edit distance
of 1 from a dictionary word, and will return a list of all dictionary
words within one string edit.
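
A minimal sketch in Python of the edit distance described above,
counting all four error types (missing, extra, transposed, substituted
letter). The transposition-aware variant shown is usually called
"optimal string alignment" distance, a restricted form of
Damerau-Levenshtein distance. The function names osa_distance and
candidates are hypothetical, chosen for illustration, and the linear
scan over the dictionary is only the simplest possible approach.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions and
    transpositions of adjacent letters (optimal string alignment)."""
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # extra letter in a
                d[i][j - 1] + 1,         # missing letter in a
                d[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
            # Two adjacent letters swapped, e.g. "teh" vs "the".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def candidates(word: str, dictionary: list[str], max_dist: int = 1) -> list[str]:
    """All dictionary words within max_dist edits of word."""
    return [w for w in dictionary if osa_distance(word, w) <= max_dist]


print(candidates("teh", ["the", "ten", "tea", "rough"]))  # ['the', 'ten', 'tea']
```

Note that "rough" vs "ruf" comes out at distance 3 here, which is why
phonetic errors need the separate treatment described next.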

2. Phonetic: substitution of one spelling for another, phonetically
related spelling. An example would be misspelling the word "rough" as
"ruf." This kind of error arises in languages like English, where
spelling is not tightly phonetic. To handle it, you can build tables of
correspondences between letter groups and phonetic units, transform
both words into sequences of phonetic units, and then apply the edit
distance techniques to those sequences. There is a small body of
literature on this sort of thing.
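
A rough Python sketch of the transform-then-compare idea. The table
PHONETIC_TABLE is a hypothetical toy with a handful of entries; real
letter-to-sound correspondences are context-dependent (the "gh" in
"night" is silent, not "f"), so a usable table would be much larger.
Greedy longest-match tokenization is just one simple way to apply such
a table, and plain Levenshtein distance is then run on the unit lists.

```python
# Hypothetical toy table mapping letter groups to phonetic unit labels.
PHONETIC_TABLE = {
    "ph": "F", "gh": "F", "f": "F",
    "ou": "U", "oo": "U", "u": "U",
    "ck": "K", "c": "K", "k": "K", "q": "K",
}
MAX_KEY = max(len(k) for k in PHONETIC_TABLE)


def to_units(word: str) -> list[str]:
    """Convert a spelling to phonetic units by greedy longest match."""
    word = word.lower()
    units, i = [], 0
    while i < len(word):
        for size in range(min(MAX_KEY, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in PHONETIC_TABLE:
                units.append(PHONETIC_TABLE[chunk])
                i += size
                break
        else:
            # Letters not covered by the table stand for themselves.
            units.append(word[i].upper())
            i += 1
    return units


def levenshtein(a: list[str], b: list[str]) -> int:
    """Insert/delete/substitute distance between two unit sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def phonetic_distance(w1: str, w2: str) -> int:
    return levenshtein(to_units(w1), to_units(w2))


print(to_units("rough"), to_units("ruf"))  # ['R', 'U', 'F'] ['R', 'U', 'F']
print(phonetic_distance("rough", "ruf"))   # 0
```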

3. Context-sensitive: substitution of a correctly spelled but
inappropriate dictionary word for the intended one, for example
"passed" for "past." Groups of words that are often confused with each
other in this way are called "confusion sets" in the literature. This
is a more difficult task, and you can bring all of the techniques of
natural language processing to bear on it. Most spelling correction
programs today cannot handle this kind of error.
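
To make the confusion-set idea concrete, here is a deliberately crude
Python sketch: for each member of a confusion set it counts the
immediate left and right neighbors seen in correctly written text, then
picks the member whose counts best match a new context. This
neighbor-count scoring is one of the simplest possible approaches, not
any particular published method. TOY_CORPUS is invented for
illustration only (it is not real data), and the function names are
hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus of correctly written, pre-tokenized sentences.
# Invented for illustration; a real system needs a large training corpus.
TOY_CORPUS = [
    ["he", "walked", "past", "the", "house"],
    ["in", "the", "past", "year"],
    ["she", "passed", "the", "test"],
    ["he", "passed", "the", "ball"],
]


def context_features(tokens: list[str], i: int) -> list[str]:
    """Immediate left and right neighbors of tokens[i], tagged by side."""
    feats = []
    if i > 0:
        feats.append("L=" + tokens[i - 1])
    if i + 1 < len(tokens):
        feats.append("R=" + tokens[i + 1])
    return feats


def train(corpus: list[list[str]], confusion_set: list[str]) -> dict[str, Counter]:
    """Count the neighbor features seen with each confusion-set member."""
    counts = {w: Counter() for w in confusion_set}
    for tokens in corpus:
        for i, tok in enumerate(tokens):
            if tok in counts:
                counts[tok].update(context_features(tokens, i))
    return counts


def choose(tokens: list[str], i: int, counts: dict[str, Counter]) -> str:
    """Return the member whose training contexts best match the neighbors
    of tokens[i]; ties go to the member listed first in the set."""
    feats = context_features(tokens, i)
    return max(counts, key=lambda w: sum(counts[w][f] for f in feats))


counts = train(TOY_CORPUS, ["past", "passed"])
print(choose(["in", "the", "passed", "year"], 2, counts))  # past
print(choose(["she", "past", "the", "exam"], 1, counts))   # passed
```

With only two neighbor features and four sentences this is easily
fooled; wider context windows, more features and far more data are
where the general NLP techniques come in.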

Hope this helps.

Thomas Raffill

[ comp.ai is moderated.  To submit, just post and be patient, or if ]
[ that fails mail your article to <comp...@moderators.isc.org>, and ]
[ ask your news administrator to fix the problems with your system. ]


 