spelling suggest

AurovilleRadio

Jul 26, 2011, 2:12:58 AM

I am looking for a spelling-suggestion library in Tcl. I found this link,
http://norvig.com/spell-correct.html, but sadly Tcl is missing from it.
I prefer not to use Google Suggest, as I do not want the application to
depend on an internet connection.

I created an SQLite word database, and a basic spell suggester is
working (it looks for a word with the same letters, ignoring the vowels),
but I want to make it smarter while keeping reasonable performance.

Any leads?
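
For context, the vowel-stripping lookup described above amounts to roughly the following. This is only a minimal sketch: the proc names (skeleton, buildIndex, suggest) are made up for illustration, and a plain Tcl dict stands in for the SQLite table so that the snippet is self-contained.

```tcl
# Hypothetical helper: reduce a word to its consonant "skeleton"
# by lower-casing it and dropping the vowels.
proc skeleton {word} {
    return [regsub -all {[aeiou]} [string tolower $word] ""]
}

# Build an index mapping each skeleton to the dictionary words sharing it.
# (In the application described above this would be a column in SQLite.)
proc buildIndex {words} {
    set index [dict create]
    foreach w $words {
        dict lappend index [skeleton $w] $w
    }
    return $index
}

# Return every known word that has the same skeleton as the input.
proc suggest {index word} {
    set key [skeleton $word]
    if {[dict exists $index $key]} {
        return [dict get $index $key]
    }
    return {}
}

set index [buildIndex {spelling spilling speaking tickle}]
puts [suggest $index "spelleng"]
```

Note that such a lookup only catches vowel errors: a dropped or doubled consonant changes the skeleton, so "speling" would not find "spelling".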

Alexandre Ferrieux

Jul 26, 2011, 3:27:06 AM

On 26 Jul, 08:12, AurovilleRadio <avradion...@gmail.com> wrote:
> I am looking for a spelling suggest library in tcl. I found this link
> http://norvig.com/spell-correct.html but sadly Tcl is missing.
> I prefer not to use google suggest as I want the application not to
> depend on internet connection.
>
> I created sqlite words database and the basic spell suggester is
> working (look for a word with same letters without the vowels) but I
> want to make it smarter but of course with reasonable performance.
>
> any leads?

You're proposing two leads which seem orthogonal:

(a) port Norvig's code to Tcl
(b) write something from scratch, with a completely different
mechanism

Now if it's (a), it is easy, just ask ;-)

-Alex

AurovilleRadio

Jul 26, 2011, 4:54:40 AM

On Jul 26, 12:27 pm, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:

> You're proposing two leads which seem orthogonal:
>
> (a) port Norvig's code to Tcl
> (b) write something from scratch, with a completely different
> mechanism
>
> Now if it's (a), it is easy, just ask ;-)
>
> -Alex

Of course it would be great if some noble soul would port it, but I
would not dare to ask for it. I was also hoping for:
(c) code that does this already exists somewhere but, for some
reason, hides from Google's prying eye.

Uwe Klein

Jul 26, 2011, 5:10:02 AM

AurovilleRadio wrote:
> Of course it would be great if some noble soul would port it, but I
> would not dare to ask for it. I was also hoping for:
> (c) code that does this already exists somewhere but, for some
> reason, hides from Google's prying eye.

TclPython creates a slave interpreter that understands Python:
http://wiki.tcl.tk/5630
http://jfontain.free.fr/tclpython.htm

uwe

Cyan

Jul 26, 2011, 5:47:29 AM

On Jul 26, 10:54 am, AurovilleRadio <avradion...@gmail.com> wrote:
> Of course it would be great if some noble soul would port it, but I
> would not dare to ask for it. I was also hoping for:
> (c) code that does this already exists somewhere but, for some
> reason, hides from Google's prying eye.

Sort of - I built a minimal wrapper around libpspell years ago, but I
don't remember whether I ever released it. If the API:

pspell::check_word /word/
returns true if /word/ appears to be correct

pspell::suggest_words /word/
returns a list of suggested corrections for /word/

is what you're looking for, then I can put it up somewhere. I also
vaguely remember building a text-widget-based megawidget that did the
usual red underlining of errors with a right-click selection popup.
It would have been based on Itk, though, and would probably need quite
a lot of TLC to get rid of the years of code rot.

Cyan

Alexandre Ferrieux

Jul 26, 2011, 7:06:44 AM

On 26 Jul, 10:54, AurovilleRadio <avradion...@gmail.com> wrote:
> Of course it would be great if some noble soul would port it but I
> would not dare to ask for it.

OK, I've done this for you:

http://wiki.tcl.tk/28599
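
The wiki page is not reproduced in this thread. To give a rough idea of what such a port involves, here is an independent sketch of the core of Norvig's approach (generate every string at edit distance 1, then keep the known word with the highest count). It is not the code from the wiki page; the proc names and the tiny hand-made model are assumptions for illustration, and it stops at distance 1 where Norvig's essay also tries distance 2.

```tcl
# Sketch: all strings one edit (delete, transpose, replace, insert)
# away from $word. Like Norvig's original, the result may include the
# word itself (a letter "replaced" by the same letter).
proc edits1 {word} {
    set alphabet {a b c d e f g h i j k l m n o p q r s t u v w x y z}
    set n [string length $word]
    set seen [dict create]
    for {set i 0} {$i <= $n} {incr i} {
        set head [string range $word 0 [expr {$i - 1}]]
        set tail [string range $word $i end]
        set rest [string range $tail 1 end]
        if {$i < $n} {
            dict set seen ${head}${rest} 1                ;# deletion
            foreach c $alphabet {
                dict set seen ${head}${c}${rest} 1        ;# replacement
            }
        }
        if {$i < $n - 1} {
            set a [string index $tail 0]
            set b [string index $tail 1]
            set after [string range $tail 2 end]
            dict set seen ${head}${b}${a}${after} 1       ;# transposition
        }
        foreach c $alphabet {
            dict set seen ${head}${c}${tail} 1            ;# insertion
        }
    }
    return [dict keys $seen]
}

# Return $word if known, else the most frequent known distance-1 candidate.
proc correct {model word} {
    if {[dict exists $model $word]} { return $word }
    set best $word
    set bestCount 0
    foreach cand [edits1 $word] {
        if {[dict exists $model $cand] && [dict get $model $cand] > $bestCount} {
            set best $cand
            set bestCount [dict get $model $cand]
        }
    }
    return $best
}

# Hand-made word counts, purely illustrative.
set model {the 500 ten 20 tea 15}
puts [correct $model teh]
```

In a real setup the model would be built by counting words in a large text corpus rather than typed in by hand.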

-Alex

Arjen Markus

Jul 26, 2011, 9:31:59 AM

On 26 Jul, 13:06, Alexandre Ferrieux <alexandre.ferri...@gmail.com> wrote:
> OK, I've done this for you:
>
> http://wiki.tcl.tk/28599
>
> -Alex

Looking at that code (without studying it in detail), it seems to me that:
- putting the main loop in a proc would speed things up
- using the split-up version of the strings would speed things up as
  well
I do not know by how much, nor whether it would have a noticeable effect,
but that is my first impression.
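
The first point rests on a general Tcl property: a proc body is byte-compiled and works on indexed local variables, which usually runs faster than the same loop operating on global variables. A minimal way to measure it is sketched below; the test text and the proc name countWords are made up, the loop is a generic word count rather than the code on the wiki page, and no timings are claimed here.

```tcl
# Made-up input: a short sentence repeated to get a measurable workload.
set text [string repeat "the quick brown fox jumps over the lazy dog " 200]
set words [regexp -all -inline {[a-z]+} $text]

# Variant 1: the counting loop runs at global level on global variables.
set tGlobal [time {
    set model [dict create]
    foreach w $words { dict incr model $w }
} 100]

# Variant 2: the same loop wrapped in a proc, using local variables.
proc countWords {words} {
    set model [dict create]
    foreach w $words { dict incr model $w }
    return $model
}
set tProc [time {countWords $words} 100]

puts "global level: $tGlobal"
puts "inside proc:  $tProc"
```

Running both variants on the real corpus would show whether the difference matters in practice.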

Regards,

Arjen

Alexandre Ferrieux

Jul 26, 2011, 12:45:52 PM

Yes. As I said on that wiki page: feel free to optimize, measure, and
edit the page :)
(and of course report here, for those - like me - who aren't hooked to
wiki updates ...)

-Alex

Donal K. Fellows

Jul 26, 2011, 3:13:43 PM

On Jul 26, 5:45 pm, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:

> (and of course report here, for those - like me - who aren't hooked to
> wiki updates ...)

Hardly anyone is; we just use the wiki's rather nice history
functions. :-)

Donal.

AurovilleRadio

Jul 27, 2011, 1:35:08 AM

On Jul 26, 4:06 pm, Alexandre Ferrieux <alexandre.ferri...@gmail.com> wrote:
> OK, I've done this for you:
>
> http://wiki.tcl.tk/28599
>
> -Alex

Amazing, thanks a lot for the beautiful brand-new wiki page. Playing
with it, one can see that the real challenge in a spell checker is
building the probabilities correctly.

AurovilleRadio

Jul 27, 2011, 1:38:14 AM

On Jul 26, 2:47 pm, Cyan <cyan.ogil...@gmail.com> wrote:

It would be nice for everyone to have this as an option. I need it for
a starpack, so it is only useful to me if it works in that
configuration.

Alexandre Ferrieux

Jul 27, 2011, 8:32:12 AM

Yes. Note that, in line with the huge amount of existing literature on
statistical language models, the obvious next step, if you're not
satisfied with this unigram (single-word probability) model, is to
go bigram. To do this, you'll need to:

- populate the "model" array with both single words (as the current
  code does) and consecutive pairs from the input corpus, joined by a
  non-alpha separator like ":". For example, "there:is" will likely get
  a hefty count.

- refine the search strategy in the following way:

    - allow full sentences to be entered instead of single words;
      collapse non-alpha characters to a space instead of "".
    - do the usual distance-2 search for the single word $w (not
      pruning to distance 1 even if there are solutions)
    - also do it for $prev:$w and $w:$next (where $prev and $next
      are the words immediately surrounding $w in the input sentence)
    - give preference to candidates with bigram matches (still
      with a subpreference for distance 1 over distance 2).
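
The steps above can be sketched roughly as follows. This is a simplified illustration under several assumptions: a dict is used instead of the "model" array, the proc names trainBigram and pickWithContext are invented, only the $prev:$w side is consulted (not $w:$next), the distance-1 versus distance-2 subpreference is left out, and pairs are counted across sentence boundaries.

```tcl
# Count single words and consecutive pairs ("prev:next") in a corpus.
proc trainBigram {text} {
    set model [dict create]
    # Collapse every run of non-alpha characters to a single space.
    set clean [regsub -all {[^a-z]+} [string tolower $text] " "]
    set prev ""
    foreach w [split [string trim $clean]] {
        dict incr model $w
        if {$prev ne ""} {
            dict incr model ${prev}:$w
        }
        set prev $w
    }
    return $model
}

# Among $candidates for the word following $prev, prefer any candidate
# with a bigram match; fall back to unigram counts otherwise.
proc pickWithContext {model prev candidates} {
    set best ""
    set bestCount 0
    set bestIsBigram 0
    foreach c $candidates {
        set key ${prev}:$c
        if {[dict exists $model $key]} {
            set n [dict get $model $key]
            set isBigram 1
        } elseif {[dict exists $model $c]} {
            set n [dict get $model $c]
            set isBigram 0
        } else {
            continue
        }
        if {$isBigram > $bestIsBigram ||
            ($isBigram == $bestIsBigram && $n > $bestCount)} {
            set best $c
            set bestCount $n
            set bestIsBigram $isBigram
        }
    }
    return $best
}

set model [trainBigram "There is hope. There is time. In in in in"]
# "in" is the more frequent single word, but "there:is" is a known pair.
puts [pickWithContext $model there {in is}]
```

The candidate list would normally come from the edit-distance search; here it is passed in by hand to keep the example short.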

A secondary refinement could be to use a soft decision for the last
rule: compute a composite probability based on both the bigram and
unigram values, so that a high-probability unigram can still win over
a rare bigram.
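
One possible form of such a composite score is a linear interpolation of the two estimates. The sketch below is an assumption, not something specified in this post: the proc name softScore, the weight lambda = 0.7, and the hand-made counts are all illustrative.

```tcl
# Composite score: lambda * P(cand | prev) + (1 - lambda) * P(cand).
# $model maps words and "prev:word" pairs to counts; $total is the
# total number of single words seen in the corpus (must be > 0).
proc softScore {model total prev cand {lambda 0.7}} {
    set uni 0
    set bi 0
    set prevCount 0
    if {[dict exists $model $cand]} {
        set uni [dict get $model $cand]
    }
    if {[dict exists $model ${prev}:$cand]} {
        set bi [dict get $model ${prev}:$cand]
    }
    if {[dict exists $model $prev]} {
        set prevCount [dict get $model $prev]
    }
    set pUni [expr {double($uni) / $total}]
    set pBi [expr {$prevCount > 0 ? double($bi) / $prevCount : 0.0}]
    return [expr {$lambda * $pBi + (1.0 - $lambda) * $pUni}]
}

# Hand-made counts: "there:is" was seen only once, "in" is very common.
set model {there 100 is 5 in 900 there:is 1}
set total 1005
puts "is: [softScore $model $total there is]"
puts "in: [softScore $model $total there in]"
```

With these counts the frequent unigram "in" outscores the rare bigram "there:is", which is the behaviour the soft decision is meant to allow; a hard bigram preference would have picked "is".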

Disclaimer: this is off the top of my head -- I'm no expert in text
language models, though I do have experience with similar methods for
speech recognition.

HTH,

-Alex

tomk

Jul 27, 2011, 1:27:33 PM

On Jul 27, 5:32 am, Alexandre Ferrieux <alexandre.ferri...@gmail.com>

Many years ago I integrated GNU ispell into an application. As I recall,
the code is in C but not all that complicated, and could probably be
ported without too much trouble. The nice thing about going that route
is that it has a built-in dictionary that is GNU-licensed. The dictionary
is in root-word-plus-suffix form, IIRC.

tomk

Gerald W. Lester

Jul 27, 2011, 9:26:20 PM

On 7/27/11 12:27 PM, tomk wrote:
> ...
> Many years ago I integrated GNU ispell into an application. As I recall,
> the code is in C but not all that complicated, and could probably be
> ported without too much trouble. The nice thing about going that route
> is that it has a built-in dictionary that is GNU-licensed. ...

And the bad thing about it is that it is GNU-licensed and not BSD-licensed.

--
+------------------------------------------------------------------------+
| Gerald W. Lester, President, KNG Consulting LLC |
| Email: Gerald...@kng-consulting.net |
+------------------------------------------------------------------------+