Spelling corrector example

Rich Hickey

Mar 1, 2008, 10:42:41 PM
to Clojure
I've added a new example to the Wiki - A Clojure version of Norvig's
spelling corrector:

http://en.wikibooks.org/wiki/Clojure_Programming#Examples

If you link through to the original site, he has a Python version and
links to versions in many other languages, making for a nice
comparison.

Along the way, Clojure got subs(tring), slurp (a file), max-key and
min-key...
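
For readers who have not used these functions, here is a minimal sketch of what each one does. The sample map and the temp file are made up for illustration, and the code assumes a current Clojure release (`spit` is used only to create the sample file and was not part of the 2008 additions).

```clojure
;; Hypothetical sample data, not taken from big.txt.
(def word-counts {"the" 80030, "spelling" 4, "corrector" 1})

;; subs: substring from a start index (inclusive) to an optional end index (exclusive).
(def prefix (subs "spelling" 0 5))   ;=> "spell"
(def suffix (subs "spelling" 5))     ;=> "ing"

;; max-key / min-key: return the argument for which the key function
;; yields the largest / smallest number. A map works as the key function.
(def most-common  (apply max-key word-counts (keys word-counts)))  ;=> "the"
(def least-common (apply min-key word-counts (keys word-counts)))  ;=> "corrector"

;; slurp: read an entire file into one string.
(def slurped
  (let [f (java.io.File/createTempFile "sample" ".txt")]
    (spit f "hello")
    (slurp (.getPath f))))           ;=> "hello"
```

In the corrector itself, `max-key` is the natural fit for picking the candidate correction with the highest count in the trained model.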

Rich

Stephen C. Gilardi

Mar 3, 2008, 2:19:34 AM
to clo...@googlegroups.com
I tried running this on clojure SVN715.  I get a Java out of memory error during training.  I tried it with the default Java 5 on Leopard and also with the developer preview 9 of Java 6.  The test machine is a quad core Mac Pro with 6 GB of memory running Leopard 10.5.2.  The Python version runs fine.  I tried to train on a file containing the first 10% of big.txt and that worked.

Any ideas?

--Steve

--------------------------------

Clojure
user=> (defn words [text] (re-seq #"[a-z]+" (. text (toLowerCase))))
#<Var: user/words>
user=> (defn train [features]
  (reduce (fn [model f] (assoc model f (inc (get model f 1)))) 
          {} features))
#<Var: user/train>
user=> (def *nwords* (train (words (slurp "/tmp/big.txt"))))
java.lang.OutOfMemoryError: Java heap space
at clojure.lang.PersistentHashMap$FullNode.assoc(PersistentHashMap.java:244)
at clojure.lang.PersistentHashMap.assoc(PersistentHashMap.java:105)
at clojure.lang.PersistentHashMap.assoc(PersistentHashMap.java:28)
at clojure.lang.RT.assoc(RT.java:467)
at clojure.fn__45.invoke(boot.clj:113)
at user.fn__960$fn__961.invoke(Unknown Source)
at clojure.fn__120.invoke(boot.clj:399)
at user.fn__960.invoke(Unknown Source)
at clojure.lang.AFn.applyToHelper(AFn.java:173)
at clojure.lang.AFn.applyTo(AFn.java:164)
at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:2249)
at clojure.lang.Compiler$DefExpr.eval(Compiler.java:257)
at clojure.lang.Compiler.eval(Compiler.java:3208)
at clojure.lang.Repl.main(Repl.java:64)
user=> 
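
To see what `train` builds before pointing it at the full file, it can help to run it on a tiny input. This is only a sketch: the sample sentence is made up, and the two functions are repeated from the transcript above so the block runs on its own.

```clojure
;; Same definitions as in the transcript above.
(defn words [text] (re-seq #"[a-z]+" (. text (toLowerCase))))

(defn train [features]
  (reduce (fn [model f] (assoc model f (inc (get model f 1))))
          {} features))

;; Hypothetical sample input, standing in for (slurp "/tmp/big.txt").
(def sample-model (train (words "The quick the")))

;; Unseen words default to 1, so a word seen once maps to 2, seen twice to 3.
sample-model  ;=> {"the" 3, "quick" 2}
```

Every `assoc` returns a new persistent map, and the whole word sequence is held in memory alongside it, so training on the complete file needs a fair amount of heap.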

Rich Hickey

Mar 3, 2008, 8:30:03 AM
to Clojure


On Mar 3, 2:19 am, "Stephen C. Gilardi" <scgila...@gmail.com> wrote:
> I tried running this on clojure SVN715. I get a Java out of memory
> error during training. I tried it with the default Java 5 on Leopard
> and also with the developer preview 9 of Java 6. The test machine is
> a quad core Mac Pro with 6 GB of memory running Leopard 10.5.2. The
> Python version runs fine. I tried to train on a file containing the
> first 10% of big.txt and that worked.
>
> Any ideas?
>

It's still the JVM - did you try increasing the heap size, or using
-server?

I trained it on the entire big.txt on both a MacBookPro/Leopard (don't
recall the JVM args) and MacPro/Tiger (-server, and no heap
specification).
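
One way to confirm what heap limit the running JVM actually has is to ask it from the REPL. This is a small sketch, not something from the thread: `max-heap-mb` is a hypothetical helper name, and the syntax assumes a current Clojure release. `Runtime.maxMemory` is a standard Java method.

```clojure
;; Hypothetical helper: the maximum heap the JVM will try to use, in megabytes.
(defn max-heap-mb []
  (quot (.maxMemory (Runtime/getRuntime)) (* 1024 1024)))

;; The value depends on the JVM, its mode (-client or -server) and any -Xmx setting.
(max-heap-mb)
```

Comparing the result before and after changing the launch flags shows whether a setting such as -Xmx took effect.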

Rich

Stephen C. Gilardi

Mar 3, 2008, 9:55:02 AM
to clo...@googlegroups.com
> It's still the JVM - did you try increasing the heap size, or using
> -server?
>
> I trained it on the entire big.txt on both a MacBookPro/Leopard (don't
> recall the JVM args) and MacPro/Tiger (-server, and no heap
> specification).

Thanks, both those fixes worked nicely.

$ cat /usr/local/bin/clojure
#!/bin/bash

exec java -Xms32m -Xmx128m -jar /Local/Projects/clojure/clojure.jar

$

It's a very cool demo. I especially like how fast "slurp" is even
though at the lisp level it's recurring on every character in the file.
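
For illustration, a character-at-a-time reader of the kind described here might look roughly like the sketch below. It is an assumption about the general shape, not a copy of the `slurp` source in SVN 715; `slurp-by-char` is a hypothetical name, and the syntax is that of a current Clojure release.

```clojure
(defn slurp-by-char
  "Hypothetical helper: reads the file at path f into a string,
  one character per loop iteration."
  [^String f]
  (with-open [r (java.io.BufferedReader. (java.io.FileReader. f))]
    (let [sb (StringBuilder.)]
      (loop [c (.read r)]
        (if (neg? c)
          (str sb)                       ; read returns -1 at end of stream
          (do (.append sb (char c))      ; accumulate in a mutable StringBuilder
              (recur (.read r))))))))    ; recur is a jump, so the stack does not grow
```

The loop touches every character at the Lisp level, but the buffered reader keeps most reads in memory and `recur` compiles to a plain loop, which goes some way toward explaining why it still feels fast.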

--Steve

Rich Hickey

Mar 3, 2008, 10:18:55 AM
to Clojure


On Mar 3, 9:55 am, "Stephen C. Gilardi" <scgila...@gmail.com> wrote:
> > It's still the JVM - did you try increasing the heap size, or using
> > -server?
>
> > I trained it on the entire big.txt on both a MacBookPro/Leopard (don't
> > recall the JVM args) and MacPro/Tiger (-server, and no heap
> > specification).
>
> Thanks, both those fixes worked nicely.
>

Great! Want to add a note to the Wiki?

> It's a very cool demo. I especially like how fast "slurp" is even
> though at the lisp level it's recurring on every character in the file.
>

Surprising, right? It's probably worth providing the fastest slurp
possible at some point, but definitely nice to see that day-to-day
performance of ordinary Clojure code is not too shabby.

Rich