Spelling corrector example


Rich Hickey

Mar 1, 2008, 10:42:41 PM
to Clojure
I've added a new example to the Wiki, a Clojure version of Norvig's
spelling corrector:


If you click through to the original site, you'll find his Python version and
links to versions in many other languages, making for a nice

Along the way, Clojure got subs(tring), slurp (a file), max-key and
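As a rough illustration of where two of those additions fit, the sketch below uses `subs` to build the "delete one letter" candidates of a misspelled word and `max-key` to pick the most frequent candidate, in the spirit of Norvig's `max(candidates, key=NWORDS.get)`. The helpers `deletes` and `best-candidate` and the `nwords` counts are hypothetical and are not taken from the Wiki example.

```clojure
;; Hypothetical word counts standing in for a trained model.
(def nwords {"the" 80, "then" 12, "thee" 3})

(defn deletes
  "Hypothetical helper: every string formed by deleting one character
   from word, built with subs."
  [word]
  (for [i (range (count word))]
    (str (subs word 0 i) (subs word (inc i)))))

(defn best-candidate
  "Hypothetical helper: the candidate with the highest count, treating
   unseen words as 1 (the same default the train function uses)."
  [candidates]
  (apply max-key #(get nwords % 1) candidates))

(deletes "teh")                         ;=> ("eh" "th" "te")
(best-candidate ["thee" "the" "then"])  ;=> "the"
```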


Stephen C. Gilardi

Mar 3, 2008, 2:19:34 AM
to clo...@googlegroups.com
I tried running this on Clojure SVN715. I get a Java OutOfMemoryError during training. I tried it with the default Java 5 on Leopard and also with developer preview 9 of Java 6. The test machine is a quad-core Mac Pro with 6 GB of memory running Leopard 10.5.2. The Python version runs fine. Training on a file containing the first 10% of big.txt worked.

Any ideas?



user=> (defn words [text] (re-seq #"[a-z]+" (. text (toLowerCase))))
#<Var: user/words>
user=> (defn train [features]
  (reduce (fn [model f] (assoc model f (inc (get model f 1)))) 
          {} features))
#<Var: user/train>
user=> (def *nwords* (train (words (slurp "/tmp/big.txt"))))
java.lang.OutOfMemoryError: Java heap space
at clojure.lang.PersistentHashMap$FullNode.assoc(PersistentHashMap.java:244)
at clojure.lang.PersistentHashMap.assoc(PersistentHashMap.java:105)
at clojure.lang.PersistentHashMap.assoc(PersistentHashMap.java:28)
at clojure.lang.RT.assoc(RT.java:467)
at clojure.fn__45.invoke(boot.clj:113)
at user.fn__960$fn__961.invoke(Unknown Source)
at clojure.fn__120.invoke(boot.clj:399)
at user.fn__960.invoke(Unknown Source)
at clojure.lang.AFn.applyToHelper(AFn.java:173)
at clojure.lang.AFn.applyTo(AFn.java:164)
at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:2249)
at clojure.lang.Compiler$DefExpr.eval(Compiler.java:257)
at clojure.lang.Compiler.eval(Compiler.java:3208)
at clojure.lang.Repl.main(Repl.java:64)
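For comparison, here is a sketch of the same training step written with `frequencies`, which exists in later Clojure releases but not in the 2008 build used above. It produces the same map as `train` (each count is one more than the number of occurrences, matching `(inc (get model f 1))`). It is a hypothetical variant, and it does not by itself avoid the OutOfMemoryError: the finished map still has to fit in the heap, which is what the heap-size and `-server` advice in this thread addresses.

```clojure
(defn words
  "Lower-case the text and split it into runs of a-z, as in the REPL session."
  [text]
  (re-seq #"[a-z]+" (.toLowerCase ^String text)))

(defn train-freq
  "Hypothetical variant of train: count with frequencies, then add 1 to
   each count so the result matches (inc (get model f 1))."
  [features]
  (persistent!
   (reduce-kv (fn [m k v] (assoc! m k (inc v)))
              (transient {})
              (frequencies features))))

(train-freq (words "The the cat"))  ;=> {"the" 3, "cat" 2}
```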

Rich Hickey

Mar 3, 2008, 8:30:03 AM
to Clojure

On Mar 3, 2:19 am, "Stephen C. Gilardi" <scgila...@gmail.com> wrote:
> I tried running this on clojure SVN715. I get a Java out of memory
> error during training. I tried it with the default Java 5 on Leopard
> and also with the developer preview 9 of Java 6. The test machine is
> a quad core Mac Pro with 6 GB of memory running Leopard 10.5.2. The
> Python version runs fine. I tried to train on a file containing the
> first 10% of big.txt and that worked.
> Any ideas?

It's still the JVM. Did you try increasing the heap size, or using
-server?

I trained it on the entire big.txt on both a MacBookPro/Leopard (don't
recall the JVM args) and a MacPro/Tiger (-server, and no heap
specification).

Stephen C. Gilardi

Mar 3, 2008, 9:55:02 AM
to clo...@googlegroups.com
> It's still the JVM - did you try increasing the heap size, or using
> -server?
> I trained it on the entire big.txt on both a MacBookPro/Leopard (don't
> recall the JVM args) and MacPro/Tiger (-server, and no heap
> specification).

Thanks, both those fixes worked nicely.

$ cat /usr/local/bin/clojure

exec java -Xms32m -Xmx128m -jar /Local/Projects/clojure/clojure.jar
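To confirm from inside the REPL that heap flags such as `-Xmx128m` actually took effect, one option is to ask `java.lang.Runtime` directly. The `heap-mb` helper below is hypothetical; the JVM may report a `:max` slightly below the `-Xmx` value, so treat the numbers as approximate.

```clojure
(defn heap-mb
  "Hypothetical helper: current JVM heap figures, in whole megabytes."
  []
  (let [rt (Runtime/getRuntime)
        mb (fn [bytes] (quot bytes (* 1024 1024)))]
    {:max   (mb (.maxMemory rt))     ; upper bound set by -Xmx (or the default)
     :total (mb (.totalMemory rt))   ; heap currently reserved by the JVM
     :free  (mb (.freeMemory rt))})) ; unused part of the reserved heap

(heap-mb)
```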


It's a very cool demo. I especially like how fast "slurp" is even
though at the lisp level it's recurring on every character in the file.
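A character-at-a-time reader in the style described above might look like the following. This is a hypothetical `slurp-chars` sketch, not the actual 2008 `slurp` source: it loops with `recur` once per character, appending each one to a `StringBuilder` until `.read` returns -1 at end of file.

```clojure
(import '(java.io BufferedReader FileReader))

(defn slurp-chars
  "Hypothetical char-at-a-time slurp: reads the file at path one
   character per loop iteration and returns its contents as a String."
  [path]
  (with-open [r (BufferedReader. (FileReader. ^String path))]
    (let [sb (StringBuilder.)]
      (loop [c (.read r)]
        (if (neg? c)                  ; -1 signals end of file
          (str sb)
          (do (.append sb (char c))
              (recur (.read r))))))))
```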


Rich Hickey

Mar 3, 2008, 10:18:55 AM
to Clojure

On Mar 3, 9:55 am, "Stephen C. Gilardi" <scgila...@gmail.com> wrote:
> > It's still the JVM - did you try increasing the heap size, or using
> > -server?
> > I trained it on the entire big.txt on both a MacBookPro/Leopard (don't
> > recall the JVM args) and MacPro/Tiger (-server, and no heap
> > specification).
> Thanks, both those fixes worked nicely.

Great! Want to add a note to the Wiki?

> It's a very cool demo. I especially like how fast "slurp" is even
> though at the lisp level it's recurring on every character in the file.

Surprising, right? It's probably worth providing the fastest slurp
possible at some point, but definitely nice to see that day-to-day
performance of ordinary Clojure code is not too shabby.
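One hypothetical direction for a faster slurp is to read the file in blocks rather than one character per iteration. The `slurp-buffered` sketch below fills an 8 KB `char` array per `.read` call and appends each block to a `StringBuilder`. It is an illustration only and makes no claim about how Clojure's own `slurp` is implemented in any release.

```clojure
(defn slurp-buffered
  "Hypothetical block-reading slurp: reads up to 8192 chars per call
   and returns the whole file at path as a String."
  [path]
  (with-open [r (java.io.FileReader. ^String path)]
    (let [sb  (StringBuilder.)
          buf (char-array 8192)]
      (loop []
        (let [n (.read r buf)]        ; chars read, or -1 at end of file
          (if (neg? n)
            (str sb)
            (do (.append sb buf 0 (int n))
                (recur))))))))
```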
