Thanks for the comments...
> * It probably makes sense to turn this into a lib with an ns form at
> the top. My experience has been that you need to do that to be able to
> debug, plus the niceness of use/require is convenient.
Will do.
> * Have you tested char-seq and word-seq against really large inputs?
> The apparently recursive calls in there make me nervous: see [1].
Yes... I've loaded entire 500-page novels, with index sizes up to half
a million entries. Also, the char-seq function is a slightly modified
clone of the line-seq function in the Clojure standard lib. I am not
100% sure I understand it completely, but to the best of my knowledge,
it works because lazy-cons is a macro. The nested call to char/word-
seq is not called recursively, rather, the call to it actually creates
a closure that is stored and then lazily called when required by the
seq. So it doesn't add to the stack. I would expect that a count of 20
million would blow the heap, unless you've assigned Java a lot of heap
space.
Takes only 10-15 seconds to build the index, too. I'm quite impressed
with Clojure's speed in that regard. And index lookups are fast,
despite using seqs as keys.
I wasn't aware of the seq-utils library... It does seem as if the
contrib library needs better documentation. I can't find anything
about what's available except what people happen to mention on the
newsgroup, short of reading the source itself.
Thanks for the comments! I appreciate it.
-Luke
On Dec 15, 10:34 am, "Craig Andera" <
cand...@wangdera.com> wrote:
> Inexpert observations:
>
> * It probably makes sense to turn this into a lib with an ns form at
> the top. My experience has been that you need to do that to be able to
> debug, plus the niceness of use/require is convenient.
> * Have you tested char-seq and word-seq against really large inputs?
> The apparently recursive calls in there make me nervous: see [1].
> * Have you looked at clojure.contrib.seq-utils? It seems like
> partition-by might make your word-seq function obsolete.
> * There have been some comments on the clojure list recently around
> the difference between streams and seqs that might be relevant.
>
> [1]
> This function blows the heap when called via (count (down-from
> 20000000)). I would expect it to blow the *stack*, though: perhaps
> count retains the head somehow. Also, I don't think I understand
> exactly how lazy-cons works.
>
> (defn down-from [x]
> (if (> x 0)
> (lazy-cons x (down-from (dec x)))))
>
> On Sun, Dec 14, 2008 at 3:42 PM, Luke VanderHart
>
> <
luke.vanderh...@gmail.com> wrote:
>
> > I have written a Markov Chain generator in Clojure. I'm really pleased
> > with how it turned out, given that it's the first non-trivial program
> > I've written in it. (If you don't know what a Markov Chain is, you can
> > read about it athttp://
en.wikipedia.org/wiki/Markov_chain). It
> > supports chaining based on letters or on words - both yield
> > interesting results. Just run it on any sufficiently large corpus
> > text, and you'll get output that mimics the style. I've run it on
> > Plato, Mark Twain and Jane Austen, and it's really interesting to
> > watch the different styles represented through pure randomness/
> > statistics.
>
> > The first iteration took me 4-5 hours, and then I took about that much
> > time again and re-wrote it to be fully lazy and decreased the size of
> > the code base by 50%, and far more readable and elegant. It performs
> > better, too.
>
> > I love the language - as compared with Java, it is so much fun to
> > write. It takes more effort for me to reason about, but all the code I
> > write is absolutely part of the program logic - I hadn't realized how
> > much boilerplate Java had until I thought about the Java
> > implementation of it.
>
> > I've uploaded it to the newsgroup here:
> >
http://clojure-study-dc.googlegroups.com/web/markov.clj?gsc=qogOMBYAA...