First, thank you Rich & Co for doing Clojure.
We've been using Clojure for a real-world project, and I thought I'd
share some observations.
The project does some large-scale (but not really huge) data
analysis. Example: ~10M records are processed and transformed,
various computations occur, and ~100K records are spit out. Lots of
statistics. One type of run takes 12 hours on an 8GB, 4-core Linux
box. A few thousand lines of our Clojure code in total.
My background: I'm an old (?) Lisp hacker. I've used Symbolics, ACL,
CMUCL, and several Schemes (including JScheme and SISC). Also Java since its
early days. And my share of C, C++, SQL, Perl, etc.
So here are some observations. None are news flashes.
1. Clojure works. We were obviously cautious about using a relatively
new system for real work. We tried Clojure, and then we tried some
more. Everything* just worked. We took a snapshot at 1.1.0, and we
didn't update it. [* The JVM does SEGV on us occasionally. Internal
error, which we have not diagnosed. Probably a GC issue. Java
1.6.0_14 with HotSpot 14.0-b16 64-bit server. But this problem is
another topic.]
2. Clojure-the-language is a good fit for real work. The language has
a distinct DWIM feel, which I like. Destructuring everywhere,
auto-gensym-ing, various syntactic sugar, and the advanced features
(e.g., concurrency primitives with STM, trie-based persistent data
structures) are all convenient, effective, and useful. Clojure is
practical.
3. We dropped into Java once to implement an LRU cache to back custom
memoization. We presumably could have implemented the cache in
Clojure, but Java seemed simpler. Probably mostly due to our
inexperience with Clojure. Clojure's Java interop is of course
excellent -- as claimed and widely reported.
4. Reminder: concurrent work on distinct data structures uses memory
concurrently! Obvious and obviously not specific to Clojure. But
Clojure makes it so easy to do things in parallel that it's easy to
forget the implications. We found ourselves having to do some
judicious doall's and such, forcing one stage to finish before the
next started, to avoid running out of memory in a subsequent -- so to
speak -- stage. (Imagine a parallel aggregation of a lot of data that
results in a small object, which is then used with different big
data. There might not be enough memory to work on the former and the
latter at the same time.) So watch out when you
work on distinct data in stages in parallel on that 100-core Tilera
board. Aside: We might like the option of selective non-laziness, but
we're not sure. That's yet another topic.
5. The functionality of the docs hasn't kept up with Clojure. We
often resorted to text searches of the various sources. Need links
and see-also's. Clojure has grown/matured so much that it needs a doc
system of some sort.
6. Debugging facilities also have not kept up with the state of
Clojure. We use Slime and JDB and some contributed tools, but the
result isn't that convenient. I miss the good Lisp debuggers, but I'm
old-fashioned I guess. We need to re-survey what's available for
tracing, logging, restarts, etc. A state-of-the-art tutorial would be
great.
7. We use Incanter (http://incanter.org/), which worked well for some
of the statistics we need. BTW, for different work, we still use
Mathematica, but we haven't yet needed the slick-looking Clojuratica
(http://clojuratica.weebly.com/). We probably will, and I'm eager to
use it. For us, Clojure is becoming the application-level, all-
purpose glue. We can't throw away Mathematica or Matlab or R, and we
don't need to.
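
Re observation 3, for anyone wondering what a small Java LRU cache can
look like: below is a minimal sketch built on java.util.LinkedHashMap
in access order. It is an illustration only, not the cache from the
project. The class name LruCache and the capacity parameter are
assumptions made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical LRU cache sketch (not the project's actual code).
 * Relies on LinkedHashMap's access-order mode and its
 * removeEldestEntry hook to evict the least recently used entry.
 */
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity; // assumed: maximum number of entries to keep

    LruCache(int capacity) {
        // Third argument true = access order: get() and put() move
        // the touched entry to the most-recently-used end.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by put() after insertion; returning true evicts
        // the least recently used entry.
        return size() > capacity;
    }
}
```

Note that LinkedHashMap is not thread-safe, and in access-order mode
even get() modifies the map. If the cache is shared by parallel
workers, it would need external synchronization, for example by
wrapping it with Collections.synchronizedMap.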
That's it for now. We'll look for ways we can contribute.
Thanks again for the excellent system.
--Jamie