Performance coding in Clojure (and other functional languages)

20 views
Skip to first unread message

levand

unread,
Aug 14, 2008, 5:50:49 PM8/14/08
to Clojure
I am about to start a new pet software project, and I am trying to
make the decision on whether to use Clojure or plain old Java.

First, a bit of background, and why I am making this choice. I am a
fairly experienced professional Java (and formerly C++) programmer,
but have recently become dissatisfied with a lot of things about Java
(thanks a lot, Steve Yegge), and have been doing some research about
other programming paradigms. I've read the Little Schemer & Seasoned
Schemer, learned to hack together a few algorithms in PLT Scheme. It
is beautiful. However, at this stage, it is all very academic to me,
and I am anxious to try out functional programming on a real,
practical project intended for everyday users.

At the same time, I have taken an interest in some other new (to me)
topics in Computer Science, and have developed an interest in, for
example, concurrency models beyond the simple multi-threaded locking
approach usually used in Java.

So, Clojure seems perfect for me. But I do have some reservations,
particularly about performance - not so much the fact that Clojure can
run efficiently, but whether I will be able to make it do so given my
lack of experience in functional programming. My project does some
fairly demanding things, in regards to both memory and execution
speed, and while I am well equipped to optimize both in Java I really
am not sure how I would do it in Clojure (without resorting to writing
very, very java-like code).

One particular example is memory management. My program will have at
its core a large (100k nodes or so) tree structure, which will be
constantly undergoing both major and minor changes. In Java, this
would be straightforward - make each node have as small a possible
memory footprint, choose the data structures based on the most common
operations to that type of node, lock only the levels of the tree
likely to be impacted by a particular change, etc.

But I have no real clue how to go about this efficiently in Clojure
(beyond writing java-clone code) For example - immutable data
structures. They sound great - purely functional programming is, I can
easily see, conceptually beautiful and trivially parellelizable. I
want this. But I have no idea about the memory usage it would entail -
none of the books I have read covered this. If I start with a 100k
node tree that is, say, 10 mb in memory, and I perform 10k insert or
delete operations on it, what sort of memory usage am I looking at
now? Will the "old" versions of the tree get garbage collected
happily, or not, and what can I do or not do to ensure this?

That is just one question (and one I would be very grateful to have
answered), but it kind of shows the quandary I'm in in general. I want
to branch out from the comfortable world of plain old Java, but I also
want my project to be a success, and while I'm pretty darn good with
plain old Procedural/Object Oriented coding I am afraid I will make a
lot of rookie mistakes if I use Clojure. Admittedly, that's just part
of learning something new, but still...

And I haven't been able to find any good resources discussing this
kind of thing. All the functional programming resources I've found,
while excellent, tend to focus on the theoretical beauty of recursion
and not how to analyze whether an approach is going to be fast enough.
The Y Combinator is a thing of great beauty, I'll admit... but what is
the overhead of creating all those functions? And not just on a
theoretical level, but practically, as implemented in individual
languages? What features of the language have clever optimizations and
which ones don't? These answers are hard to find.

If anyone has any advice for someone in my position, I would greatly
appreciate it.

Many thanks,
-Luke V.

Chouser

unread,
Aug 14, 2008, 10:38:36 PM8/14/08
to clo...@googlegroups.com
Nobody's going to be able to answer this for you, of course,
but if you're going to branch out from C++ and Java at all,
Clojure's a pretty safe choice. If absolutely necessary, at
any point, for any object or function, you can drop down
into Java without compromising the rest of your existing or
future Clojure code.

So I'd say: go for it! Let me see if I can answer a couple
of your specific questions.

On Thu, Aug 14, 2008 at 5:50 PM, levand <luke.va...@gmail.com> wrote:
>
> The Y Combinator is a thing of great beauty, I'll admit...
> but what is the overhead of creating all those functions?
> And not just on a theoretical level, but practically, as
> implemented in individual languages?

In Clojure, the runtime creation of a closure (not the
compile-time definition of a function) is just the creation
of a single Java object.

> What features of the language have clever optimizations
> and which ones don't? These answers are hard to find.

Clojure is pretty simple -- as far as I know there are very
few special optimizations. It relies on the JVM for that
kind of cleverness.

I guess a couple things that might fall in the category of
"clever optimizations" are the shared structure of Clojure's
persistent collections and the special primitive array
support.

But for these as well as other specific topics, if you don't
find answers in the documentation, just ask! If the lag and
mental overhead of email is too much, try the IRC channel --
it's a great place to find help thinking through how to
approach a problem.

--Chouser

Parth Malwankar

unread,
Aug 15, 2008, 5:06:29 AM8/15/08
to Clojure
Hello Luke,

One thing that might would out well for you is that Clojure
supports lazy evaluation of sequences.

Consider a scenario where we want to do a "big-computation" (1 sec
each)
on records in a list with a billion items. Typically we don't
may not need all the billion items processed (we may need only
a filtered subset for e.g.).

I have a little function (free-mem) defined below.
(defn free-mem [] (.. Runtime (getRuntime) (freeMemory)))

Here is a simple test at REPL:

user=> (free-mem)
2789176
user=> (defn big-computation [x] (. Thread sleep 1000) (* 10 x))
#'user/big-computation
user=> (time (big-computation 1))
"Elapsed time: 1001.760391 msecs"
10

Ok. So we have big-computation that runs for 1 sec.

user=> (time (def nums (range 1000000000)))
"Elapsed time: 0.166994 msecs"
#'user/nums

Note that it takes me only 0.17 ms to create a list of 1billion
numbers. This is because the list is not really created. I just
have the promise of a list.

Now we take item x from 10000 to 10005 and apply big-computation
to it.

user=> (time (def v (apply vector
(map big-computation
(take 5
(filter (fn [x]
(and (> x 10000) (< x
10010)))
nums))))))
"Elapsed time: 5089.247035 msecs"
#'user/v

Here we create an anonymous function fn that gives us a list
of x from 10000 to 10010. Note that these are 10 items, not 5.
And yet, our time was just 5 seconds (and not 10 seconds for
10 items). This is because we just took 5 items "(take 5 ...)"
from this filtered list of 10 so the remaining computation for
5 items was never done.

Now if we access v it takes negelegible time.
user=> (time (seq v))
"Elapsed time: 0.047423 msecs"
(100010 100020 100030 100040 100050)

Note that the amount of free memory is more or less the same.

user=> (free-mem)
2732024
user=>

Another point to note that a lazy sequence does not mean that
the computation is done every time, so once the computation is
done, it gets cached.

Try the following:
user=> (time (def comps (map big-computation nums)))
"Elapsed time: 0.113774 msecs"
#'user/comps
user=> (defn t5 [] (take 5 comps))
#'user/t5
user=> (t5) ; this will take 5 seconds
(0 10 20 30 40)
user=> (t5) ; this will take 0 seconds as the seq
(0 10 20 30 40) ; will not be evaluated again.

Basically, IMHO lazy data structures can offer significant
advantage assuming that your program is designed to leverage
that. This is probably a paradigm shift from just "doing
the computation" in languages like C and Java vs
"giving a promise of a computation".

Have a look at:
http://en.wikibooks.org/wiki/Clojure_Programming#Lazy_Evaluation_and_Infinite_Data_Source
also see take and variant along with filter, map and for on
Clojure page.
Sequences: http://clojure.org/sequences
API : http://clojure.org/api

My recommendation would be to give it a shot. It might be
worth it. If you feel otherwise you always have the option
of dropping into Java as Clojure has good Java interop.

Hope this helps.
Parth
Reply all
Reply to author
Forward
0 new messages