I have written a Python script to analyze Minecraft levels and render a graph. Then I did the same with Clojure. It takes Python 10 seconds to analyze a map, while it takes Clojure over a minute.
After trying different options without any significant improvement, I am lost as to why there is such a huge difference. I would appreciate an extra pair of eyes (and brains) on this.
I blogged about it in more detail here: http://pepijndevos.nl/clojure-versus-python
Clojure version: https://github.com/pepijndevos/Clomian/
Python version: https://github.com/l0b0/mian
Clojure spends most of its time in the freqs function; here are a couple of variations: https://gist.github.com/663096
If you want to run the code yourself, you'll need a Minecraft level and JNBT, which is not on Maven.
JNBT: http://jnbt.sourceforge.net/
The level used in the blogpost: http://dl.dropbox.com/u/10094764/World2.zip
Regards,
Pepijn de Vos
--
Sent from my iPod Shuffle
http://pepijndevos.nl
Can you check GC activity in the Clojure version?
I once ran into an issue where Python was running rings around an
Eiffel version (compiled down to native code: no VM need apply). This
looks similar to your case, in that I built a large data
structure and then started groveling over it. It turned out that Eiffel
was doing a mark-and-sweep GC, which was spending all of its time
marking and sweeping the large static data structure, whereas Python,
doing reference-counting GC, didn't. Given that I know nothing about
Java GCs, this is just a WAG.
Come to think of it, how about trying to run the program on Jython? That
should have the same GC issues. If it's some similar environmental
problem, it would show up there as well.
<mike
--
Mike Meyer <m...@mired.org> http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
There are many different collectors for the JVM, too numerous to list
here, and all of them are tunable.
http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
I don't know how to check the GC activity in my project, but I did run
Mian on Jython. It performs much like my initial Clojure version: it
consumes absurd amounts of memory and never finishes.
So I think we can safely say that Java's GC, or the way it stores data,
is less efficient on this type of problem than Python's.
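One way to check GC activity from inside the program, without touching the launch command, is the JDK's java.lang.management API. The sketch below is not from Clomian or Mian: it sums collection counts and times over all collectors before and after a piece of work. `gc-stats` and `with-gc-report` are hypothetical helper names, not part of Clojure.

```clojure
(import '(java.lang.management ManagementFactory GarbageCollectorMXBean))

(defn gc-stats
  "Sums collection count and accumulated collection time (ms) over all
  garbage collectors of the running JVM. A bean reports -1 when a value
  is unavailable, so negatives are clamped to 0."
  []
  (reduce (fn [acc ^GarbageCollectorMXBean gc]
            (-> acc
                (update-in [:count] + (max 0 (.getCollectionCount gc)))
                (update-in [:time-ms] + (max 0 (.getCollectionTime gc)))))
          {:count 0 :time-ms 0}
          (ManagementFactory/getGarbageCollectorMXBeans)))

(defmacro with-gc-report
  "Evaluates body, prints the wall-clock time and the GC activity that
  happened meanwhile, and returns the value of body."
  [& body]
  `(let [before#  (gc-stats)
         start#   (System/nanoTime)
         result#  (do ~@body)
         elapsed# (quot (- (System/nanoTime) start#) 1000000)
         after#   (gc-stats)]
     (println "elapsed ms:" elapsed#
              "| GC runs:" (- (:count after#) (:count before#))
              "| GC ms:" (- (:time-ms after#) (:time-ms before#)))
     result#))
```

For example, `(with-gc-report (doall (freqs blocks)))`, where `blocks` stands for one chunk's byte array, would show whether GC time is a large share of the elapsed time. The `doall` matters if `freqs` returns a lazy sequence, otherwise the real work would happen outside the measured region.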
(defn freqs [^bytes blocks]
  (loop [idx 0
         ret (cycle (repeatedly 128 #(transient {})))]
    (if (< idx (alength blocks))
      (do
        (update! (first ret) (aget blocks idx) (fnil inc 0))
        (recur (inc idx) (next ret)))
      (map persistent! (take 128 ret)))))
I'm not familiar with Incanter, which defines update!, but the update!
call makes me suspicious. Transients are not designed to be banged on
in place; that would explain why you are losing results.
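The "banging in place" problem can be shown in isolation. The sketch below is not from the thread: it contrasts calling assoc! only for its side effect with threading its return value through a reduce. `bashed` and `threaded` are hypothetical names. When a small transient map outgrows its internal representation, assoc! may return a different object than the one it was given, so the version that discards the return value can silently drop entries (how many depends on the Clojure version's internals).

```clojure
(defn bashed
  "Anti-pattern: calls assoc! only for its side effect and ignores the
  map it returns, so entries may be lost once the map changes shape."
  [n]
  (let [t (transient {})]
    (dotimes [i n]
      (assoc! t i i))            ; return value discarded
    (persistent! t)))

(defn threaded
  "Correct use: the map returned by each assoc! is the input of the next."
  [n]
  (persistent! (reduce (fn [t i] (assoc! t i i)) (transient {}) (range n))))
```

`(count (threaded 20))` is always 20, while `(count (bashed 20))` may come out smaller.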
// ben
- Greg
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
I'm very curious about this situation; please let us know if you manage to write a version that's faster than the Python one (as David claims is possible). I would attempt it myself, but I've only just recently had the time to dive back into Clojure. :-\
- Greg
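One possible direction for a faster version, not taken from the thread or from the Clomian code, is to drop maps altogether. Assuming, as in the freqs code in this thread, that a block's layer is its index modulo 128 and that block ids fit in one byte, all counts fit in a flat int array of 128 * 256 slots. This is an untested sketch with a hypothetical name, `layer-counts`; whether it actually beats the Python version has not been measured.

```clojure
(defn layer-counts
  "Counts block ids per height layer into a flat int array.
  Slot (+ (* layer 256) id) holds the count of unsigned block id `id`
  in `layer`, where layer = (rem idx 128) (assumed chunk layout)."
  ^ints [^bytes blocks]
  (let [counts (int-array (* 128 256))
        n      (alength blocks)]
    (loop [idx 0]
      (when (< idx n)
        (let [layer (rem idx 128)
              ;; JVM bytes are signed; mask to get an id in 0..255
              id    (bit-and (aget blocks idx) 0xff)
              slot  (+ (* layer 256) id)]
          (aset counts slot (int (inc (aget counts slot)))))
        (recur (inc idx))))
    counts))
```

The counts for one layer sit in slots `(* layer 256)` through `(+ (* layer 256) 255)`, so turning a layer back into a map is a simple scan over those 256 slots.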
Or just use -XX:+PrintGC and maybe -XX:+PrintGCDetails and
-XX:+PrintGCTimeStamps.
I haven't checked what the code is doing, but if you suspect extremely
poor performance due to GC, it may be because your application happens
to require an amount of memory that is below, but fairly close to, the
default maximum heap size. That can easily cause very frequent GCs
and show up as poor performance. If this is the case, doubling the
heap size should fix it (-Xmx...).
(The JVM does throw an OutOfMemoryError when it decides there is
cause to, but deciding when that is actually the right thing to do is a
difficult heuristic. So it is quite possible to end up in a situation
that is not bad enough, in terms of time spent doing GC, for the JVM to
throw, yet bad enough to cause very frequent full GCs at considerable
cost in CPU time.)
--
/ Peter Schuller
> I increased the heap space a lot, but I'm just bordering on the edge
> of my real memory, so it's not helping much.
Did you try pushing the minimum heap size up? I'm usually lazy and set the minimum and maximum (-Xms and -Xmx) to the same value. I've had serious trouble caused by the way the JVM grows the heap. Setting min to max (and max big) pretty much took care of that issue.
Cheers,
Bob
----
Bob Hutchison
Recursive Design Inc.
http://www.recursive.ca/
weblog: http://xampl.com/so
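To confirm that minimum/maximum heap settings like these actually reached the JVM (for instance when the program is started through a build tool rather than a plain java command), the running process can report its own heap. The function name below is hypothetical; it uses only java.lang.Runtime. When started with -Xms equal to -Xmx (say -Xms2g -Xmx2g, sizes purely illustrative), :total-mb should be roughly equal to :max-mb right after startup, whereas with a small -Xms it starts much lower and grows.

```clojure
(defn heap-settings
  "Reports the JVM heap in megabytes: currently committed (:total-mb),
  the upper limit it may grow to (:max-mb), and free space within the
  committed heap (:free-mb)."
  []
  (let [rt (Runtime/getRuntime)
        mb (fn [n] (quot n (* 1024 1024)))]
    {:total-mb (mb (.totalMemory rt))
     :max-mb   (mb (.maxMemory rt))
     :free-mb  (mb (.freeMemory rt))}))
```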
If that was for all of the 30 seconds then yeah, GC is not the issue.
--
/ Peter Schuller
People keep making claims like this in various situations but I don't
tend to hear details. Exactly what problems are you having that would
plausibly apply in this situation?
Not that there is no reason to set ms=mx (there are reasons), but the
need to do so tends to be overstated, in my opinion. But if I'm
missing something, I'd like to know about it :)
--
/ Peter Schuller
I understand your scepticism, and even applaud it, but in my case it comes from actually trying it and measuring the difference (again, in my case you didn't need anything fancy: the difference was huge and highly visible). It happened often enough on different projects that I just do it routinely now. Anyway, if I *ever* see something that might be a GC-like problem, I first eliminate heap growth from the picture (and that is all I was suggesting here). Perhaps it's a holdover from earlier versions of the JVM, but I don't personally care that much with my servers: I have a machine dedicated to the application, it's got a lot of memory, so use it. I might have a different attitude for a desktop app :-)
Cheers,
Bob
----
I would make the return value of the assoc! call one of the values
bound by the loop.
Otherwise, as far as I understand, you may lose values associated with
the fl transient map.
(Transient maps should not be used any differently from regular maps.
It's just their internals that are different.)
Albert
--
http://albert.rierol.net
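Applying this advice to the freqs function quoted earlier in the thread gives something like the sketch below. It is not the code from the gist: it keeps the original's assumption that a block's layer is its index modulo 128, holds the 128 transient maps in a transient vector, and stores back whatever assoc! returns instead of relying on in-place mutation. Block ids stay as the signed bytes read from the array.

```clojure
(defn freqs
  "Counts how often each block id occurs in each of the 128 layers.
  Returns a vector of 128 persistent maps {block-id count}."
  [^bytes blocks]
  (loop [idx    0
         layers (transient (vec (repeatedly 128 #(transient {}))))]
    (if (< idx (alength blocks))
      (let [layer (rem idx 128)
            m     (nth layers layer)
            b     (aget blocks idx)]
        ;; Keep the map returned by assoc! and store it back in the vector;
        ;; the old reference may be stale once the map changes shape.
        (recur (inc idx)
               (assoc! layers layer (assoc! m b (inc (get m b 0))))))
      (vec (map persistent! (persistent! layers))))))
```

Unlike the original, the result is an eager vector, so timing it does not require a `doall`.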
--
Cheers,
Leif