On the performance of PersistentHashMap

50 views
Skip to first unread message

Antony Lee

unread,
Nov 15, 2012, 4:06:04 AM11/15/12
to clojure-py-dev
Right now "python clojure.py -c 'nil'" (i.e., mostly compiling core.clj) takes ~3.5s to run on my laptop; but every once in a while it seems to take much longer (Am I the only one to have this problem or is that the case for most of you?).  I've done some statistics: basically, in ~20% of the cases, it takes close to 8s to run.  The culprit has easily been found: it's PersistentHashMap (memory consumption explodes and ctrl-c profiling always ends up in some method of that class).  An easy guess is that this is due to hash collision (which occurs at random, as hashes are themselves not conserved between runs).

Given that most of the hashmaps and vectors in use are pretty small anyways, I tried replacing PersistentHashMap and PersistentVector by two "naïve" class implementations based on dict and tuple, that simply copy themselves whenever needed.  As can be expected, swapping in the new implementation of PersistentHashMap as a simple map entirely suppressed the hash collision problem (now startup always takes 3.5s), while the new implementation of PersistentVector has no effect on the startup time by itself (but was needed to implement PersistentHashMap -- seq'ing being close to a simple call of tuple(some_dict)).

The code is available at https://github.com/anntzer/clojure-py/tree/simple-data-structures (I can create a pull request too) and somewhat ugly right now (because of interface mismatch).

I guess the right way to do this is probably to have a promote-on-size-cutoff policy for these, i.e. switch to the full implementation of PersistentHashMap when the map size grows beyond some fixed limit.  (Or to fix the implementation of PersistentHashMap for those motivated)  But probably not an urgent goal right now...

Antony
Reply all
Reply to author
Forward
0 new messages