Contrib proposal: Clojure ports of sorted collections

133 views
Skip to first unread message

Michał Marczyk

unread,
Jan 19, 2013, 4:03:43 AM1/19/13
to cloju...@googlegroups.com
Hi,

I've been toying with the ClojureScript ports of the currently
available persistent data structures to see what it would take to port
them back to Clojure. Clearly it makes the most sense to start with
the sorted collections, since nothing else in Clojure depends on their
presence, so they are the natural candidate for the first data
structure to be replaced with a pure-Clojure implementation in an
actual release some day.

Here's a version of Clojure's sorted map and sorted set, using code
mostly pulled from the ClojureScript ports, with various adjustments /
bug fixes (corresponding patches for ClojureScript forthcoming):

https://github.com/michalmarczyk/sorted.clj

The aforementioned adjustments include using clojure.lang.MapEntry
instances where map entries are needed (the nodes of the tree do not
implement IMapEntry and are never handed to clients who request map
entries) and making the order in which reduce-kv sees the key/value
pairs consistent with the ordering of the collection.

As for performance -- that definitely needs to be measured more
carefully, but for now see some basic Criterium benchmarks below
(using Criterium 0.3.1).

Is there any interest in this becoming a part of contrib? It would be
great to test and tune this code "under the CA", as it were, so that
it can be merged back into the core library whenever it's mature
enough.

Cheers,
Michał


;;; (require '[sorted.core :as s] '[criterium.core :as c])

user> (let [m1 (apply sorted-map (interleave (range 1024) (range 1024)))
m2 (apply s/sorted-map (interleave (range 1024) (range 1024)))]
(c/quick-bench (get m1 128))
(c/quick-bench (get m2 128)))
WARNING: Final GC required 33.505380650613795 % of runtime
Evaluation count : 2943666 in 6 samples of 490611 calls.
Execution time mean : 207.229949 ns
Execution time std-deviation : 4.776218 ns
Execution time lower quantile : 201.418831 ns ( 2.5%)
Execution time upper quantile : 212.158227 ns (97.5%)
WARNING: Final GC required 31.98787196145191 % of runtime
Evaluation count : 2423634 in 6 samples of 403939 calls.
Execution time mean : 248.822429 ns
Execution time std-deviation : 5.108509 ns
Execution time lower quantile : 243.991806 ns ( 2.5%)
Execution time upper quantile : 255.985967 ns (97.5%)
nil
user> (let [m1 (apply sorted-map (interleave (range 1024) (range 1024)))
m2 (apply s/sorted-map (interleave (range 1024) (range 1024)))]
(c/quick-bench (assoc m1 128 :foo))
(c/quick-bench (assoc m2 128 :foo)))
WARNING: Final GC required 32.16487016906748 % of runtime
Evaluation count : 998310 in 6 samples of 166385 calls.
Execution time mean : 599.396595 ns
Execution time std-deviation : 32.981185 ns
Execution time lower quantile : 575.236121 ns ( 2.5%)
Execution time upper quantile : 642.289762 ns (97.5%)
WARNING: Final GC required 31.884187557043393 % of runtime
Evaluation count : 897798 in 6 samples of 149633 calls.
Execution time mean : 687.851624 ns
Execution time std-deviation : 40.570811 ns
Execution time lower quantile : 664.063322 ns ( 2.5%)
Execution time upper quantile : 744.437622 ns (97.5%)
nil

Paul Stadig

unread,
Jan 19, 2013, 6:13:39 AM1/19/13
to cloju...@googlegroups.com
I have also been experimenting with implementing Clojure's
abstractions and data structures in Clojure.

http://github.com/pjstadig/clojure-clojure/

I started with Cons and PersistentList. I've also been working from
the Java code as opposed to the ClojureScript code. It sounds like
you've been much more sensible. :)

One issue I ran into is that there is no unsigned-bit-shift-right
(http://dev.clojure.org/jira/browse/CLJ-827), and unsigned bit
shifting is used in PersistentVector, PersistentHashMap, and in the
hasheq implementation for the integer types. I was able to hack
together a version of unsigned-bit-shift-right in Clojure code
(https://github.com/pjstadig/clojure-clojure/blob/master/src/name/stadig/clojure_clojure.clj#L64).
However, I didn't end up using it because at the moment I just fall
though to RT/hasheq for the integer types and I haven't yet
implemented PV or PHM.

The second issue that I ran into is that a couple of classes
(IteratorSeq and SeqIterator) implement a writeObject method that
throws an exception to indicate they cannot be serialized. The JVM
requires that this be a private method. I'm not really sure how to
work around this.

That's just an FYI. As I said it is probably more realistic to start
with the sorted collections, so I support your efforts. It might even
be useful to bring the sorted collections into a contrib lib and out
of core, since that could allow for a (slightly?) smaller basic
runtime for Clojure. Though that would be a breaking change for anyone
using the sorted collections, so maybe it's not worth it.


Paul
> --
> You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
> To post to this group, send email to cloju...@googlegroups.com.
> To unsubscribe from this group, send email to clojure-dev...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/clojure-dev?hl=en.
>

Michał Marczyk

unread,
Jan 19, 2013, 11:00:07 PM1/19/13
to cloju...@googlegroups.com
On 19 January 2013 12:13, Paul Stadig <pa...@stadig.name> wrote:
> I have also been experimenting with implementing Clojure's
> abstractions and data structures in Clojure.
>
> http://github.com/pjstadig/clojure-clojure/
>
> I started with Cons and PersistentList. I've also been working from
> the Java code as opposed to the ClojureScript code. It sounds like
> you've been much more sensible. :)

Well, I've been working from the Java code when implementing
ClojureScript's maps and transients. I've procrastinated a bit on
porting the final impl back to the JVM, but it was always a TODO item.
:-)

> One issue I ran into is that there is no unsigned-bit-shift-right
> (http://dev.clojure.org/jira/browse/CLJ-827), and unsigned bit
> shifting is used in PersistentVector, PersistentHashMap, and in the
> hasheq implementation for the integer types. I was able to hack
> together a version of unsigned-bit-shift-right in Clojure code
> (https://github.com/pjstadig/clojure-clojure/blob/master/src/name/stadig/clojure_clojure.clj#L64).
> However, I didn't end up using it because at the moment I just fall
> though to RT/hasheq for the integer types and I haven't yet
> implemented PV or PHM.

Right. I've included a bit-shift-right-zero-fill in the original PHM
patch for ClojureScript. That being said, at least in PV it doesn't
really matter, as we're only ever shifting nonnegative numbers. gvec
uses bit-shift-right (and so does flexvec).

> The second issue that I ran into is that a couple of classes
> (IteratorSeq and SeqIterator) implement a writeObject method that
> throws an exception to indicate they cannot be serialized. The JVM
> requires that this be a private method. I'm not really sure how to
> work around this.
>
> That's just an FYI. As I said it is probably more realistic to start
> with the sorted collections, so I support your efforts. It might even
> be useful to bring the sorted collections into a contrib lib and out
> of core, since that could allow for a (slightly?) smaller basic
> runtime for Clojure. Though that would be a breaking change for anyone
> using the sorted collections, so maybe it's not worth it.

Right, and thanks!

Cheers,
Michał

Stuart Halloway

unread,
Feb 2, 2013, 6:11:08 PM2/2/13
to cloju...@googlegroups.com
Hi Paul,

I upvoted http://dev.clojure.org/jira/browse/CLJ-827 based on this discussion.  Also linked your email from the ticket -- It is super helpful to have the context "I was trying to do X" along with feature proposals.

Thanks,
Stu
Reply all
Reply to author
Forward
0 new messages