[ANN] data.avl 0.0.14 – BUGFIX to splits, faster map/set reductions

87 views
Skip to first unread message

Michał Marczyk

unread,
Aug 22, 2016, 8:43:51 PM8/22/16
to clojure
Hi,

I am pleased to announce the 0.0.14 release of data.avl, a Clojure
Contrib library providing highly performant drop-in replacements for
Clojure(Script)'s built-in sorted maps and sets that support O(log n)
nth, rank-of, first-class submaps/subsets (like subseq, but preserving
collection type; fully independent from the original for GC purposes)
and splits by key and index.

  [org.clojure/data.avl "0.0.14"]

  <dependency>
    <groupId>org.clojure</groupId>
    <artifactId>data.avl</artifactId>
    <version>0.0.14</version>
  </dependency>

  org.clojure:data.avl:0.0.14

Changes in this release:


    Fixed a bug in split-key that caused incorrect rank and size
    information to be stored in some collections resulting from splits
    using a key absent from the input collection. See the ticket for a
    more detailed post mortem and links to fixing commits.

    Many thanks to Darrick Wiebe for the report and test.check
    properties that exposed broken split-key return values!


    data.avl maps and sets now implement CollReduce (in Clojure) /
    IReduce (in ClojureScript), leading to significantly improved
    reduce performance on data.avl inputs. See end of this email for
    some Criterium benchmarks.

    Many thanks to Francis Avila for providing the patch!

 3. Seqs over data.avl collections can now be = to j.u.Lists.

 4. data.avl collections now have their own print-dup implementations
    that correctly round-trip in Clojure.

 5. Public Vars in the clojure.data.avl namespace now carry :added
    metadata.

Cheers,
Michał


Reduce benchmarks:
==================

Before patch:
-------------

user> (let [m1 (apply sorted-map (range 10000))
            m2 (apply avl/sorted-map (range 10000))]
        (assert (= (reduce (fn [out kv] (+ out (key kv) (val kv)))
                     0
                     m1)
                   (reduce (fn [out kv] (+ out (key kv) (val kv)))
                     0
                     m2)))
        (c/quick-bench
          (reduce (fn [out kv] (+ out (key kv) (val kv)))
            0
            m1))
        (c/quick-bench
          (reduce (fn [out kv] (+ out (key kv) (val kv)))
            0
            m2)))
Evaluation count : 120 in 6 samples of 20 calls.
             Execution time mean : 663.862017 µs
    Execution time std-deviation : 9.483652 µs
   Execution time lower quantile : 654.277800 µs ( 2.5%)
   Execution time upper quantile : 677.885431 µs (97.5%)
                   Overhead used : 21.423458 ns
Evaluation count : 468 in 6 samples of 78 calls.
             Execution time mean : 1.295972 ms
    Execution time std-deviation : 74.233890 µs
   Execution time lower quantile : 1.255853 ms ( 2.5%)
   Execution time upper quantile : 1.420871 ms (97.5%)
                   Overhead used : 21.423458 ns

Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
 Variance from outliers : 14.4619 % Variance is moderately inflated by outliers
nil


After patch:
------------

;;; Clojure 1.9-alpha10

user> (let [m1 (apply sorted-map (range 10000))
            m2 (apply avl/sorted-map (range 10000))]
        (assert (= (reduce (fn [out kv] (+ out (key kv) (val kv)))
                     0
                     m1)
                   (reduce (fn [out kv] (+ out (key kv) (val kv)))
                     0
                     m2)))
        (c/quick-bench
          (reduce (fn [out kv] (+ out (key kv) (val kv)))
            0
            m1))
        (c/quick-bench
          (reduce (fn [out kv] (+ out (key kv) (val kv)))
            0
            m2)))
Evaluation count : 882 in 6 samples of 147 calls.
             Execution time mean : 687.923681 µs
    Execution time std-deviation : 17.527428 µs
   Execution time lower quantile : 669.270395 µs ( 2.5%)
   Execution time upper quantile : 710.828484 µs (97.5%)
                   Overhead used : 2.633678 ns
Evaluation count : 2940 in 6 samples of 490 calls.
             Execution time mean : 207.386184 µs
    Execution time std-deviation : 7.420049 µs
   Execution time lower quantile : 202.829682 µs ( 2.5%)
   Execution time upper quantile : 219.880774 µs (97.5%)
                   Overhead used : 2.633678 ns

Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
nil

;;; Clojure 1.5.1

user> (let [m1 (apply sorted-map (range 10000))
            m2 (apply avl/sorted-map (range 10000))]
        (assert (= (reduce (fn [out kv] (+ out (key kv) (val kv)))
                     0
                     m1)
                   (reduce (fn [out kv] (+ out (key kv) (val kv)))
                     0
                     m2)))
        (c/quick-bench
          (reduce (fn [out kv] (+ out (key kv) (val kv)))
            0
            m1))
        (c/quick-bench
          (reduce (fn [out kv] (+ out (key kv) (val kv)))
            0
            m2)))
Evaluation count : 990 in 6 samples of 165 calls.
             Execution time mean : 599.205620 µs
    Execution time std-deviation : 30.389791 µs
   Execution time lower quantile : 579.132582 µs ( 2.5%)
   Execution time upper quantile : 650.985417 µs (97.5%)
                   Overhead used : 1.320107 ns

Alex Miller

unread,
Aug 22, 2016, 8:59:18 PM8/22/16
to Clojure
Nice work! Esp on the reduce stuff - great to see that. Any reason you didn't go the IReduce path in Clojure too instead of CollReduce? Generally, I'd say that's preferred when you control the data structure.

Michał Marczyk

unread,
Aug 23, 2016, 3:49:01 AM8/23/16
to clojure
All credit to Francis for the reduce patch! The perf win is really great.

Re: IReduce – thanks, indeed, I forgot about the special-casing in reduce. I guess I'll tweak that for the next release.

Incidentally, I notice I left out the final result in the after-patch 1.5.1 benchmark run in the previous email – repeated below in full.

Cheers,
Michał


After patch, Clojure 1.5.1:

user> (let [m1 (apply sorted-map (range 10000))
            m2 (apply avl/sorted-map (range 10000))]
        (assert (= (reduce (fn [out kv] (+ out (key kv) (val kv)))
                     0
                     m1)
                   (reduce (fn [out kv] (+ out (key kv) (val kv)))
                     0
                     m2)))
        (c/quick-bench
          (reduce (fn [out kv] (+ out (key kv) (val kv)))
            0
            m1))
        (c/quick-bench
          (reduce (fn [out kv] (+ out (key kv) (val kv)))
            0
            m2)))
Evaluation count : 990 in 6 samples of 165 calls.
             Execution time mean : 599.205620 µs
    Execution time std-deviation : 30.389791 µs
   Execution time lower quantile : 579.132582 µs ( 2.5%)
   Execution time upper quantile : 650.985417 µs (97.5%)
                   Overhead used : 1.320107 ns

Found 1 outliers in 6 samples (16.6667 %)
	low-severe	 1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
Evaluation count : 3198 in 6 samples of 533 calls.
             Execution time mean : 191.906227 µs
    Execution time std-deviation : 3.447964 µs
   Execution time lower quantile : 188.550574 µs ( 2.5%)
   Execution time upper quantile : 197.006475 µs (97.5%)
                   Overhead used : 1.320107 ns
nil

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+unsubscribe@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Francis Avila

unread,
Aug 23, 2016, 9:38:27 AM8/23/16
to Clojure
I wasn't aware Clojure had an IReduce protocol! I thought CollReduce was just what IReduce is called in Clojure.

Alex Miller

unread,
Aug 23, 2016, 9:55:34 AM8/23/16
to Clojure
IReduce is a Java interface that can be used to allow a data structure to directly indicate participation in being self-reducible. In cases where you control the type (ie you are making the data structure), implementing IReduce (or really IReduceInit) is usually better as it is checked first.

CollReduce is a protocol that provides a broader (open) extension mechanism to extend self-reducibility to other types.
Reply all
Reply to author
Forward
0 new messages