[ANN+RFC] data.avl 0.0.12-alpha1 -- sliceable & splittable sorted collections

53 views
Skip to first unread message

Michał Marczyk

unread,
Mar 22, 2014, 7:39:08 PM3/22/14
to clojure
Hi,

I am pleased to announce the 0.0.12-alpha1 release of data.avl, a
Clojure Contrib library providing drop-in replacements for
Clojure(Script)'s built-in sorted maps and sets with additional
functionality including support for real (non-view) logarithmic-time
slicing, splits, nearest neighbour lookups, rank queries and
transients. Performance in comparison to the built-ins is in
expectation better for lookups and worse for non-transient "updates";
memory use is higher by a reference and two ints per node.

This release includes an important fix to the rebalancing algorithm
and a much more extensive test suite. I would strongly advise all
users of 0.0.11 to upgrade as soon as possible. The "alpha"
designation is purely to do with the new additions to the API; see
below for details.

[org.clojure/data.avl "0.0.12-alpha1"]

<dependency>
<groupId>org.clojure</groupId>
<artifactId>data.avl</artifactId>
<version>0.0.12-alpha1</version>
</dependency>

org.clojure:data.avl:0.0.12-alpha1

As mentioned above, this is an alpha release because I would like to
solicit comments on several new additions to the API:

(require '[clojure.data.avl :as avl])

(avl/nearest (avl/sorted-set 0 1 2) < 2)
;= 1

(avl/subrange (avl/sorted-set 0 1 2 3 4 5) >= 1 < 4)
;= #{1 2 3}

(avl/split-at 2 (avl/sorted-set 0 1 2 3 4 5))
;= [#{0 1} #{2 3 4 5}]

(avl/split-key 3 (avl/sorted-set 0 2 4 6))
;= [#{0 2} nil #{4 6}]

All of these functions operate in time logarithmic in the size of
their inputs. avl/subrange returns a data.avl collection of the same
type as its input and completely independent from it -- that is, it
shares structure with the input where possible, but does not hold on
to any parts of the tree containing keys that should not be present in
the output. avl/split-at and avl/split-key return similarly
independent data.avl collections; the middle element of the vector
returned by split-key is the element at the split key if present in
the input, nil otherwise.

"split" is the name usually applied in the literature to the operation
that splits a tree into a "left" tree, a middle element and a "right"
tree. "subrange" could also be called "slice". The split-* functions
take their collection argument last to be consistent with
clojure.core's split-at and split-with. Arguments accepted by subrange
are analogous to those taken by subseq/rsubseq.

I would appreciate any and all comments regarding this API.

Many thanks to the two new contributors to this release (listed here
in chronological order of ticket creation): Pepijn de Vos, who
prompted me to develop the above-mentioned new features by creating
the ticket asking for what is now avl/nearest and mentioning
java.util.Navigable{Map,Set}, and who provided new tests for
avl/nearest; and Andy Fingerhut, who kept on top of the hashing
changes and provided the initial implementation of new-style hasheq
for data.avl with further tests.

Cheers,
Michał
Reply all
Reply to author
Forward
0 new messages