Hi,
I have made the initial release of flexvec, an implementation of
confluently persistent vectors for Clojure based on the Bagwell &
Rompf paper I've been working on:
https://github.com/michalmarczyk/flexvec
https://bitbucket.org/michalmarczyk/flexvec
I would like to propose this library for inclusion in contrib.
The remainder of this letter is meant as a brief introduction to the
current state of flexvec. The README provides an overview of the
available functionality, also available as the core namespace's
docstring:
(require '[flexvec.core :as fv])
(doc flexvec.core)
For those not familiar with RRB-Trees, their purpose is to provide
logarithmic time concatenation and slicing operations for a
vector-like data structure at the cost of a slight degradation in
performance while maintaining performance close to that of
PersistentVector in code which does not use these new operations.
See some basic Criterium benchmarks below (using Criterium 0.3.1); see
point 4. below for explanation of the fv/vector result. More extensive
benchmarking is one of the top TODOs.
Finally, here are some random points I'd like to make:
1. This is Clojure only for now; a ClojureScript version is definitely
on the TODO list.
2. The following are the main functions of interest:
(doc fv/subvec)
(doc fv/catvec)
3. They can be called on PersistentVector, gvec and
APersistentVector$SubVector instances, as well as on flexvec's own
RRBT-based vector type. (Non-flexvec vectors are implicitly
converted -- this involves no work beyond allocating a new
flexvec.rrbt.Vector instance with the contents of the fields taken
from the original vector.)
4. Interop with PersistentVector currently requires breaking into its
internals using a reflective crowbar. This is not particularly
fast.
5. gvec-like vectors storing unboxed primitives are supported (indeed
calling fv/subvec or fv/catvec on gvec instances returns such
primitive-enabled vectors).
6. Chunked seqs and transients are not yet supported.
7. The current catvec implementation is not at all optimized, in fact
it uses the seq library extensively. Replacing it with a new one
tuned for to performance is also on the TODO list. Also, the
current implementation does not actually use the rebalancing
heuristic described in the paper; it either does no rebalancing at
all or as much as possible. Changing this is a possibility, to be
considered once there is a proper benchmark suite in place.
8. flexvec.debug/dbg-vec prints out a textual rendition of a vector's
tree structure, including range information for relaxed nodes.
At any rate, the initial implementation does seem to work and I would
be delighted if you took it for a spin! Any comments would be greatly
appreciated.
Cheers,
Michał
I. nth
(def v1 (apply fv/vector-of :long (range 32768)))
(def v2 (apply vector-of :long (range 32768)))
(def v3 (apply fv/vector (range 32768)))
(def v4 (apply vector (range 32768)))
user> (c/quick-bench (nth v1 10000))
WARNING: Final GC required 46.90917500466828 % of runtime
Evaluation count : 17840826 in 6 samples of 2973471 calls.
Execution time mean : 34.542227 ns
Execution time std-deviation : 0.649193 ns
Execution time lower quantile : 33.868804 ns ( 2.5%)
Execution time upper quantile : 35.427878 ns (97.5%)
nil
user> (c/quick-bench (nth v2 10000))
WARNING: Final GC required 44.30737549642139 % of runtime
Evaluation count : 19195746 in 6 samples of 3199291 calls.
Execution time mean : 32.902882 ns
Execution time std-deviation : 2.561452 ns
Execution time lower quantile : 31.032188 ns ( 2.5%)
Execution time upper quantile : 36.606489 ns (97.5%)
nil
user> (c/quick-bench (nth v3 10000))
WARNING: Final GC required 46.41348481432648 % of runtime
Evaluation count : 6581706 in 6 samples of 1096951 calls.
Execution time mean : 92.340391 ns
Execution time std-deviation : 0.908233 ns
Execution time lower quantile : 90.800910 ns ( 2.5%)
Execution time upper quantile : 93.323069 ns (97.5%)
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
nil
user> (c/quick-bench (nth v4 10000))
WARNING: Final GC required 44.45846285396071 % of runtime
Evaluation count : 20830914 in 6 samples of 3471819 calls.
Execution time mean : 29.560699 ns
Execution time std-deviation : 0.625029 ns
Execution time lower quantile : 28.923561 ns ( 2.5%)
Execution time upper quantile : 30.465835 ns (97.5%)
nil
II. catvec vs. into
(def v1 (vec (range 1024)))
(def v2 (vec (range 32768)))
user> (c/quick-bench (fv/catvec v1 v2))
WARNING: Final GC required 35.76671921857969 % of runtime
Evaluation count : 41856 in 6 samples of 6976 calls.
Execution time mean : 14.896211 us
Execution time std-deviation : 840.794433 ns
Execution time lower quantile : 14.291983 us ( 2.5%)
Execution time upper quantile : 16.221864 us (97.5%)
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 14.3961 % Variance is moderately inflated by outliers
nil
user> (c/quick-bench (into v1 v2))
WARNING: Final GC required 35.05156098194928 % of runtime
Evaluation count : 1176 in 6 samples of 196 calls.
Execution time mean : 529.096542 us
Execution time std-deviation : 24.980297 us
Execution time lower quantile : 510.348429 us ( 2.5%)
Execution time upper quantile : 569.423920 us (97.5%)
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
nil