Contrib proposal: flexvec -- RRB-Tree-based confluently persistent vectors

467 views
Skip to first unread message

Michał Marczyk

unread,
Jan 19, 2013, 4:01:07 AM1/19/13
to cloju...@googlegroups.com
Hi,

I have made the initial release of flexvec, an implementation of
confluently persistent vectors for Clojure based on the Bagwell &
Rompf paper I've been working on:

https://github.com/michalmarczyk/flexvec
https://bitbucket.org/michalmarczyk/flexvec

I would like to propose this library for inclusion in contrib.

The remainder of this letter is meant as a brief introduction to the
current state of flexvec. The README provides an overview of the
available functionality, also available as the core namespace's
docstring:

(require '[flexvec.core :as fv])
(doc flexvec.core)

For those not familiar with RRB-Trees, their purpose is to provide
logarithmic time concatenation and slicing operations for a
vector-like data structure at the cost of a slight degradation in
performance while maintaining performance close to that of
PersistentVector in code which does not use these new operations.

See some basic Criterium benchmarks below (using Criterium 0.3.1); see
point 4. below for explanation of the fv/vector result. More extensive
benchmarking is one of the top TODOs.

Finally, here are some random points I'd like to make:

1. This is Clojure only for now; a ClojureScript version is definitely
on the TODO list.

2. The following are the main functions of interest:

(doc fv/subvec)
(doc fv/catvec)

3. They can be called on PersistentVector, gvec and
APersistentVector$SubVector instances, as well as on flexvec's own
RRBT-based vector type. (Non-flexvec vectors are implicitly
converted -- this involves no work beyond allocating a new
flexvec.rrbt.Vector instance with the contents of the fields taken
from the original vector.)

4. Interop with PersistentVector currently requires breaking into its
internals using a reflective crowbar. This is not particularly
fast.

5. gvec-like vectors storing unboxed primitives are supported (indeed
calling fv/subvec or fv/catvec on gvec instances returns such
primitive-enabled vectors).

6. Chunked seqs and transients are not yet supported.

7. The current catvec implementation is not at all optimized, in fact
it uses the seq library extensively. Replacing it with a new one
tuned for to performance is also on the TODO list. Also, the
current implementation does not actually use the rebalancing
heuristic described in the paper; it either does no rebalancing at
all or as much as possible. Changing this is a possibility, to be
considered once there is a proper benchmark suite in place.

8. flexvec.debug/dbg-vec prints out a textual rendition of a vector's
tree structure, including range information for relaxed nodes.

At any rate, the initial implementation does seem to work and I would
be delighted if you took it for a spin! Any comments would be greatly
appreciated.

Cheers,
Michał


I. nth

(def v1 (apply fv/vector-of :long (range 32768)))
(def v2 (apply vector-of :long (range 32768)))
(def v3 (apply fv/vector (range 32768)))
(def v4 (apply vector (range 32768)))

user> (c/quick-bench (nth v1 10000))
WARNING: Final GC required 46.90917500466828 % of runtime
Evaluation count : 17840826 in 6 samples of 2973471 calls.
Execution time mean : 34.542227 ns
Execution time std-deviation : 0.649193 ns
Execution time lower quantile : 33.868804 ns ( 2.5%)
Execution time upper quantile : 35.427878 ns (97.5%)
nil
user> (c/quick-bench (nth v2 10000))
WARNING: Final GC required 44.30737549642139 % of runtime
Evaluation count : 19195746 in 6 samples of 3199291 calls.
Execution time mean : 32.902882 ns
Execution time std-deviation : 2.561452 ns
Execution time lower quantile : 31.032188 ns ( 2.5%)
Execution time upper quantile : 36.606489 ns (97.5%)
nil
user> (c/quick-bench (nth v3 10000))
WARNING: Final GC required 46.41348481432648 % of runtime
Evaluation count : 6581706 in 6 samples of 1096951 calls.
Execution time mean : 92.340391 ns
Execution time std-deviation : 0.908233 ns
Execution time lower quantile : 90.800910 ns ( 2.5%)
Execution time upper quantile : 93.323069 ns (97.5%)

Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
nil
user> (c/quick-bench (nth v4 10000))
WARNING: Final GC required 44.45846285396071 % of runtime
Evaluation count : 20830914 in 6 samples of 3471819 calls.
Execution time mean : 29.560699 ns
Execution time std-deviation : 0.625029 ns
Execution time lower quantile : 28.923561 ns ( 2.5%)
Execution time upper quantile : 30.465835 ns (97.5%)
nil


II. catvec vs. into

(def v1 (vec (range 1024)))
(def v2 (vec (range 32768)))

user> (c/quick-bench (fv/catvec v1 v2))
WARNING: Final GC required 35.76671921857969 % of runtime
Evaluation count : 41856 in 6 samples of 6976 calls.
Execution time mean : 14.896211 us
Execution time std-deviation : 840.794433 ns
Execution time lower quantile : 14.291983 us ( 2.5%)
Execution time upper quantile : 16.221864 us (97.5%)

Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 14.3961 % Variance is moderately inflated by outliers
nil
user> (c/quick-bench (into v1 v2))
WARNING: Final GC required 35.05156098194928 % of runtime
Evaluation count : 1176 in 6 samples of 196 calls.
Execution time mean : 529.096542 us
Execution time std-deviation : 24.980297 us
Execution time lower quantile : 510.348429 us ( 2.5%)
Execution time upper quantile : 569.423920 us (97.5%)

Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
nil

Paul Stadig

unread,
Jan 19, 2013, 5:55:43 AM1/19/13
to cloju...@googlegroups.com
Very interesting, Michał. I have a soft spot in my heart for data
structures, so I look forward to playing with this. I do hope it could
get into contrib and/or core at some point.


Paul

Paul Stadig

unread,
Jan 19, 2013, 6:24:28 AM1/19/13
to cloju...@googlegroups.com
Also, if people are interested in confluently persistent data
structures, I've been working on an implementation of Kaplan and
Tarjan's confluently persistent deque.

http://github.com/pjstadig/deque-clojure

It has real-time constant time deque operations (push and pop on both
ends) and real-time constant time concatenation. It's still a work in
progress (I haven't actually implemented the concatenation yet). I
wrote it in Java and then ported it to Clojure, so it's kind of an
interesting comparative study. The Clojure version still needs a bit
of performance tuning.


Paul

David Nolen

unread,
Jan 19, 2013, 11:33:11 AM1/19/13
to cloju...@googlegroups.com
Sweet! +1 for RRB-Trees in contrib.



--
You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
To post to this group, send email to cloju...@googlegroups.com.
To unsubscribe from this group, send email to clojure-dev...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/clojure-dev?hl=en.


Michał Marczyk

unread,
Jan 19, 2013, 10:58:59 PM1/19/13
to cloju...@googlegroups.com
Great! Looking forward to any and all feedback. :-)

Also, I've just pushed 0.0.2 with some bug fixes.

@Paul: Thanks for mentioning your deque impl! I've been meaning to
look into Kaplan & Tarjan, will be great to examine some working code.

Cheers,
Michał

Michał Marczyk

unread,
Jan 21, 2013, 11:46:45 PM1/21/13
to cloju...@googlegroups.com
Just a quick note to say I've pushed 0.0.3 with reduce-kv and chunked
seq support (the latter using gvec's clojure.core.VecSeq for now, so
virtually no code added to flexvec itself for this; there's a minor
"pretend you're a duck" hack involved noted in a comment in the
source).

Cheers,
Michał

Stuart Halloway

unread,
Feb 2, 2013, 5:38:54 PM2/2/13
to cloju...@googlegroups.com
I would like to see these in contrib. What is a good project name?  data.??? 

Stu

Michał Marczyk

unread,
Feb 2, 2013, 6:54:55 PM2/2/13
to cloju...@googlegroups.com
On 2 February 2013 23:38, Stuart Halloway <stuart....@gmail.com> wrote:
> I would like to see these in contrib. What is a good project name? data.???

Great!

I'd quite like data.flexvec. Otherwise data.rrbt and data.rrb-tree
would be accurate, or perhaps some variation on data.vector
(rrb(t)-vector?).

Incidentally, I've just pushed out 0.0.4 with randomized tests for
catvec and subvec (as well as fixes for issues they have uncovered).
0.0.5 will probably be out very soon with a minor API adjustment
around fv/vec.

Cheers,
Michał
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure-dev...@googlegroups.com.
>
> To post to this group, send email to cloju...@googlegroups.com.
> Visit this group at http://groups.google.com/group/clojure-dev?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

Rich Hickey

unread,
Feb 4, 2013, 10:25:51 PM2/4/13
to cloju...@googlegroups.com
That's fantastic, and would be a welcome contrib - thanks!

The patch for providing access to vector internals has been pushed, and is in RC5+.

Michał Marczyk

unread,
Feb 5, 2013, 10:21:53 AM2/5/13
to cloju...@googlegroups.com
On 5 February 2013 04:25, Rich Hickey <richh...@gmail.com> wrote:
> That's fantastic, and would be a welcome contrib - thanks!

Great!

I'd be happy to move over to a contrib repo as soon as we've got a
name for it. Do you have any preference about that?

My ideas so far are data.flexvec, data.vector, data.rrb-tree,
data.rrbt. The key exported functions are meant as extensions to the
core API for vectors which should be ready to be merged into the
language proper any time this might be useful, so in that sense
something like core.vector could be ok too.

> The patch for providing access to vector internals has been pushed, and is in RC5+.

Thanks! I've pushed out flexvec 0.0.6 which takes advantage of this.
The result is a dramatic improvement to PV$Node-based flexvec vectors'
performance.

NB. there seems to be a performance issue around using flexvec vectors
with different node types in the same JVM. I'll need to look into some
output from HotSpot and run a few experiments to see what exactly is
going on. Something to do with polymorphic calls to NodeManager
methods perhaps? I was going to experiment with specialized types
anyway -- I'll bump it up on the TODO list.

Cheers,
Michał
> To unsubscribe from this group and stop receiving emails from it, send an email to clojure-dev...@googlegroups.com.
> To post to this group, send email to cloju...@googlegroups.com.

Michał Marczyk

unread,
Mar 28, 2013, 7:14:24 AM3/28/13
to cloju...@googlegroups.com
Hey,

Could we move forward with the contrib process? Any ideas as to the
project namespace?

Cheers,
Michał

Stuart Halloway

unread,
Mar 28, 2013, 12:27:09 PM3/28/13
to cloju...@googlegroups.com
Hi Michał,

I just created https://github.com/clojure/core.rrb-vector and gave you commit access to clojure contrib projects.  Once you have the project ported over we can set up JIRA, CI builds, etc.

Thanks!
Stu

Michał Marczyk

unread,
Mar 29, 2013, 4:07:08 AM3/29/13
to cloju...@googlegroups.com
Hi Stu,

Thanks for setting me up! (core.rrb-vector then? OK.) I've moved the code over.

Cheers,
Michał

Michał Marczyk

unread,
Mar 29, 2013, 4:08:38 AM3/29/13
to cloju...@googlegroups.com
I, Michał Marczyk, give my permission to release my contributions to
flexvec under the Clojure Contributor Agreement.

Michał Marczyk

unread,
Apr 5, 2013, 10:50:18 PM4/5/13
to cloju...@googlegroups.com
On 28 March 2013 17:27, Stuart Halloway <stuart....@gmail.com> wrote:
> I just created https://github.com/clojure/core.rrb-vector and gave you
> commit access to clojure contrib projects. Once you have the project ported
> over we can set up JIRA, CI builds, etc.

I think I'm done with the porting -- hopefully I got the directory
structure and pom.xml right.

Cheers,
Michał

Sean Corfield

unread,
Apr 6, 2013, 1:16:58 AM4/6/13
to cloju...@googlegroups.com
Looks like the next steps are:
* add this to the build.clojure.org setup
* add a JIRA project for reporting bugs
* make an 0.0.1 release to maven
Someone on Clojure/core has to do most of this.

Michał what is your JIRA username? (so I can update the wiki to link
to your profile as lead for this contrib)

Thanx,
Sean
> --
> You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to clojure-dev...@googlegroups.com.
> To post to this group, send email to cloju...@googlegroups.com.
> Visit this group at http://groups.google.com/group/clojure-dev?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>



--
Sean A Corfield -- (904) 302-SEAN
An Architect's View -- http://corfield.org/
World Singles, LLC. -- http://worldsingles.com/

"Perfection is the enemy of the good."
-- Gustave Flaubert, French realist novelist (1821-1880)

Michał Marczyk

unread,
Apr 6, 2013, 2:08:45 AM4/6/13
to cloju...@googlegroups.com
On 6 April 2013 07:16, Sean Corfield <seanco...@gmail.com> wrote:
> Looks like the next steps are:
> * add this to the build.clojure.org setup
> * add a JIRA project for reporting bugs
> * make an 0.0.1 release to maven
> Someone on Clojure/core has to do most of this.

Ok. I just made sure mvn test is happy. That involved adding a
dependency on the jsr166y jar with <scope>provided</scope> for running
the tests on JDK 1.6; I hope that's the right thing to do? Assuming
that's the case, I think all is ready for an initial release (probably
0.0.9, incidentally, given that flexvec reached 0.0.8?).

> Michał what is your JIRA username? (so I can update the wiki to link
> to your profile as lead for this contrib)

michalmarczyk

Thanks!

Michał

Michał Marczyk

unread,
May 7, 2013, 3:16:08 AM5/7/13
to cloju...@googlegroups.com
Hi,

Could we move forward with the JIRA + Hudson setup?

Cheers,
Michał

Christopher Redinger

unread,
May 8, 2013, 11:00:05 AM5/8/13
to cloju...@googlegroups.com

Michał Marczyk

unread,
May 8, 2013, 3:37:01 PM5/8/13
to cloju...@googlegroups.com
Thanks!

Hudson complains about "Invalid login information" though.

Sean Corfield

unread,
May 8, 2013, 5:05:44 PM5/8/13
to cloju...@googlegroups.com
Added these links to the Contrib library pages on the wiki.

Christopher Redinger

unread,
May 8, 2013, 6:05:00 PM5/8/13
to cloju...@googlegroups.com
Try the login credentials I sent you earlier again. I've fixed them up.

Michał Marczyk

unread,
May 9, 2013, 7:16:04 AM5/9/13
to cloju...@googlegroups.com
On 9 May 2013 00:05, Christopher Redinger <redi...@gmail.com> wrote:
> Try the login credentials I sent you earlier again. I've fixed them up.

Worked fine this time, thanks!

Michał Marczyk

unread,
May 9, 2013, 7:18:26 AM5/9/13
to cloju...@googlegroups.com
On 8 May 2013 23:05, Sean Corfield <seanco...@gmail.com> wrote:
> Added these links to the Contrib library pages on the wiki.

Thanks!

Michał Marczyk

unread,
May 31, 2013, 4:10:15 PM5/31/13
to cloju...@googlegroups.com
As mentioned on another thread, I think an initial release is in
order. I'm thinking of calling it 0.0.9, as flexvec got to 0.0.8 prior
to the move to Contrib. I believe that's in line with current
practice; if not, please set me straight!

Cheers,
Michał
Reply all
Reply to author
Forward
0 new messages