[ANN] Incremental Vectors

179 views
Skip to first unread message

Mark Engelberg

unread,
May 14, 2013, 4:29:22 AM5/14/13
to clojure
As part of the instaparse library, I created a wrapper around Clojure's vectors that keeps the hashcode updated as the vector is modified.  In scenarios where you take a vector and then hash, then modify, then hash, then modify, then hash, etc., this strategy of "incremental hashing" is essential to performance.  Otherwise, Clojure traverses every element in the modified vector to compute the new hash.

I have documented this new datatype.  It's very simple to use, simply call ivec instead of vec to create it and use it as you'd use a vector (with some obvious exceptions, for example, you can't add something unhashable, like an infinite sequence, to the vector, since it hashes the items as they are added). 
https://github.com/Engelberg/instaparse/blob/master/docs/IncrementalVectors.md

Code is here:
https://github.com/Engelberg/instaparse/blob/master/src/instaparse/incremental_vector.clj

Feel free to use it in your own projects.

Thanks to Christophe Grand who provided guidance on some of the subtleties of simulating vectors.  For example, I had a lot of confusion about equal vs equiv which he helped to clear up.

Jim

unread,
May 14, 2013, 6:56:13 AM5/14/13
to clo...@googlegroups.com
many thanks for this! :)

btw, is there any place where one can find your discussion between you
and Christophe? I'd love to know more about equiv...alternatively, do
you plan on making public what you've learned in some sort of demo/tutorial?

Jim
> --
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient
> with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to clojure+u...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

Mark Engelberg

unread,
May 14, 2013, 4:43:16 PM5/14/13
to clo...@googlegroups.com
On Tue, May 14, 2013 at 3:56 AM, Jim <jimpi...@gmail.com> wrote:
many thanks for this! :)

btw, is there any place where one can find your discussion between you and Christophe? I'd love to know more about equiv...alternatively, do you plan on making public what you've learned in some sort of demo/tutorial?

Jim

The big aha moment for me (I hope I remember this correctly) was the revelation that .equals only comes into play if your Clojure object is interoperating with Java objects and in Java-land, someone calls .equals on your object to test it for equality with something.  The built in = calls .equiv.

Some other big lessons:

1. In the most recent version of Clojure, most Clojure data structures cache their hash values.  This is great.  However, I was surprised to learn that records do not.  I discovered that if you're going to be hashing your data structure a lot, regular hash maps perform far better than records.

2. When testing a vector for equality against a collection, Clojure recognizes that if the counts are different, they can't possibly be equal.  So it quickly returns false.  By extension, I assumed that if the hashes for both collections were already precomputed and cached, that the equality process would compare those cached hashes and fail fast if not equal.  But in fact, Clojure doesn't do that -- even if the hashes are different, it inspects every element for a mismatch. 

3. Calling hash on a deeply nested data structure blows the stack, because hash is recursive.

4. The empty method is supposed to preserve metadata.

5. You might think that invoke is the way functions are called through normal function calls and applyTo is how they are processed by (apply f args), but it turns out that even regular function calls sometimes get translated to applyTo.  It is important to implement both.


Overall, the hardest part about trying to simulate a Clojure data structure is that there doesn't seem to be any comprehensive documentation about the interfaces and methods that need to be satisfied.  It takes trial and error.  And some functions are not factored well into interfaces, relying on concrete behavior (I'm looking at you, subvec and subseq!)

Even after I announced IncrementalVector last night, I stumbled across a couple more functions that I had failed to support.  (I've fixed that now in 1.2.0-SNAPSHOT).


Reply all
Reply to author
Forward
0 new messages