The unshared part of two mostly-shared structures

mikel

unread,

Mar 21, 2009, 12:23:13 PM3/21/09

to Clojure

Clojure efficiently shares structure of composite objects. For
example, given a Map with 10,000 entries, you can inexpensively
created a new Map with one changed entry, because Clojure reuses the
other 9999 entries.

Is there a convenient and efficient API that can return just the
changed entry?

If there is, it would be very convenient indeed for one of my
application's needs: it would enable me to efficiently copy deltas
from processes on one machine to processes on another. As a process
updates application state, I could accrue deltas and send just the
changed portion of the state across the network to the remote process,
which, being in Map form, would be very easy to merge with the remote
state.

I haven't looked at Clojure at all with this in mind, but I know I'm
soon going to need to implement this updating mechanism. If you know
how to efficiently obtain just the diffrences between two otherwise
shared Maps, I'd be interested to learn how to do it.

Chouser

unread,

Mar 21, 2009, 5:01:09 PM3/21/09

to clo...@googlegroups.com

On Sat, Mar 21, 2009 at 12:23 PM, mikel <mev...@mac.com> wrote:
>
> I haven't looked at Clojure at all with this in mind, but I know I'm
> soon going to need to implement this updating mechanism. If you know
> how to efficiently obtain just the diffrences between two otherwise
> shared Maps, I'd be interested to learn how to do it.

This is interesting to me. I've thought for a while that I wanted to
be notified of changes to a collection, such as in a Ref's watcher.
But I couldn't think of a good way to go about it, since the new value
of a Ref may have no relationship at all with the old value.

But your question suggests a different approach. Efficiently
obtaining a "diff" that describes the differences between two
collections would suffice for most of my use cases.o

The collections don't support this now, as far as I know, but off the
top of my head it seems like it would be possible for any
collection instance to walk itself at the same time as another
instance of the exact same type, following only nodes that are not
identical references. This wouldn't work to compare an array-map with
a hash-map, for example, but in an ideal case of two hash-maps with a
single changed entry, it seems like it could be very efficient.
Perhaps O(n) where n is the number of changed leaf nodes?

I'm not sure what the resulting "diff" would look like. It would be
nice if it were simple another hash-map: (diff a b) would return a map
c such that (merge a c) would produce a value equal to b again. But
for course this can't be as 'merge' never removes an entry. An option
that would work (but doesn't strike me as terribly elegant) would be
for (diff a b) to return [c d], such that:

(= b (merge (apply dissoc a c) d))

--Chouser

rapido

unread,

Mar 21, 2009, 6:01:33 PM3/21/09

to Clojure

i have done some homework on this problem :)

(warning: not clojure specific)

my claim is that diffing two similar sets (or maps) *can* be made
efficient only *if* you can add arbitrary items to sets efficiently
(i.e. O(log(n)).

remember that immutable sets are usually laid out in a tree structure.
this also means that similar sets share a lot of tree structure.

if pointer equality is allowed, the differencing of two sets can be
made O(n) where n is the number of differences.
this can be achieved by comparing (on equality) left and right (sub)
trees of two sets.

more strongly, if the (sorted) differences between two sets are
consecutive, it can be made O(log n).
and more generally, if there are (m) blocks of consecutive (n)
differences, you can get it O(m*log(n)))
(google search 'treaps')

pointer equality doesn't always cut it however:
when two disparate processes build two similar sets (or trees), their
pointers will be different and pointer equality fails.

this can be solved by replacing pointers with hashes (of recursive
tree content), effectively replacing pointer equality by hash
equality.

my programming language enchilada (www.enchiladacode.nl) has
hash=pointer equality build in.
i believe it shouldn't be to difficult to introduce some of
enchilada's internals to clojure.

- robbert.

Stuart Sierra

unread,

Mar 22, 2009, 12:26:15 PM3/22/09

to Clojure

On Mar 21, 6:01 pm, rapido <robbert.van.da...@gmail.com> wrote:
> my programming language enchilada (www.enchiladacode.nl) has
> hash=pointer equality build in.
> i believe it shouldn't be to difficult to introduce some of
> enchilada's internals to clojure.

Hi Robbert,

I imagine that hash-pointer equality would be difficult to impossible
in Clojure, since it relies on Java's hash/pointer/equality semantics.

But Enchilada looks interesting, and a hash-based data structure in
Clojure would also be interesting.

-Stuart Sierra

mikel

unread,

Mar 22, 2009, 7:25:12 PM3/22/09

to Clojure

Thanks for the comments, folks. I may be able to raw some ideas from
them. In my specific case, restrictive rules about the types of the
objects may be quite workable. For example, I don't think I'd suffer
if the state-preserving objects were all required to be hash-maps.

I'll think some more about it.

e

unread,

Mar 23, 2009, 12:06:27 AM3/23/09

to clo...@googlegroups.com

my programming language enchilada (www.enchiladacode.nl)

interesting. I was thinking today about how STM reminds me of version control.

Reply all

Reply to author

Forward