Zippers - Functional Tree Editing

78 views
Skip to first unread message

Rich Hickey

unread,
Feb 25, 2008, 3:23:24 PM2/25/08
to Clojure
I've added (in SVN as of rev. 698, not yet in release) purely
functional, generic tree walking and editing, using a technique called
a zipper. For background, see the paper by Huet. A zipper is a data
structure representing a location in a hierarchical data structure,
and the path it took to get there. It provides down/up/left/right
navigation, and localized functional 'editing', insertion and removal
of nodes. With zippers you can write code that looks like an
imperative, destructive walk through a tree, call root when you are
done and get a new tree reflecting all the changes, when in fact
nothing at all is mutated - it's all thread safe and shareable! The
next function does a depth-first walk, making for easy to understand
loops:

(load-file "/Users/rich/dev/clojure/src/zip.clj")
(refer 'zip)
(def data '[[a * b] + [c * d]])
(def dz (vector-zip data))
;find the second *
(-> dz down right right down right node)
-> *


;remove the first 2 terms
(-> dz next remove next remove root)
-> [[c * d]]


;replace * with /
(loop [loc dz]
(if (end? loc)
(root loc)
(recur
(next (if (= (node loc) '*)
(replace loc '/)
loc)))))
-> [[a / b] + [c / d]]


;remove *
(loop [loc dz]
(if (end? loc)
(root loc)
(recur
(next (if (= (node loc) '*)
(remove loc)
loc)))))
-> [[a b] + [c d]]


;original is intact
(root dz)
-> [[a * b] + [c * d]]


Zipper constructors are provided for nested seqs, nested vectors, and
the xml elements generated by xml/parse. All it takes is a 4-5 line
function to support other data structures. The zipper support is in
src/zip.clj.


Try it our and let me know what you think,


Rich

jon

unread,
Feb 26, 2008, 8:52:22 AM2/26/08
to Clojure


> (def data '[[a * b] + [c * d]])
> (def dz (vector-zip data))
> ;find the second *
> (-> dz down right right down right node)
> -> *

So, does this seem like the right way to picture that?

### (this is the root node)
/
/
/
[ ### ---- + ---- ### ]
/
/
/
[ a * b ] [ c --- * d ]

Rich Hickey

unread,
Feb 26, 2008, 9:30:42 AM2/26/08
to Clojure
Pretty much (maybe you want to show the + on the last line, or leave
out [a * b] ?)

down takes you to the (location of) leftmost child -> [a * b]
right to (the location of) its right sibling -> +
right again -> (location of) [c * d]
down -> (location of) c
right -> (location of) *
node -> *

Rich

smith...@googlemail.com

unread,
Feb 26, 2008, 1:28:05 PM2/26/08
to Clojure
On Feb 25, 9:23 pm, Rich Hickey <richhic...@gmail.com> wrote:
> Try it our and let me know what you think,

From client POV it looks simply great. Implementation-wise I wonder if
this metadata thing is getting out of hand. If I read something like:

(defn zipper [branch? children make-node root]
#^{:zip/branch? branch? :zip/children children :zip/make-node make-
node} [root nil])

then it looks like objects re-invented in order to being able to
associate data and behavior. But ok. I just wonder if this is what
metadata is meant for?

- John

Rich Hickey

unread,
Feb 26, 2008, 3:48:23 PM2/26/08
to Clojure


On Feb 26, 1:28 pm, "smith49...@googlemail.com"
It's a fair question, and I think many people come to Clojure and
wonder 'where are the classes and how do I decompose problems the way
I used to?', so is worth looking at in more detail.

I've done a lot of OO programming and I vastly prefer building a
composition-driven protocol from supplied function objects to
inheritance, and have since the early '90s when programming in C++.
See http://www.google.com/search?q=rich%20hickey%20callbacks.

In this case specifically, tree navigation cannot be a function of the
'type' of the nodes in the tree, or at least to do so would be
limiting. Consider a tree of file/directories. In one scenario you
might want to consider empty directories branches and in another you
might not. Or people might use nested seqs where the children are the
second rest.

It would also be wrong to intrude on the nodes or their types by
requiring they be tagged in some manner.

So multimethods on nodes are out. We're down to taking an OO approach
to zipper locations themselves. The traditional way would be to define
Zipper (locations) as a class of some sort, with abstract functions
for branch?, children, and make-node, with the expectation that people
would define SeqZipper, VectorZipper and XMLZipper etc as derived
classes, each overriding those methods. At this point the problems of
a correct implementation of concrete derivation come to the fore.
First, in any language such as Java where all methods are by default
virtual, you would have to mark all methods other than branch?,
children, and make-node (i.e. up, down, et al) as final, in order to
prevent corruption of the core logic by derivees. Then, both Zipper
and the derivees must correctly implement a cloning strategy, as the
locations are treated as values, and need to be constructed from
scratch, copied etc. Never mind the loss of all that nice
destructuring once our locations are no longer vectors but instances
of some derivee of unknown structure. I imagine the resulting
implementation to be 4x as large.

Most critically, these classes would be broken abstractions because
branch?, children, and make-node really are just functions of nodes,
and have nothing to do with locations.

Ok, so let's separate branch?, children, and make-node and make an
interface, say ZipperNodeManager, containing only those three
functions (which would now take nodes), to be implemented by
consumers, an instance of which is passed to zipper. At that point you
aren't far from the current implementation - the node manager is
essentially a piece of (functional) data that must accompany the
creation of the root location, and be shared by all locations in the
tree. I think it would be unlikely, in any OO language, for it to end
up as succinct as:

(defn vector-zip [root]
(zipper vector? seq (fn [node children] (apply vector children))
root))

All that's left then to consider is the use of metadata to store the
associated functions. Can functions be metadata? To the extent
functions are first-class values, certainly. Is this really 'meta'? I
think so, it is neither the node data nor the location data, rather it
is about (the use of) the node data. Could a map of these three
functions have been the third item in the location vector? Sure, at
the cost of an additional slot per location. Since metadata slots are
always there (bought and paid for), using metadata is a slight space
optimization.

Is this roll-your own OO? Maybe. Is it therefore bad? Definitely not.
It is just as much 'OO' as this problem needs, and being just enough,
doesn't incur any accidental complexity. This is a huge point - when
people pull out the stock OO tool they are usually incurring a vast
amount of accidental complexity they no longer even recognize as
such.

Finally, roll-your-own or not, it is less work and code than normal
use of an OO system.

Is this the intended use of metadata? To the extent it allows the
pairing of related information and a core data structure (I started
with 'a location is a pair of a node and a path' and stuck with it)
and continued use of the core data structure in the implementation, I
think, in the case, yes. But in all cases metadata will be an
implementation/application detail. I'm sure it will be misused, but I
don't think that has happened here.

Rich

Steve Harris

unread,
Feb 26, 2008, 6:22:09 PM2/26/08
to Clojure
Rich,
looks really neat, I'm sure it will be great for processing XML data
structures and a lot more.

Now take a break for a year or so, and let us all catch up...

Stuart Sierra

unread,
Feb 27, 2008, 2:00:48 PM2/27/08
to Clojure
I've always wanted a tree-walker that worked this way. I need to play
with it some more to get used to the style, but definitely a cool
addition. Thanks, Rich!

-Stuart

Stephen C. Gilardi

unread,
May 27, 2008, 2:03:41 AM5/27/08
to clo...@googlegroups.com
On Feb 25, 2008, at 3:23 PM, Rich Hickey wrote:

> Zipper constructors are provided for nested seqs, nested vectors, and
> the xml elements generated by xml/parse. All it takes is a 4-5 line
> function to support other data structures. The zipper support is in
> src/zip.clj.
>
>

> Try it out and let me know what you think,

After coming up with an immutable nested data structure of my own
using a mixture of struct maps, sets, maps, and other (Java)
containers, I started thinking about how I could (effectively) change
something within it. It was then that I understood clearly the
problem that zip.clj solves. I like it.

It would be nice if there were a way to manipulate an arbitrary
clojure nested data structure using a zipper and, after retrieving the
root, have something with (except perhaps for some intentional
editing) the exact same structure as the original. seq-zip almost
does that, but it has the unfortunate property that it turns
everything into lists on the way down and doesn't reverse that on the
way up: A struct-map becomes a list of map entries and stays that way.

It occurs to me that that's a solvable problem. We would need a
function to pass as zipper's make-node that would create a new empty
instance of the exact same type as the parent node and add a seq of
its children to it. One might call this "deseq":

(defn deseq
"Given an exemplar and a seq of values, creates a new instance
of the exact type of the exemplar with the values as contents"
[exemplar values]
...)

The invariants would be:

For any instance of IPersistentCollection coll:

(= (deseq coll (seq coll)) coll)
(= (class (deseq coll (seq coll))) (class coll))

(Ideally for a struct map, the empty copy of the exemplar would only
have the basis's keys in it as well.)

Here's a sketch of a "mostly correct, but difficult to maintain"
definition of deseq:

(import '(clojure.lang PersistentArrayMap PersistentHashMap
PersistentHashSet PersistentList
PersistentQueue PersistentStructMap
PersistentTreeMap PersistentTreeSet
PersistentVector))

(defn deseq
"Given an exemplar and a seq of values, creates a new
instance
of the exact type of the exemplar with the values as
contents"
[exemplar values]
(let [empty-clone
(cond (instance? PersistentArrayMap exemplar)
(. PersistentArrayMap EMPTY)
(instance? PersistentHashMap exemplar)
(. PersistentHashMap EMPTY)
(instance? PersistentHashSet exemplar)
(. PersistentHashSet EMPTY)
;; (instance? PersistentList exemplar) ; for
List, we can return the values seq directly
;; (. PersistentList EMPTY)
(instance? PersistentQueue exemplar)
(. PersistentQueue EMPTY)
(instance? PersistentStructMap exemplar)
(struct (apply create-struct (keys exemplar)))
(instance? PersistentTreeMap exemplar)
(. PersistentTreeMap EMPTY)
(instance? PersistentTreeSet exemplar)
(. PersistentTreeSet EMPTY)
(instance? PersistentVector exemplar)
(. PersistentVector EMPTY))]
(if empty-clone
(reduce conj empty-clone values)
values)))

I think a generic zipper (gen-zip ?) that worked like seq-zip, but used

(deseq node children)

as its make-node function would be a useful addition to clojure.

I have an idea for a more maintainable and correct implementation of
deseq. Subclasses of IPersistentCollection could be required to
provide either a "deseq" method directly (in addition to "seq") or
provide an "empty-clone" method that would make a clojure
implementation of deseq compact and readable.

Any thoughts?

--Steve

Rich Hickey

unread,
May 27, 2008, 8:10:16 AM5/27/08
to Clojure


On May 27, 2:03 am, "Stephen C. Gilardi" <scgila...@gmail.com> wrote:
> On Feb 25, 2008, at 3:23 PM, Rich Hickey wrote:
>
> > Zipper constructors are provided for nested seqs, nested vectors, and
> > the xml elements generated by xml/parse. All it takes is a 4-5 line
> > function to support other data structures. The zipper support is in
> > src/zip.clj.
>
> > Try it out and let me know what you think,
>
> After coming up with an immutable nested data structure of my own
> using a mixture of struct maps, sets, maps, and other (Java)
> containers, I started thinking about how I could (effectively) change
> something within it. It was then that I understood clearly the
> problem that zip.clj solves. I like it.
>
> It would be nice if there were a way to manipulate an arbitrary
> clojure nested data structure using a zipper and, after retrieving the
> root, have something with (except perhaps for some intentional
> editing) the exact same structure as the original. seq-zip almost
> does that, but it has the unfortunate property that it turns
> everything into lists on the way down and doesn't reverse that on the
> way up: A struct-map becomes a list of map entries and stays that way.
>

Well, it is seq-zip. It's not an unfortunate property or a problem,
it's the expected behavior. You just want something else - poly-zip or
something. But as you've seen the zippers are set up to take a node-
constructing function for exactly that purpose.

> It occurs to me that that's a solvable problem. We would need a
> function to pass as zipper's make-node that would create a new empty
> instance of the exact same type as the parent node and add a seq of
> its children to it. One might call this "deseq":
>
> (defn deseq
> "Given an exemplar and a seq of values, creates a new instance
> of the exact type of the exemplar with the values as contents"
> [exemplar values]
> ...)
>
> The invariants would be:
>
> For any instance of IPersistentCollection coll:
>
> (= (deseq coll (seq coll)) coll)
> (= (class (deseq coll (seq coll))) (class coll))
>
> (Ideally for a struct map, the empty copy of the exemplar would only
> have the basis's keys in it as well.)
>

> I have an idea for a more maintainable and correct implementation of
> deseq. Subclasses of IPersistentCollection could be required to
> provide either a "deseq" method directly (in addition to "seq") or
> provide an "empty-clone" method that would make a clojure
> implementation of deseq compact and readable.
>
> Any thoughts?
>

I don't like the name deseq. But I agree, all that is needed is for
collections to implement empty(), which you could then use with
'into'.

(into (empty coll) new-stuff)

I'll look into empty. (Aren't languages funny?)

Rich

Rich Hickey

unread,
May 27, 2008, 9:02:34 AM5/27/08
to Clojure
empty is up

Rich

Stephen C. Gilardi

unread,
May 29, 2008, 2:47:52 PM5/29/08
to clo...@googlegroups.com

On May 27, 2008, at 8:10 AM, Rich Hickey wrote:
I don't like the name deseq. But I agree, all that is needed is for
collections to implement empty(), which you could then use with
'into'.

(into (empty coll) new-stuff)

I'll look into empty. (Aren't languages funny?)

Languages are great!  :-)

Thanks very much for adding empty.  It's been very useful.

--Steve

Reply all
Reply to author
Forward
0 new messages