request for comments/code-review - clojure implementation of unifier

Kevin Livingston

unread,

Apr 19, 2010, 2:34:45 PM4/19/10

to Clojure

I ported the unifier posted by Norvig in Common Lisp to Clojure...

original lisp here: http://norvig.com/paip/unify.lisp
from the paip directory it also uses code in patmatch and auxfns
files.

this revealed some things that I don't particularly care for in
Clojure, and some things I'm clearly unsure how to correctly
implement, so I would appreciate review and feedback on this code as
an exercise, but also because it's going to go into heavy use soon.
(There are some helper functions that are just there to make porting
Norvig's code easier, also to raise the level of abstraction -
arguably I shouldn't have used contains? and had another helper like
Norvig did.)

major questions
1. this is ugly: (and (seq? x) (not (empty? x))) in common lisp you
just use consp is there no equivalent in clojure? there is (seq x)
but that barfs if it's given a symbol. The loss of the cleanness of
iterating/recursing until you hit nil is unfortunate (it made for some
elegant lisp code)

2. there are self calls and mutual calls throughout - which could
create a stack problem. advice for how to get around this?
specifically calls like the following seem hard to fix, or at least I
don't know the "duh" way to do it (I always just let my lisp compiler
do all the work for me)

;inside of occurs-check:
(or (occurs-check var (first x) bindings)
(occurs-check var (next x) bindings))

2'. I'm to understand that next is now preferred to rest? (which is an
unfortunate name at least for my lisp mind, since it seems like it
should be returning an element instead of rest - but I'll learn
eventually)

minor questions:
a. what's the preferred indenting style for cond?
b. is def the correct way to introduce constants and thread safe
places to bind over?
c. is that the correct / best / preferred way to set default values
for the optional parameter in unify?

EXAMPLES:
> (unify '(a b) '(a b))
{}
> (unify '?a '(b))
{?a (b)}
> (unify '(?a ?b) '((q) ?b))
{?a (q)}

>(unify '(?a ?b) '(b))
nil
> (unify '(?a ?b) '(?b))
nil

CODE:

(defn variable? [x]
"Is x a variable (a symbol beginning with `?')?"
(and (symbol? x)
(= (nth (name x) 0) \?)))

(def fail nil)
(def no-bindings {})
(def *occurs-check* true) ;"perform the occurs check?"

(defn extend-bindings [var val bindings]
(assoc bindings var val))

(defn lookup [var bindings]
(get bindings var))

(declare unify unify-variable occurs-check)

(defn unify
"See if x and y match with given bindings."
([x y] (unify x y no-bindings))
([x y bindings]
(cond
(= bindings fail) fail
(= x y) bindings
(variable? x) (unify-variable x y bindings)
(variable? y) (unify-variable y x bindings)
(and (seq? x) (not (empty? x)) (seq? y) (not (empty? y)))
;(and (seq x) (seq y)) ;this is consp in Norvig's code
(unify (next x) (next y)
(unify (first x) (first y) bindings))
true fail)))

(defn unify-variable [var x bindings]
"Unify var with x, using (and maybe extending) bindings."
(cond (contains? bindings var)
(unify (lookup var bindings) x bindings)
(and (variable? x) (contains? bindings x))
(unify var (lookup x bindings) bindings)
(and *occurs-check* (occurs-check var x bindings))
fail
true
(extend-bindings var x bindings)))

(defn occurs-check [var x bindings]
"Does var occur anywhere inside x?"
(cond (= var x) true
(and (variable? x) (contains? bindings x))
(occurs-check var (lookup x bindings) bindings)
(and (seq? x) (not (empty? x)))
;(seq x) ;consp would be nice here instead of the above
(or (occurs-check var (first x) bindings)
(occurs-check var (next x) bindings))
true false))

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Michał Marczyk

unread,

Apr 20, 2010, 9:43:43 PM4/20/10

to clo...@googlegroups.com

On 19 April 2010 20:34, Kevin Livingston

<kevinliving...@gmail.com> wrote:
> I ported the unifier posted by Norvig in Common Lisp to Clojure...

Cool! Before I comment on a few specific points you make / questions
you ask, here are the results of a few minutes I spent playing with /
rewriting your code:

http://gist.github.com/373303

> major questions
> 1. this is ugly: (and (seq? x) (not (empty? x))) in common lisp you
> just use consp is there no equivalent in clojure? there is (seq x)
> but that barfs if it's given a symbol. The loss of the cleanness of
> iterating/recursing until you hit nil is unfortunate (it made for some
> elegant lisp code)

I can't think of a direct equivalent now, but it's straightforward
enough to supply your own equivalent, like my compound? function
(basically #(and (seq? %) (seq %))). This won't work on arrays and
other things which seq can operate upon, but which respond with a
false to seq?, though. If that could be a problem, you can just use
seq and for non-seqables catch the exception and return false.

> 2. there are self calls and mutual calls throughout - which could
> create a stack problem. advice for how to get around this?
> specifically calls like the following seem hard to fix, or at least I
> don't know the "duh" way to do it (I always just let my lisp compiler
> do all the work for me)

You could perform a CPS transform by hand (which I haven't (yet) done
with the code in the Gist). For occurs-check this would mean having an
extra to-do argument (or maybe just [... & to-do]); whenever false
would normally be returned, you'd first check if to-do is empty,
returning false if so and doing (recur <args taken from (first to-do)>
(rest to-do)) otherwise. true should short-circuit and disregard
to-do, of course. Finally, whenever occurs-check would normally branch
on first / rest, it would recur with first *and* an extended to-do
including rest.

> 2'. I'm to understand that next is now preferred to rest? (which is an
> unfortunate name at least for my lisp mind, since it seems like it
> should be returning an element instead of rest - but I'll learn
> eventually)

Not at all. next is basically #(seq (rest %)); you might want to use
it when the extra bit of strictness is beneficial to your code and you
shouldn't use it when it isn't.

> minor questions:
> a. what's the preferred indenting style for cond?

Good question. I really prefer the fully parenthesised cond clauses
from Scheme / CL / elisp, in good part because of the nicer
indentation without particularly elaborate rules... :-(

> b. is def the correct way to introduce constants and thread safe
> places to bind over?
> c. is that the correct / best / preferred way to set default values
> for the optional parameter in unify?

If you expect to want / need to replace the default occasionally, then
sure, you could do it with a Var and (binding [...] ...). On the other
hand, unify already accepts a "bindings" parameter, so you could just
as well pass in an explicit starting value when you need to.

I think that your code would benefit from direct use of Clojure data
structure literals and such facilities as the IFn implementation
available on maps; see my gist for examples of what I have in mind.

Sincerely,
Michał

Kevin Livingston

unread,

Apr 21, 2010, 4:58:27 PM4/21/10

to Clojure

Thank you for taking the time to provide feedback.

> I can't think of a direct equivalent now, but it's straightforward
> enough to supply your own equivalent, like my compound? function
> (basically #(and (seq? %) (seq %))). This won't work on arrays and
> other things which seq can operate upon, but which respond with a
> false to seq?, though. If that could be a problem, you can just use
> seq and for non-seqables catch the exception and return false.

intentionally driving into an exception seems like a very bad idea.
but yes I can define my own equivalent of consp I was just hoping
there was something I was over looking.

> You could perform a CPS transform by hand (which I haven't (yet) done
> with the code in the Gist). For occurs-check this would mean having an
> extra to-do argument (or maybe just [... & to-do]); whenever false
> would normally be returned, you'd first check if to-do is empty,
> returning false if so and doing (recur <args taken from (first to-do)>
> (rest to-do)) otherwise. true should short-circuit and disregard
> to-do, of course. Finally, whenever occurs-check would normally branch
> on first / rest, it would recur with first *and* an extended to-do
> including rest.

That'll trade some heap for stack space, presumably worthwhile. I
haven't had to do much hand optimization of recursion so I'm a little
rusty (the Franz Allegro compiler was fairly amazing in that
regard)... I was hoping there was some even better trick (like the
trampoline) to save me even more. That's a good idea though, I'll
just to estimate how close I'm getting to blowing the stack, and the
overhead of tracking to-do.

> Not at all. next is basically #(seq (rest %)); you might want to use
> it when the extra bit of strictness is beneficial to your code and you
> shouldn't use it when it isn't.

oh, ok. I saw some posts on using the new super-lazy lists / code and
it implied that next was necessary over rest, but I didn't quite read
it in excruciating detail.

> Good question. I really prefer the fully parenthesised cond clauses
> from Scheme / CL / elisp, in good part because of the nicer
> indentation without particularly elaborate rules... :-(

that "extra" paren is there too because the body of the clause is an
implicit progn in common lisp.

> I think that your code would benefit from direct use of Clojure data
> structure literals and such facilities as the IFn implementation
> available on maps; see my gist for examples of what I have in mind.

on some issues I agree others I disagree, regarding your code:

regarding removing the helper functions, that's not necessarily ideal,
while bindings may always be a map putting in literals like {} or nil
instead of fail or no-bindings actually obfuscates the code, and makes
it harder to change later if what bindings is needs to fundamentally
change. similarly with truncating variable names, it detracts from
readability - although with one interesting caveat -- "var" is a
particularly unique as it is now a language keyword in clojure.

in general it's bad style to wrap a cond with another test if
trivially avoidable (at least this is coming from best practice common
lisp style) so for example, in your code line 24, there's no reason to
pull that out of the cond, especially since you are also using the
return value of the when, it's better to be explicit about the test
and what it's resulting return value is -- in this case that the
unification fails. the if-let's are a little more interesting as it
potentially saves one lookup, but it seems to make the code far more
"disjoint" or at least increase the number of paths.

the changing of a cond to a some with the identity function on line 48
seems like a very odd choice to me. while still correct code I see no
reason why this would be preferred to a cond.

thanks for the subst code.

I'm torn about the simplify-bindings -- I think it's a cool function
to have, because if what you are subbing into is large compared to
bindings, this is probably a win, but if you have a lot of bindings
and are only using some, you could spend a lot of time to pack them
down. it'd probably be good to have around if someone knew where they
were in the trade-off.

I left out Norvig's code, I have a need for binding/substituting with
multiple binding lists so I ended up with this (which requires
clojure.walk)

(defn recursive-replace
([expr l-bindings] (recursive-replace expr l-bindings #{expr}))
([expr l-bindings past-vals]
(let [applicable-bindings
(some (fn [b] (if (contains? b expr) b nil)) l-bindings)
bind-val (get applicable-bindings expr)]
(if (or (not applicable-bindings)
(contains? past-vals bind-val)) ;don't enter replace
loop
expr
(recur bind-val l-bindings (conj past-vals bind-val))))))

(defn subst-bindings-int [x l-bindings]
(prewalk (fn [expr] (recursive-replace expr l-bindings)) x))

(defn subst-bindings-list [x list-of-bindings]
(cond (some (partial = fail) list-of-bindings) fail
(every? (partial = no-bindings) list-of-bindings) x
:else (subst-bindings-int x list-of-bindings)))

(defn subst-bindings [x & bindings]
"Substitute the value of variables in bindings into x,
taking recursively bound variables into account."
(subst-bindings-list x bindings))

thanks again for the new ideas.

Kevin

Alex Osborne

unread,

Apr 21, 2010, 6:19:59 PM4/21/10

to clo...@googlegroups.com

Kevin Livingston <kevinliving...@gmail.com> writes:

>> Not at all. next is basically #(seq (rest %)); you might want to use
>> it when the extra bit of strictness is beneficial to your code and you
>> shouldn't use it when it isn't.
>
> oh, ok. I saw some posts on using the new super-lazy lists / code and
> it implied that next was necessary over rest, but I didn't quite read
> it in excruciating detail.

Likely what you were reading was referring to nil-punning. In very
early versions of Clojure, empty seqs were nil, so you could do:

(when some-seq
...)

Of course this has the problem that you can't have a completely lazy
seq, the first element is always evaluated (to see whether the seq is
empty or not in order to return nil). They were made fully lazy by
introducing empty seqs, thus you now need to explicitly check with one
is empty use 'seq. So next is a shorthand for doing this on rest and
replacing instances of rest with next in old code that relied on
nil-punning was a simple way to upgrade.

Michał Marczyk

unread,

Apr 21, 2010, 11:19:05 PM4/21/10

to clo...@googlegroups.com

On 21 April 2010 22:58, Kevin Livingston

<kevinliving...@gmail.com> wrote:
> Thank you for taking the time to provide feedback.

Thank you for the insightful response!

I've posted the current state of my code -- which uses clojure.zip and
clojure.walk, among other things -- here:

http://gist.github.com/374764

> intentionally driving into an exception seems like a very bad idea.
> but yes I can define my own equivalent of consp I was just hoping
> there was something I was over looking.

Right. I've aliased clojure.contrib.core/seqable? to compound? for
now... which I hadn't known about earlier. I guess this was bound to
be a solved problem. Agreed on the use of exceptions.

> That'll trade some heap for stack space, presumably worthwhile. I
> haven't had to do much hand optimization of recursion so I'm a little
> rusty (the Franz Allegro compiler was fairly amazing in that
> regard)... I was hoping there was some even better trick (like the
> trampoline) to save me even more.

I did write a version of occurs-check -- which I seem to have renamed
to occurs? at some point -- which uses the CPS approach (it's included
in the Gist, commented out); then I realised that its a natural use
case for clojure.zip (the same basic idea, but with a different
packaging). See if you like the current version.

> that "extra" paren is there too because the body of the clause is an
> implicit progn in common lisp.

And an implicit begin in Scheme, I know... Plus the Scheme version
handles unary clauses and a special kind of ternary clauses (with '=>
in the middle) in special ways, which is useful at times. However, the
reason I personally prefer to have those extra parens in cond has much
more to do with the shape of the code than those other things. :-)

> on some issues I agree others I disagree, regarding your code:

Thanks again for your remarks. A few points that came to my mind:

> regarding removing the helper functions, that's not necessarily ideal,
> while bindings may always be a map putting in literals like {} or nil
> instead of fail or no-bindings actually obfuscates the code, and makes
> it harder to change later if what bindings is needs to fundamentally
> change.

I'm in two minds w.r.t. the second point above. On the one hand, I'm
loathe to pass on using such facilities as the IFn implementation on
maps, sets and vectors in an attempt to make easier some fundamental
change which might possibly happen in the future (and who knows what
else it might involve besides what I might think of now). On the
other, I can already see a different data structure being useful for
representing bindings (more on that further down), so I'm also
disinclined to hand-wave your argument away.

However, I do believe that said different data structure could be made
to act like a map fairly transparently (hopefully through the protocol
facility; or, if some functionality's missing as of yet, reify), so
I'd still prefer to use the map-specific niceties and refactor to
remove them only when the need actual comes (and it becomes clear what
sort of changes are really needed).

> in general it's bad style to wrap a cond with another test if
> trivially avoidable (at least this is coming from best practice common
> lisp style) so for example, in your code line 24, there's no reason to
> pull that out of the cond, especially since you are also using the
> return value of the when, it's better to be explicit about the test
> and what it's resulting return value is -- in this case that the
> unification fails.

Well, I was trying to treat failure specially, as it were, but I take
your point. Changed now.

> the if-let's are a little more interesting as it
> potentially saves one lookup, but it seems to make the code far more
> "disjoint" or at least increase the number of paths.

I've included my cond-let macro in the Gist now (and refactored
unify-variable to take advantage of it); do you think it's acceptable?

> the changing of a cond to a some with the identity function on line 48
> seems like a very odd choice to me. while still correct code I see no
> reason why this would be preferred to a cond.

Ouch, I don't know what I was thinking. Changed now.

> I'm torn about the simplify-bindings -- I think it's a cool function
> to have, because if what you are subbing into is large compared to
> bindings, this is probably a win, but if you have a lot of bindings
> and are only using some, you could spend a lot of time to pack them
> down. it'd probably be good to have around if someone knew where they
> were in the trade-off.

Perhaps using some kind of local memoization would be a good
compromise? There was a cool thread on memoization here a short while
ago; see Meikel Brandmeyer's summary here:

http://kotka.de/blog/2010/03/memoize_done_right.html

I took a shot at that with subst-memo. Additionally, this function
leaves unbound variables intact, while the other versions use nil
instead. There are two other (simpler) versions of subst included in
the current gist, of which one is equivalent to my initial version,
while the other one resolves variables when the need arises (and
doesn't memoize). I've switched to prewalk for all three versions,
like you did.

As an alternative to a fancy substitution scheme, I was thinking of a
different data structure for bindings, as mentioned above... Something
like a reversible map. With reify, it should be able to make such a
beast behave as a regular IPersistentMap, but also support
lookup-of-key-by-val through a protocol, say, and use that inside
assoc to perform resolution at binding time. Not sure if that would
really be beneficial, and even if it would, I guess that it might not
be worth the trouble when memoization should handle things nicely
enough. The idea did make me less certain that all one could ever hope
to use as a bindings structure is a hash map, though. ;-)

All the best,
Michał

Reply all

Reply to author

Forward