As I thought I understood it, Lisp union implements set union. So, I thought that
(union '(a a) '(a b))
would return
(a b)
And, in fact, it does. This led me to think that
(union '(a a) '(b))
would also return
(A B)
However, it returns:
(A A B)
Why does the fact that there is one less 'a in the second argument in the second call mean that there is one more 'a in the result? This is definitely not set-theoretic union. Why is the behavior of Lisp union defined this way?
Here is the sequence of calls again:
[1]> (union '(a a) '(a b)) (A B) [2]> (union '(a a) '(b)) (A A B)
> As I thought I understood it, Lisp union implements set union. So, I > thought that
> (union '(a a) '(a b))
> would return
> (a b)
> And, in fact, it does. This led me to think that
> (union '(a a) '(b))
> would also return
> (A B)
> However, it returns:
> (A A B)
> Why does the fact that there is one less 'a in the second argument in > the second call mean that there is one more 'a in the result? This is > definitely not set-theoretic union. Why is the behavior of Lisp union > defined this way?
> Here is the sequence of calls again:
> [1]> (union '(a a) '(a b)) > (A B) > [2]> (union '(a a) '(b)) > (A A B)
It turns out that if there are duplicate entries in either of the two lists, the behavior is undefined as to whether there should be duplicate entries in the union:
If there is a duplication between list-1 and list-2, only one of the duplicate instances will be in the result. If either list-1 or list-2 has duplicate entries within it, the redundant entries might or might not appear in the result. [my emphasis]
The fact that they leave the behavior of union in this case undefined in the standard is surprising to me. The fact that SBCL and GNU CLISP implement it the way they do is just bizarre to me.
Paul Reiners <paul.rein...@gmail.com> writes: > Why does the fact that there is one less 'a in the second argument in > the second call mean that there is one more 'a in the result? This is > definitely not set-theoretic union. Why is the behavior of Lisp union > defined this way?
because UNION expects sets as input, and '(A A) is not a set.
> Here is the sequence of calls again:
> [1]> (union '(a a) '(a b)) > (A B)
here, the daemons that flew out of your nose were benevolent.
Michael Livshin <use...@cmm.kakpryg.net> writes: > Paul Reiners <paul.rein...@gmail.com> writes:
>> Why does the fact that there is one less 'a in the second argument in >> the second call mean that there is one more 'a in the result? This is >> definitely not set-theoretic union. Why is the behavior of Lisp union >> defined this way?
> because UNION expects sets as input, and '(A A) is not a set.
Indeed. '(A A) reads a form. It evaluates to something that prints as: (A A) and that is a list. It just happens that CL uses lists to _represent_ sets.
The two lists (A A B) and (A B) are obviously different. But (A B) and (B A) are also different lists.
However, these three lists (A A B), (A B) and (B A) are all representants of the _same_ set: {A, B}.
Note that in mathematics, {a,a,b}={a,b}={b,a}, since we also write _representations_ of mathematical sets.
On Mar 26, 9:45 am, p...@informatimago.com (Pascal J. Bourguignon) wrote:
> Paul Reiners <paul.rein...@gmail.com> writes: > (defun set-equalp (x y) (and (subsetp x y) (subsetp y x))) > (set-equalp '(a a b) '(a b)) --> true
> > This is > > definitely not set-theoretic union.
> Not at all. This is definitely set-theoretic. You have to distinguish > the sets from their representations.
Yes, you're right. If I'm willing to accept '(a a) as representing the set '(a) when I use it as an input to union, I should be just as willing to accept '(a a b) as representing the set '(a b) when I receive it back from union.
Paul Reiners <paul.rein...@gmail.com> writes: > As I thought I understood it, Lisp union implements set union. So, I > thought that
> (union '(a a) '(a b)) > (union '(a a) '(b))
These are illegal inputs to UNION. Union is allowed to assume that the inputs to it are proper sets, so you are not supposed to pass '(a a) as an argument. Internally, the algorithm can do whatever it wants to process the union based on the assumption that it has gotten sets. It does not need to check for duplicates in its input.
One could imagine a simple algorithm that tries to be a little bit clever by starting with the larger set and then only testing the smaller set for membership. Perhaps that is what your implementation does.
-- Thomas A. Russ, USC/Information Sciences Institute
>> As I thought I understood it, Lisp union implements set union. So, I >> thought that
>> (union '(a a) '(a b)) >> (union '(a a) '(b))
> These are illegal inputs to UNION. Union is allowed to assume that the > inputs to it are proper sets, so you are not supposed to pass '(a a) as > an argument.
Absolutely not.
CLHS is extremely clear about that. Either input list can contain duplicates, and CLHS explicitely specifies that in this case the result may contain duplicates or not.
> Internally, the algorithm can do whatever it wants to > process the union based on the assumption that it has gotten sets.
There is no specific assumption. CL:UNION is merely specified to return:
"a list that contains every element that occurs in either list-1 or list-2."
> It does not need to check for duplicates in its input.
Indeed, as specified, it doesn't need to.
> One could imagine a simple algorithm that tries to be a little bit > clever by starting with the larger set and then only testing the smaller > set for membership. Perhaps that is what your implementation does.
If you want to do set operations in Lisp -- and particularly if you want to do them efficiently on large sets -- permit me to recommend my FSet library:
It uses binary trees to do union and similar operations in linear time. It also permits structures like sets of sets, maps from sets to something else, etc. to be manipulated very elegantly.
Scott Burson wrote: > If you want to do set operations in Lisp -- and particularly if you > want to do them efficiently on large sets -- permit me to recommend my > FSet library:
> It uses binary trees to do union and similar operations in linear > time. It also permits structures like sets of sets, maps from sets to > something else, etc. to be manipulated very elegantly.
haha, sounds hot and I actually have an application for that if I decide to tear back into one of those ITA puzzles in anger.
> From: Paul Reiners <paul.rein...@gmail.com> > As I thought I understood it, Lisp union implements set union.
Only if that function is given two sets to begin with. If you give it something that's not a set (for either parameter), then what it's supposed to do is undefined. IMO if safety and debug setting is maximum and speed and space setting is zero, it *should* carefully check for Garbage In (invalid parameter) before doing what it's supposed to do, and if you gave it something that wasn't a set then it *should* signal an error and refuse to even try to return a value. If it doesn't signal an error in this case, you could complain to the company that wrote the Lisp system you're using, or you could write a wrapper function that **does** do the Garbage In check-and-signal-error. (defun paranoid-union (a b) (unless (is-set a) (error "First parameter not a set:~%~S" a)) (unless (is-set b) (error "Second parameter not a set:~%~S" a)) (union a b)) (defun is-set (s) (not (has-duplicate-elements s))) (defun has-duplicate-elements (s) ;order length^2 time (loop for (elem . tail) on s do (if (member elem tail) (return t)))) (defun has-duplicate-elements (s) ;much faster for large sets (let ((ht (make-hash-table))) ;Defaults EQL (loop for elem in s do (if (gethash elem ht) (return t) (setf (gethash elem ht) t))))) ;Exercise for reader to include keyword parameters too.
> So, I thought that > (union '(a a) '(a b)) > would ...
STOP ALREADY!!
Don't just read the name of a function and guess what it does. Instead, read the documentation for it and try to understand what it says. <http://www.lispworks.com/documentation/HyperSpec/Body/f_unionc.htm> union list-1 list-2 &key key test test-not => result-list ... If either list-1 or list-2 has duplicate entries within it, the redundant entries might or might not appear in the result. ...
(describe 'union) in CMUCL doesn't give that warning. The HyperSpec is better for new functions you haven't used before.
[1]> (union '(a a) '(a b)) (A B) [2]> (union '(a a) '(b)) (A A B) Remember the words of the HyperSpec: "the redundant entries might or might not appear in the result" You are observing exactly that behaviour!! What more could you ask for? (Well IMO it *could* have been specified as being allowed to throw an exception, signal an error, if debug/safety is high and speed/storage is low, but the wind didn't blow that way.)
Question to Kent: If anyone ever finds a mistake in the CLHS, would you edit the Web page to correct the mistake?
Followup question to Kent: Would you, like Knuth, pay $10 or $25 to anyone finding a mistake?
Not really relevant to the original poster's question, but am I the only one who ever thought that CL's implementation of set operations on top of the list data type is sad?
In my view, this is a symptom of a larger hole in the standard: there is no sorted collection datatype like a heap. If there was, then the set operations would naturally be defined in terms of it, rather than list[1].
This omission is #1 on my list of things missing from CL.
On Mar 29, 7:58 pm, Russell McManus <russell_mcma...@yahoo.com> wrote:
> Not really relevant to the original poster's question, but am I the > only one who ever thought that CL's implementation of set operations > on top of the list data type is sad?
> In my view, this is a symptom of a larger hole in the standard: there > is no sorted collection datatype like a heap. If there was, then the > set operations would naturally be defined in terms of it, rather than > list[1].
> This omission is #1 on my list of things missing from CL.
> -russ
> [1] Or perhaps in addition to the list datatype.
I have endeavored to rectify this omission. Please have a look: