Google Группы больше не поддерживают новые публикации и подписки в сети Usenet. Опубликованный ранее контент останется доступен.

Q about Lisp union

5 просмотров
Перейти к первому непрочитанному сообщению

Paul Reiners

не прочитано,
26 мар. 2008 г., 09:25:1126.03.2008
– rei...@us.ibm.com
As I thought I understood it, Lisp union implements set union. So, I
thought that

(union '(a a) '(a b))

would return

(a b)

And, in fact, it does. This led me to think that

(union '(a a) '(b))

would also return

(A B)

However, it returns:

(A A B)

Why does the fact that there is one less 'a in the second argument in
the second call mean that there is one more 'a in the result? This is
definitely not set-theoretic union. Why is the behavior of Lisp union
defined this way?

Here is the sequence of calls again:

[1]> (union '(a a) '(a b))
(A B)
[2]> (union '(a a) '(b))
(A A B)

Paul Reiners

не прочитано,
26 мар. 2008 г., 10:14:5826.03.2008

It turns out that if there are duplicate entries in either of the two
lists, the behavior is undefined as to whether there should be
duplicate entries in the union:

If there is a duplication between list-1 and list-2, only one of the
duplicate instances will be in the result. If either list-1 or list-2
has duplicate entries within it, the redundant entries might or might
not appear in the result. [my emphasis]

--- from http://www.lisp.org/HyperSpec/Body/fun_unioncm_nunion.html

This is from the ANSI Common Lisp standard.

The fact that they leave the behavior of union in this case undefined
in the standard is surprising to me. The fact that SBCL and GNU CLISP
implement it the way they do is just bizarre to me.

Michael Livshin

не прочитано,
26 мар. 2008 г., 10:16:0426.03.2008
Paul Reiners <paul.r...@gmail.com> writes:

> Why does the fact that there is one less 'a in the second argument in
> the second call mean that there is one more 'a in the result? This is
> definitely not set-theoretic union. Why is the behavior of Lisp union
> defined this way?

because UNION expects sets as input, and '(A A) is not a set.

> Here is the sequence of calls again:
>
> [1]> (union '(a a) '(a b))
> (A B)

here, the daemons that flew out of your nose were benevolent.

> [2]> (union '(a a) '(b))
> (A A B)

... and then there were only nasty ones left.

cheers,
--m

Pascal J. Bourguignon

не прочитано,
26 мар. 2008 г., 10:45:5026.03.2008
Paul Reiners <paul.r...@gmail.com> writes:

> As I thought I understood it, Lisp union implements set union. So, I
> thought that
>
> (union '(a a) '(a b))
>
> would return
>
> (a b)
>
> And, in fact, it does. This led me to think that
>
> (union '(a a) '(b))
>
> would also return
>
> (A B)
>
> However, it returns:
>
> (A A B)
>
> Why does the fact that there is one less 'a in the second argument in
> the second call mean that there is one more 'a in the result?

Why do you care?


(defun set-equalp (x y) (and (subsetp x y) (subsetp y x)))
(set-equalp '(a a b) '(a b)) --> true


> This is
> definitely not set-theoretic union.

Not at all. This is definitely set-theoretic. You have to distinguish
the sets from their representations.


Note that you just CANNOT distinguish the two sets (a a b) and (a b).
Try it:

(defun try-to-distinguish (x y)
(dolist (element x)
(assert (member element y)))
(dolist (element y)
(assert (member element x)))
(values 'undistinguishable-sets x y))

(try-to-distinguish '(a a b) '(a b))
--> UNDISTINGUISHABLE-SETS ;
(A A B) ;
(A B)


And remember that length is defined on lists, not on sets!

(defun cardinal (set) (length (remove-duplicates set)))

(= (cardinal '(a a b)) (cardinal '(a b))) --> true

> Why is the behavior of Lisp union
> defined this way?

Because this is more efficient.

CL set-functions represent sets as _lists_ of (possibly duplicated)
elements.

If you want to normalize your representations of set, you can always
use REMOVE-DUPLICATES

(defun normalized-union (x y) (remove-duplicates (union x y)))


--
__Pascal Bourguignon__

Pascal J. Bourguignon

не прочитано,
26 мар. 2008 г., 11:03:1826.03.2008
Michael Livshin <use...@cmm.kakpryg.net> writes:

> Paul Reiners <paul.r...@gmail.com> writes:
>
>> Why does the fact that there is one less 'a in the second argument in
>> the second call mean that there is one more 'a in the result? This is
>> definitely not set-theoretic union. Why is the behavior of Lisp union
>> defined this way?
>
> because UNION expects sets as input, and '(A A) is not a set.

Indeed. '(A A) reads a form.
It evaluates to something that prints as: (A A) and that is a list.
It just happens that CL uses lists to _represent_ sets.

The two lists (A A B) and (A B) are obviously different.
But (A B) and (B A) are also different lists.

However, these three lists (A A B), (A B) and (B A) are all
representants of the _same_ set: {A, B}.

Note that in mathematics, {a,a,b}={a,b}={b,a}, since we also write
_representations_ of mathematical sets.

--
__Pascal Bourguignon__

Paul Reiners

не прочитано,
26 мар. 2008 г., 11:27:5926.03.2008
On Mar 26, 9:45 am, p...@informatimago.com (Pascal J. Bourguignon)
wrote:
> Paul Reiners <paul.rein...@gmail.com> writes:

> (defun set-equalp (x y) (and (subsetp x y) (subsetp y x)))
> (set-equalp '(a a b) '(a b)) --> true
>
> > This is
> > definitely not set-theoretic union.
>
> Not at all. This is definitely set-theoretic. You have to distinguish
> the sets from their representations.

Yes, you're right. If I'm willing to accept '(a a) as representing
the set '(a) when I use it as an input to union, I should be just as
willing to accept '(a a b) as representing the set '(a b) when I
receive it back from union.

John Thingstad

не прочитано,
26 мар. 2008 г., 12:22:2726.03.2008
På Wed, 26 Mar 2008 14:25:11 +0100, skrev Paul Reiners
<paul.r...@gmail.com>:

> As I thought I understood it, Lisp union implements set union. So, I

> thought that ...

The set isssue has been discussed by others.
I thought I'd mention remove-duplicates.

--------------
John Thingstad

Thomas A. Russ

не прочитано,
26 мар. 2008 г., 17:06:3526.03.2008
Paul Reiners <paul.r...@gmail.com> writes:

> As I thought I understood it, Lisp union implements set union. So, I
> thought that
>
> (union '(a a) '(a b))

> (union '(a a) '(b))

These are illegal inputs to UNION. Union is allowed to assume that the
inputs to it are proper sets, so you are not supposed to pass '(a a) as
an argument. Internally, the algorithm can do whatever it wants to
process the union based on the assumption that it has gotten sets. It
does not need to check for duplicates in its input.

One could imagine a simple algorithm that tries to be a little bit
clever by starting with the larger set and then only testing the smaller
set for membership. Perhaps that is what your implementation does.

--
Thomas A. Russ, USC/Information Sciences Institute

Pascal Bourguignon

не прочитано,
26 мар. 2008 г., 18:14:4026.03.2008

t...@sevak.isi.edu (Thomas A. Russ) writes:

> Paul Reiners <paul.r...@gmail.com> writes:
>
>> As I thought I understood it, Lisp union implements set union. So, I
>> thought that
>>
>> (union '(a a) '(a b))
>> (union '(a a) '(b))
>
> These are illegal inputs to UNION. Union is allowed to assume that the
> inputs to it are proper sets, so you are not supposed to pass '(a a) as
> an argument.

Absolutely not.

CLHS is extremely clear about that. Either input list can contain
duplicates, and CLHS explicitely specifies that in this case the
result may contain duplicates or not.

> Internally, the algorithm can do whatever it wants to
> process the union based on the assumption that it has gotten sets.

There is no specific assumption. CL:UNION is merely specified to return:

"a list that contains every element that occurs in either
list-1 or list-2."

> It does not need to check for duplicates in its input.

Indeed, as specified, it doesn't need to.

> One could imagine a simple algorithm that tries to be a little bit
> clever by starting with the larger set and then only testing the smaller
> set for membership. Perhaps that is what your implementation does.

--
__Pascal Bourguignon__ http://www.informatimago.com/
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCS d? s++:++ a+ C+++ UL++++ P--- L+++ E+++ W++ N+++ o-- K- w---
O- M++ V PS PE++ Y++ PGP t+ 5+ X++ R !tv b+++ DI++++ D++
G e+++ h+ r-- z?
------END GEEK CODE BLOCK------

Scott Burson

не прочитано,
26 мар. 2008 г., 18:35:0726.03.2008
If you want to do set operations in Lisp -- and particularly if you
want to do them efficiently on large sets -- permit me to recommend my
FSet library:

http://common-lisp.net/project/fset/

It uses binary trees to do union and similar operations in linear
time. It also permits structures like sets of sets, maps from sets to
something else, etc. to be manipulated very elegantly.

-- Scott

Ken Tilton

не прочитано,
26 мар. 2008 г., 18:54:5726.03.2008

haha, sounds hot and I actually have an application for that if I decide
to tear back into one of those ITA puzzles in anger.

now just keep pounding away at these yobbos. :)

c,k

--
http://smuglispweeny.blogspot.com/
http://www.theoryyalgebra.com/

"In the morning, hear the Way;
in the evening, die content!"
-- Confucius

Robert Maas, see http://tinyurl.com/uh3t

не прочитано,
27 мар. 2008 г., 07:23:1427.03.2008
> From: Paul Reiners <paul.rein...@gmail.com>

> As I thought I understood it, Lisp union implements set union.

Only if that function is given two sets to begin with.
If you give it something that's not a set (for either parameter),
then what it's supposed to do is undefined. IMO if safety and debug
setting is maximum and speed and space setting is zero, it *should*
carefully check for Garbage In (invalid parameter) before doing
what it's supposed to do, and if you gave it something that wasn't
a set then it *should* signal an error and refuse to even try to
return a value. If it doesn't signal an error in this case, you
could complain to the company that wrote the Lisp system you're
using, or you could write a wrapper function that **does** do the
Garbage In check-and-signal-error.
(defun paranoid-union (a b)
(unless (is-set a) (error "First parameter not a set:~%~S" a))
(unless (is-set b) (error "Second parameter not a set:~%~S" a))
(union a b))
(defun is-set (s)
(not (has-duplicate-elements s)))
(defun has-duplicate-elements (s) ;order length^2 time
(loop for (elem . tail) on s
do (if (member elem tail) (return t))))
(defun has-duplicate-elements (s) ;much faster for large sets
(let ((ht (make-hash-table))) ;Defaults EQL
(loop for elem in s
do (if (gethash elem ht) (return t) (setf (gethash elem ht) t)))))
;Exercise for reader to include keyword parameters too.

> So, I thought that
> (union '(a a) '(a b))

> would ...
STOP ALREADY!!

Don't just read the name of a function and guess what it does.
Instead, read the documentation for it and try to understand what it says.
<http://www.lispworks.com/documentation/HyperSpec/Body/f_unionc.htm>
union list-1 list-2 &key key test test-not => result-list
... If either list-1 or list-2 has duplicate entries within it,
the redundant entries might or might not appear in the result. ...

(describe 'union) in CMUCL doesn't give that warning. The HyperSpec
is better for new functions you haven't used before.

[1]> (union '(a a) '(a b))
(A B)
[2]> (union '(a a) '(b))
(A A B)

Remember the words of the HyperSpec:


"the redundant entries might or might not appear in the result"

You are observing exactly that behaviour!! What more could you ask for?
(Well IMO it *could* have been specified as being allowed to throw
an exception, signal an error, if debug/safety is high and
speed/storage is low, but the wind didn't blow that way.)

Question to Kent: If anyone ever finds a mistake in the CLHS, would
you edit the Web page to correct the mistake?

Followup question to Kent: Would you, like Knuth, pay $10 or $25 to
anyone finding a mistake?

Russell McManus

не прочитано,
29 мар. 2008 г., 22:58:2129.03.2008

Not really relevant to the original poster's question, but am I the
only one who ever thought that CL's implementation of set operations
on top of the list data type is sad?

In my view, this is a symptom of a larger hole in the standard: there
is no sorted collection datatype like a heap. If there was, then the
set operations would naturally be defined in terms of it, rather than
list[1].

This omission is #1 on my list of things missing from CL.

-russ


[1] Or perhaps in addition to the list datatype.

Scott Burson

не прочитано,
30 мар. 2008 г., 02:14:0630.03.2008

I have endeavored to rectify this omission. Please have a look:

http://common-lisp.net/project/fset/

-- Scott

0 новых сообщений