Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Py2.3: Feedback on Sets (fwd)

0 views
Skip to first unread message

David Mertz

unread,
Aug 16, 2003, 12:26:22 PM8/16/03
to
> * Is there a compelling need for additional set methods like
> Set.powerset() and Set.isdisjoint(s) or are the current
> offerings sufficient?

I confess that I have not used sets for anything beyond testing. I love
the concept, but I just haven't had the need yet (especially in
something where I want to require 2.3).

The mention of Set.powerset() above is quite interesting to me. It
feels both exciting and dangerous :-).

As we all know, the size of the powerset of S, for len(S)==N, is 2**N.
Seems like it would be really easy to run into some long runtimes and
memory usage. Then again, power set really is a fundamental operation
on sets. And making users rewrite it each time is error prone.

So I think I would advocate it, but with a fairly harsh warning in the
documentation about complexity issues.

Yours, David...

--
mertz@ | The specter of free information is haunting the `Net! All the
gnosis | powers of IP- and crypto-tyranny have entered into an unholy
.cx | alliance...ideas have nothing to lose but their chains. Unite
| against "intellectual property" and anti-privacy regimes!
-------------------------------------------------------------------------
X-Shameless-Plug: Buy Text Processing in Python: http://tinyurl.com/jskh


Aahz

unread,
Aug 16, 2003, 1:44:06 PM8/16/03
to
In article <mailman.1061051286...@python.org>,

David Mertz <me...@gnosis.cx> wrote:
>
>The mention of Set.powerset() above is quite interesting to me. It
>feels both exciting and dangerous :-).
>
>As we all know, the size of the powerset of S, for len(S)==N, is 2**N.
>Seems like it would be really easy to run into some long runtimes and
>memory usage. Then again, power set really is a fundamental operation
>on sets. And making users rewrite it each time is error prone.

Maybe make it an iterator? <0.3 wink>
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

This is Python. We don't care much about theory, except where it intersects
with useful practice. --Aahz

Gregor Lingl

unread,
Aug 16, 2003, 2:58:59 PM8/16/03
to
David Mertz schrieb:

>>* Is there a compelling need for additional set methods like
>> Set.powerset() and Set.isdisjoint(s) or are the current
>> offerings sufficient?
>
>
> I confess that I have not used sets for anything beyond testing. I love
> the concept, but I just haven't had the need yet (especially in
> something where I want to require 2.3).
>
> The mention of Set.powerset() above is quite interesting to me. It
> feels both exciting and dangerous :-).

I think it would be more general and versatile to provide a method for
the Cantorproduct of two sets, set1.cantorproduct(set2), which could be
used to define Set.powerset.
Additionally implementing this as __mul__ and __pow__ methods, so you
could write set1*set2 and set**n would be funny.

Regards, Gregor

Paul Miller

unread,
Aug 16, 2003, 5:06:16 PM8/16/03
to
me...@gnosis.cx (David Mertz) wrote in message news:<mailman.1061051286...@python.org>...

> > * Is there a compelling need for additional set methods like
> > Set.powerset() and Set.isdisjoint(s) or are the current
> > offerings sufficient?
>

> As we all know, the size of the powerset of S, for len(S)==N, is 2**N.


> Seems like it would be really easy to run into some long runtimes and
> memory usage. Then again, power set really is a fundamental operation
> on sets. And making users rewrite it each time is error prone.

How is it error prone? if P is the power set operation, and S and X
are sets, then X in P(S) is equivalent to X is a subset of S. I don't
think that's all that error prone. The times when you will want to
iterate over the *entire* power set of a set are probably very few,
simply because of complexity issues. I can't see iterating over the
whole P(S) where S contains more than about 100 elements being time
feasible.

Raymond Hettinger

unread,
Aug 16, 2003, 5:50:18 PM8/16/03
to
[Raymond]

> > * Is there a compelling need for additional set methods like
> > Set.powerset() and Set.isdisjoint(s) or are the current
> > offerings sufficient?

[David Mertz]


> I confess that I have not used sets for anything beyond testing. I love
> the concept, but I just haven't had the need yet (especially in
> something where I want to require 2.3).
>
> The mention of Set.powerset() above is quite interesting to me. It
> feels both exciting and dangerous :-).


There would need to be compelling use cases before it would be
added to the API.

How about just adding a cut-and-paste recipe to the docs:

from sets import Set
def powerset(iterable):
data = list(iterable)
for i in xrange(2**len(data)):
yield Set([e for j,e in enumerate(data) if i&(1<<j)])

>>> list(powerset('abc'))
[Set([]), Set(['a']), Set(['b']), Set(['a', 'b']), Set(['c']), Set(['a', 'c']),
Set(['c', 'b']), Set(['a', 'c', 'b'])]


Raymond Hettinger


David Mertz

unread,
Aug 17, 2003, 2:55:29 AM8/17/03
to
|me...@gnosis.cx (David Mertz) wrote

|> Then again, power set really is a fundamental operation
|> on sets. And making users rewrite it each time is error prone.

pwmi...@adelphia.net (Paul Miller) wrote previously:


|How is it error prone? if P is the power set operation, and S and X
|are sets, then X in P(S) is equivalent to X is a subset of S. I don't
|think that's all that error prone.

This looks more like a definition of a method Set.ispowerset(), not one
of Set.powerset(). The actual generation of P(S) is the part I think
programmers could do wrong. I'm not saying that they might not usually
do it right... but mistakes happen.

Raymond Hettinger also wrote:
|from sets import Set
|def powerset(iterable):
| data = list(iterable)
| for i in xrange(2**len(data)):
| yield Set([e for j,e in enumerate(data) if i&(1<<j)])

Hmmm... I should have checked the docs first. A sample implementation
obviously helps. That said, this isn't REALLY an implementation of
powerset. It returns an iterator, not a set. Now, sure... an iterator
is a BETTER thing to get, for lots of reasons. But I'm not sure it
lives up to the function name.

Yours, David...

--
Keeping medicines from the bloodstreams of the sick; food from the bellies
of the hungry; books from the hands of the uneducated; technology from the
underdeveloped; and putting advocates of freedom in prisons. Intellectual
property is to the 21st century what the slave trade was to the 16th.

Alex Martelli

unread,
Aug 17, 2003, 8:55:06 AM8/17/03
to
David Mertz wrote:
...

> Raymond Hettinger also wrote:
> |from sets import Set
> |def powerset(iterable):
> | data = list(iterable)
> | for i in xrange(2**len(data)):
> | yield Set([e for j,e in enumerate(data) if i&(1<<j)])
>
> Hmmm... I should have checked the docs first. A sample implementation
> obviously helps. That said, this isn't REALLY an implementation of
> powerset. It returns an iterator, not a set. Now, sure... an iterator
> is a BETTER thing to get, for lots of reasons. But I'm not sure it
> lives up to the function name.

Surely writing

def therealpowerset(iterable):
return Set(powerset(iterable))

(or just inlining the Set call) isn't beyond the abilities of most
prospective users. Just like one calls list(x) on any iterable x
if one needs specifically a list, so does one call Set(x) if one
needs specifically a set. Sure, the name is debatable, and maybe
'subsetsof' would be neater. But I think this is quibbling.

IMHO, one 'real' issue with this function is that it behaves strangely
(to me) when iterable has duplicated elements -- namely, the
resulting iterator also has duplications. Changing the single
statement that is currently:
data = list(iterable)
into
data = Set(iterable)
would make duplications in 'iterable' irrelevant instead.

Alex

Raymond Hettinger

unread,
Aug 17, 2003, 12:28:38 PM8/17/03
to

"Alex Martelli" <ale...@yahoo.com> wrote in message
news:bhntv...@enews1.newsguy.com...

Good call.
Do you guys want the revised version added to the docs
or perhaps saved as an ASPN recipe?

>>> from sets import Set
>>> def powersetiterator(iterable):
... data = Set(iterable)
... for i in xrange(2**len(data)):
... yield Set([e for j,e in enumerate(data) if i&(1<<j)])

>>> Set(powersetiterator('abc'))
Set([ImmutableSet([]), ImmutableSet(['a']), ImmutableSet(['c']),
ImmutableSet(['b']), ImmutableSet(['c', 'b']), ImmutableSet(['a', 'c']),
ImmutableSet(['a', 'b']), ImmutableSet(['a', 'c', 'b'])])


Raymond Hettinger


David Mertz

unread,
Aug 17, 2003, 3:48:33 PM8/17/03
to
|> Raymond Hettinger:
|> |def powerset(iterable): ...

|David Mertz wrote:
|> Hmmm... I should have checked the docs first. A sample implementation
|> obviously helps. That said, this isn't REALLY an implementation of
|> powerset.

Alex Martelli <ale...@yahoo.com> wrote previously:
|def therealpowerset(iterable):
| return Set(powerset(iterable))

Sure. But naming the first thing 'subsetsof()' (or 'ipowerset()' to
match some of the new itertools functions) and the second just
'powerset()' seems better to me. Or maybe the second should be named:
powerset_but_use_at_your_own_risk().

However, my point above is that fewer users will lookup a sample
implementation than would use a module function. And somewhere in that
difference, some of them will do something wrong.

|IMHO, one 'real' issue with this function is that it behaves strangely
|(to me) when iterable has duplicated elements

| data = list(iterable) --> data = Set(iterable)

Yeah :-). Which points out that I should have actually READ Raymond's
function, rather than just cut-and-paste it. This seems to point out
that even the Python developers are able to make mistakes in
implementing powerset(). Which again suggests a module function
(implemented correctly) is worth having.

Yours, David...

--
_/_/_/ THIS MESSAGE WAS BROUGHT TO YOU BY: Postmodern Enterprises _/_/_/
_/_/ ~~~~~~~~~~~~~~~~~~~~[me...@gnosis.cx]~~~~~~~~~~~~~~~~~~~~~ _/_/
_/_/ The opinions expressed here must be those of my employer... _/_/
_/_/_/_/_/_/_/_/_/_/ Surely you don't think that *I* believe them! _/_/


Raymond Hettinger

unread,
Aug 17, 2003, 6:39:22 PM8/17/03
to
[Alex Martelli]

> |IMHO, one 'real' issue with this function is that it behaves strangely
> |(to me) when iterable has duplicated elements
> | data = list(iterable) --> data = Set(iterable)

[David Mertz]


> Yeah :-). Which points out that I should have actually READ Raymond's
> function, rather than just cut-and-paste it. This seems to point out
> that even the Python developers are able to make mistakes in
> implementing powerset(). Which again suggests a module function
> (implemented correctly) is worth having.

It wasn't a mistake.

I do prefer Alex's improvement because it has a weaker precondition,
but the basic code and generation method was dead-on.

The real issue with adding a module function or Set method for
powerset is the lack of compelling use cases; without those, it is
simply a cute example for the docs or an ASPN recipe.

Also, if Tim were to chime-in, I think he would abstract the
problem and say that the algorithm falls in the domain of
combinatorics (for which he has written a module) and that
powersets are just a specific case of transforming a collection
of items into a collection of all possible sub-collections.


Raymond Hettinger


Tim Peters

unread,
Aug 18, 2003, 12:10:08 AM8/18/03
to
[Raymond Hettinger]
> ...

> The real issue with adding a module function or Set method for
> powerset is the lack of compelling use cases; without those, it is
> simply a cute example for the docs or an ASPN recipe.

I agree.

> Also, if Tim were to chime-in, I think he would abstract the
> problem and say that the algorithm falls in the domain of
> combinatorics (for which he has written a module) and that
> powersets are just a specific case of transforming a collection
> of items into a collection of all possible sub-collections.

Chiming.

It's more that the use cases for generating powersets demand several ways of
generating them:

+ Lexicographic (if the set elements are totally ordered).

+ Arbitrary Gray code (minimal change from one subset to the next).

+ App-specific Gray code (e.g., from one subset to the next,
add or remove that specific element that makes the largest-- or
smallest --change in the value of a function of the subset).

+ Any order at all (for apps that can't exploit a stronger
guarantee, and want peak generation speed).

one-size-doesn't-fit-anybody-ly y'rs - tim


David Mertz

unread,
Aug 18, 2003, 12:09:04 AM8/18/03
to
|[David Mertz]
|> Yeah :-). Which points out that I should have actually READ Raymond's
|> function, rather than just cut-and-paste it. This seems to point out
|> that even the Python developers are able to make mistakes in
|> implementing powerset().

|It wasn't a mistake.


|I do prefer Alex's improvement because it has a weaker precondition,
|but the basic code and generation method was dead-on.

Well... I'm not sure about dead-on. For example:

% cat time_powerset.py
from sets import Set
from time import clock

def rh_powerset(iterable):


data = list(iterable)
for i in xrange(2**len(data)):
yield Set([e for j,e in enumerate(data) if i&(1<<j)])

def am_powerset(iterable):
data = Set(iterable)


for i in xrange(2**len(data)):
yield Set([e for j,e in enumerate(data) if i&(1<<j)])

start = clock()
print Set(rh_powerset('aaaaaaaaaaaaaaaaaaa'))
print 'rh_powerset(): %.4f seconds' % (clock()-start)

start = clock()
print Set(am_powerset('aaaaaaaaaaaaaaaaaaa'))
print 'rh_powerset(): %.4f seconds' % (clock()-start)

What do you reckon gets output? How about:

% python time_powerset.py
Set([ImmutableSet([]), ImmutableSet(['a'])])
rh_powerset(): 74.9403 seconds
Set([ImmutableSet([]), ImmutableSet(['a'])])
rh_powerset(): 0.0000 seconds

So Raymond's version is AT LEAST 750,000 times slower for this
particular iterable than Alex' :-). (Alex' is just too quick to time on
my machine, I get zero when I extend the decimals in the time display).
Actually, the ratios quickly get even worse if I add a couple more
"a"s... until Raymond's raises an exception (and Alex' keeps producing
the right answer too quickly to measure.

I know Python isn't primarily focused on speed. But somewhere around
the point where one function is a million times slower than an
algorithmically equivalent one, I start to think the latter is a
smidgeon broken.

--
mertz@ _/_/_/_/_/_/_/ THIS MESSAGE WAS BROUGHT TO YOU BY:_/_/_/_/ v i
gnosis _/_/ Postmodern Enterprises _/_/ s r
.cx _/_/ MAKERS OF CHAOS.... _/_/ i u
_/_/_/_/_/ LOOK FOR IT IN A NEIGHBORHOOD NEAR YOU_/_/_/_/_/ g s


Alex Martelli

unread,
Aug 18, 2003, 11:43:23 AM8/18/03
to
Raymond Hettinger wrote:
...

>> resulting iterator also has duplications. Changing the single
>> statement that is currently:
>> data = list(iterable)
>> into
>> data = Set(iterable)
>> would make duplications in 'iterable' irrelevant instead.
>
> Good call.
> Do you guys want the revised version added to the docs
> or perhaps saved as an ASPN recipe?

IMHO, the revised version would make most sense as a simple
example in the docs -- I don't think it would be equally
accessible to users if it went on the ASPN Cookbook, and it's
short enough that it doesn't seem to give problems in the docs.


Alex

0 new messages