I confess that I have not used sets for anything beyond testing. I love
the concept, but I just haven't had the need yet (especially in
something where I want to require 2.3).
The mention of Set.powerset() above is quite interesting to me. It
feels both exciting and dangerous :-).
As we all know, the size of the powerset of S, for len(S)==N, is 2**N.
Seems like it would be really easy to run into some long runtimes and
memory usage. Then again, power set really is a fundamental operation
on sets. And making users rewrite it each time is error prone.
So I think I would advocate it, but with a fairly harsh warning in the
documentation about complexity issues.
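The 2**N figure is easy to confirm empirically. A quick sketch in modern Python (which postdates this thread; the builtin set type and itertools did not exist yet, and n_subsets is an illustrative name):

```python
from itertools import chain, combinations

def n_subsets(s):
    # Count subsets by explicit enumeration (illustrative only;
    # the cost of doing this is exactly the complexity concern above).
    items = list(s)
    return sum(1 for _ in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)))

# |P(S)| doubles with every element added:
for n in range(1, 6):
    assert n_subsets(range(n)) == 2 ** n
print("a mere 20 elements already give", 2 ** 20, "subsets")
```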
Yours, David...
--
mertz@ | The specter of free information is haunting the `Net! All the
gnosis | powers of IP- and crypto-tyranny have entered into an unholy
.cx | alliance...ideas have nothing to lose but their chains. Unite
| against "intellectual property" and anti-privacy regimes!
-------------------------------------------------------------------------
X-Shameless-Plug: Buy Text Processing in Python: http://tinyurl.com/jskh
Maybe make it an iterator? <0.3 wink>
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/
This is Python. We don't care much about theory, except where it intersects
with useful practice. --Aahz
>> * Is there a compelling need for additional set methods like
>> Set.powerset() and Set.isdisjoint(s) or are the current
>> offerings sufficient?
>
>
> I confess that I have not used sets for anything beyond testing. I love
> the concept, but I just haven't had the need yet (especially in
> something where I want to require 2.3).
>
> The mention of Set.powerset() above is quite interesting to me. It
> feels both exciting and dangerous :-).
I think it would be more general and versatile to provide a method for
the Cartesian product of two sets, set1.cartesianproduct(set2), which
could then be used to define Set.powerset.
Additionally, implementing this via __mul__ and __pow__ methods, so you
could write set1*set2 and set1**n, would be fun.
Regards, Gregor
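A sketch of the suggested operators in modern Python (MiniSet, its methods, and powerset_via_product are all illustrative names, not any real sets-module API). The connection to powersets: each tuple in the n-fold product of {0, 1} marks which elements are present in one subset.

```python
from itertools import product

class MiniSet(frozenset):
    """Illustrative only: a frozenset with * as the Cartesian
    product and ** as the n-fold product (not a real API)."""
    def __mul__(self, other):
        return MiniSet(product(self, other))
    def __pow__(self, n):
        return MiniSet(product(self, repeat=n))

def powerset_via_product(s):
    # Each mask in {0,1}**n selects one subset of the n items.
    items = list(s)
    for mask in product((0, 1), repeat=len(items)):
        yield frozenset(e for e, bit in zip(items, mask) if bit)

a, b = MiniSet('ab'), MiniSet('xy')
print(len(a * b))   # 4 ordered pairs
print(len(a ** 3))  # 2**3 == 8 ordered triples
print(len(set(powerset_via_product('abc'))))  # 2**3 == 8 subsets
```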
> As we all know, the size of the powerset of S, for len(S)==N, is 2**N.
> Seems like it would be really easy to run into some long runtimes and
> memory usage. Then again, power set really is a fundamental operation
> on sets. And making users rewrite it each time is error prone.
How is it error prone? If P is the power set operation, and S and X
are sets, then X in P(S) is equivalent to X being a subset of S. I
don't think that's all that error prone. The times when you will want
to iterate over the *entire* power set of a set are probably very few,
simply because of complexity issues. I can't see iterating over the
whole P(S), where S contains more than about 100 elements, being
feasible time-wise.
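The equivalence is easy to check in modern Python, and it also shows why membership-style questions never need the full powerset: a subset test walks the elements of X, while materializing P(S) costs 2**len(S). (bitmask_powerset below is an illustrative name, not a library function.)

```python
def bitmask_powerset(s):
    # Materialize P(s) via bitmasks; exponential cost is the point.
    items = list(s)
    return {frozenset(e for j, e in enumerate(items) if i >> j & 1)
            for i in range(2 ** len(items))}

S = frozenset('abcd')
X = frozenset('bd')
# Both tests agree, but the left one builds all 2**4 subsets first,
# while the right one just compares elements of X against S.
assert X in bitmask_powerset(S) and X <= S
Y = frozenset('bz')
assert Y not in bitmask_powerset(S) and not (Y <= S)
```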
[David Mertz]
> I confess that I have not used sets for anything beyond testing. I love
> the concept, but I just haven't had the need yet (especially in
> something where I want to require 2.3).
>
> The mention of Set.powerset() above is quite interesting to me. It
> feels both exciting and dangerous :-).
There would need to be compelling use cases before it would be
added to the API.
How about just adding a cut-and-paste recipe to the docs:
from sets import Set

def powerset(iterable):
    data = list(iterable)
    for i in xrange(2**len(data)):
        yield Set([e for j,e in enumerate(data) if i&(1<<j)])
>>> list(powerset('abc'))
[Set([]), Set(['a']), Set(['b']), Set(['a', 'b']), Set(['c']), Set(['a', 'c']),
Set(['c', 'b']), Set(['a', 'c', 'b'])]
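For readers on current Python: the long-gone sets module aside, essentially the same recipe survives today as the standard itertools idiom. A sketch in modern syntax (not the 2.3 code above):

```python
from itertools import chain, combinations

def powerset(iterable):
    # Yield subsets as tuples in increasing-size order:
    # (), ('a',), ('b',), ('c',), ('a','b'), ...
    data = list(iterable)
    return chain.from_iterable(
        combinations(data, r) for r in range(len(data) + 1))

print([set(t) for t in powerset('abc')])
```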
Raymond Hettinger
pwmi...@adelphia.net (Paul Miller) wrote previously:
|How is it error prone? if P is the power set operation, and S and X
|are sets, then X in P(S) is equivalent to X is a subset of S. I don't
|think that's all that error prone.
This looks more like a definition of a method Set.ispowerset(), not one
of Set.powerset(). The actual generation of P(S) is the part I think
programmers could do wrong. I'm not saying that they might not usually
do it right... but mistakes happen.
Raymond Hettinger also wrote:
|from sets import Set
|
|def powerset(iterable):
|    data = list(iterable)
|    for i in xrange(2**len(data)):
|        yield Set([e for j,e in enumerate(data) if i&(1<<j)])
Hmmm... I should have checked the docs first. A sample implementation
obviously helps. That said, this isn't REALLY an implementation of
powerset. It returns an iterator, not a set. Now, sure... an iterator
is a BETTER thing to get, for lots of reasons. But I'm not sure it
lives up to the function name.
Yours, David...
--
Keeping medicines from the bloodstreams of the sick; food from the bellies
of the hungry; books from the hands of the uneducated; technology from the
underdeveloped; and putting advocates of freedom in prisons. Intellectual
property is to the 21st century what the slave trade was to the 16th.
Surely writing
def therealpowerset(iterable):
    return Set(powerset(iterable))
(or just inlining the Set call) isn't beyond the abilities of most
prospective users. Just like one calls list(x) on any iterable x
if one needs specifically a list, so does one call Set(x) if one
needs specifically a set. Sure, the name is debatable, and maybe
'subsetsof' would be neater. But I think this is quibbling.
IMHO, one 'real' issue with this function is that it behaves strangely
(to me) when iterable has duplicated elements -- namely, the
resulting iterator also has duplications. Changing the single
statement that is currently:
data = list(iterable)
into
data = Set(iterable)
would make duplications in 'iterable' irrelevant instead.
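The duplication behavior is easy to demonstrate in modern Python (subsets is an illustrative stand-in for the recipe; list(...) vs. set(...) plays the role of list(iterable) vs. Set(iterable)):

```python
def subsets(data):
    # Bitmask enumeration: one subset yielded per index i.
    items = list(data)
    for i in range(2 ** len(items)):
        yield frozenset(e for j, e in enumerate(items) if i >> j & 1)

dup_kept = list(subsets('aab'))       # like data = list(iterable)
dup_gone = list(subsets(set('aab')))  # like data = Set(iterable)
print(len(dup_kept), len(set(dup_kept)))  # 8 yields, only 4 distinct subsets
print(len(dup_gone), len(set(dup_gone)))  # 4 yields, all distinct
```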
Alex
Good call.
Do you guys want the revised version added to the docs
or perhaps saved as an ASPN recipe?
>>> from sets import Set
>>> def powersetiterator(iterable):
...     data = Set(iterable)
...     for i in xrange(2**len(data)):
...         yield Set([e for j,e in enumerate(data) if i&(1<<j)])
>>> Set(powersetiterator('abc'))
Set([ImmutableSet([]), ImmutableSet(['a']), ImmutableSet(['c']),
ImmutableSet(['b']), ImmutableSet(['c', 'b']), ImmutableSet(['a', 'c']),
ImmutableSet(['a', 'b']), ImmutableSet(['a', 'c', 'b'])])
Raymond Hettinger
|David Mertz wrote:
|> Hmmm... I should have checked the docs first. A sample implementation
|> obviously helps. That said, this isn't REALLY an implementation of
|> powerset.
Alex Martelli <ale...@yahoo.com> wrote previously:
|def therealpowerset(iterable):
|    return Set(powerset(iterable))
Sure. But naming the first thing 'subsetsof()' (or 'ipowerset()' to
match some of the new itertools functions) and the second just
'powerset()' seems better to me. Or maybe the second should be named:
powerset_but_use_at_your_own_risk().
However, my point above is that fewer users will lookup a sample
implementation than would use a module function. And somewhere in that
difference, some of them will do something wrong.
|IMHO, one 'real' issue with this function is that it behaves strangely
|(to me) when iterable has duplicated elements
| data = list(iterable) --> data = Set(iterable)
Yeah :-). Which points out that I should have actually READ Raymond's
function, rather than just cut-and-pasting it. It also shows that even
the Python developers can make mistakes in implementing powerset(),
which again suggests a module function (implemented correctly) is
worth having.
Yours, David...
--
_/_/_/ THIS MESSAGE WAS BROUGHT TO YOU BY: Postmodern Enterprises _/_/_/
_/_/ ~~~~~~~~~~~~~~~~~~~~[me...@gnosis.cx]~~~~~~~~~~~~~~~~~~~~~ _/_/
_/_/ The opinions expressed here must be those of my employer... _/_/
_/_/_/_/_/_/_/_/_/_/ Surely you don't think that *I* believe them! _/_/
[David Mertz]
> Yeah :-). Which points out that I should have actually READ Raymond's
> function, rather than just cut-and-paste it. This seems to point out
> that even the Python developers are able to make mistakes in
> implementing powerset(). Which again suggests a module function
> (implemented correctly) is worth having.
It wasn't a mistake.
I do prefer Alex's improvement because it has a weaker precondition,
but the basic code and generation method was dead-on.
The real issue with adding a module function or Set method for
powerset is the lack of compelling use cases; without those, it is
simply a cute example for the docs or an ASPN recipe.
Also, if Tim were to chime-in, I think he would abstract the
problem and say that the algorithm falls in the domain of
combinatorics (for which he has written a module) and that
powersets are just a specific case of transforming a collection
of items into a collection of all possible sub-collections.
Raymond Hettinger
I agree.
> Also, if Tim were to chime-in, I think he would abstract the
> problem and say that the algorithm falls in the domain of
> combinatorics (for which he has written a module) and that
> powersets are just a specific case of transforming a collection
> of items into a collection of all possible sub-collections.
Chiming.
It's more that the use cases for generating powersets demand several ways of
generating them:
+ Lexicographic (if the set elements are totally ordered).
+ Arbitrary Gray code (minimal change from one subset to the next).
+ App-specific Gray code (e.g., from one subset to the next, add or
  remove the specific element that makes the largest -- or smallest --
  change in the value of a function of the subset).
+ Any order at all (for apps that can't exploit a stronger
guarantee, and want peak generation speed).
one-size-doesn't-fit-anybody-ly y'rs - tim
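The second bullet (an arbitrary Gray-code order) is short to sketch: using the standard reflected binary code i ^ (i >> 1), consecutive subsets differ by exactly one element. (gray_powerset is an illustrative name; modern Python.)

```python
def gray_powerset(items):
    # Yield all subsets so consecutive ones differ by one element.
    items = list(items)
    for i in range(2 ** len(items)):
        g = i ^ (i >> 1)  # reflected binary (Gray) code
        yield frozenset(e for j, e in enumerate(items) if g >> j & 1)

subs = list(gray_powerset('abc'))
assert len(set(subs)) == 8  # still the whole powerset
# every adjacent pair differs by exactly one add or remove:
assert all(len(a ^ b) == 1 for a, b in zip(subs, subs[1:]))
```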
|It wasn't a mistake.
|I do prefer Alex's improvement because it has a weaker precondition,
|but the basic code and generation method was dead-on.
Well... I'm not sure about dead-on. For example:
% cat time_powerset.py
from sets import Set
from time import clock

def rh_powerset(iterable):
    data = list(iterable)
    for i in xrange(2**len(data)):
        yield Set([e for j,e in enumerate(data) if i&(1<<j)])

def am_powerset(iterable):
    data = Set(iterable)
    for i in xrange(2**len(data)):
        yield Set([e for j,e in enumerate(data) if i&(1<<j)])

start = clock()
print Set(rh_powerset('aaaaaaaaaaaaaaaaaaa'))
print 'rh_powerset(): %.4f seconds' % (clock()-start)
start = clock()
print Set(am_powerset('aaaaaaaaaaaaaaaaaaa'))
print 'am_powerset(): %.4f seconds' % (clock()-start)
What do you reckon gets output? How about:
% python time_powerset.py
Set([ImmutableSet([]), ImmutableSet(['a'])])
rh_powerset(): 74.9403 seconds
Set([ImmutableSet([]), ImmutableSet(['a'])])
am_powerset(): 0.0000 seconds
So Raymond's version is AT LEAST 750,000 times slower for this
particular iterable than Alex's :-). (Alex's is just too quick to time
on my machine; I get zero even when I extend the decimals in the time
display.) Actually, the ratios quickly get even worse if I add a
couple more "a"s... until Raymond's raises an exception (and Alex's
keeps producing the right answer too quickly to measure).
I know Python isn't primarily focused on speed. But somewhere around
the point where one function is a million times slower than an
algorithmically equivalent one, I start to think the latter is a
smidgeon broken.
--
mertz@ _/_/_/_/_/_/_/ THIS MESSAGE WAS BROUGHT TO YOU BY:_/_/_/_/ v i
gnosis _/_/ Postmodern Enterprises _/_/ s r
.cx _/_/ MAKERS OF CHAOS.... _/_/ i u
_/_/_/_/_/ LOOK FOR IT IN A NEIGHBORHOOD NEAR YOU_/_/_/_/_/ g s
IMHO, the revised version would make most sense as a simple
example in the docs -- I don't think it would be equally
accessible to users if it went on the ASPN Cookbook, and it's
short enough that it doesn't seem to give problems in the docs.
Alex