Extremely weird itertools.permutations


Neil Girdhar

unread,
Oct 11, 2013, 2:38:33 PM10/11/13
to python...@googlegroups.com
"It is universally agreed that a list of n distinct symbols has n! permutations. However, when the symbols are not distinct, the most common convention, in mathematics and elsewhere, seems to be to count only distinct permutations." — http://stackoverflow.com/questions/6534430/why-does-pythons-itertools-permutations-contain-duplicates-when-the-original.


Should we consider fixing itertools.permutations to output only unique permutations (if possible, although I realize that would break code)? It is completely non-obvious to have permutations return duplicates. As a non-breaking compromise, what about adding a flag?

Best,
Neil

Neil Girdhar

unread,
Oct 11, 2013, 2:50:00 PM10/11/13
to python...@googlegroups.com
Note that if permutations is made to return only unique permutations, the behaviour of defining unique elements by index can be recovered using:

([it[index] for index in indexes] for indexes in itertools.permutations(range(len(it))))

Serhiy Storchaka

unread,
Oct 11, 2013, 3:29:35 PM10/11/13
to python...@python.org
11.10.13 21:38, Neil Girdhar wrote:

> Should we consider fixing itertools.permutations and to output only
> unique permutations (if possible, although I realize that would break
> code). It is completely non-obvious to have permutations returning
> duplicates. For a non-breaking compromise what about adding a flag?

I think this should be a separate function.



David Mertz

unread,
Oct 11, 2013, 4:02:11 PM10/11/13
to python-ideas
What would you like this hypothetical function to output here:

>>> from itertools import permutations
>>> from decimal import Decimal as D
>>> from fractions import Fraction as F
>>> items = (3, 3.0, D(3), F(3,1), "aa", "AA".lower(), "a"+"a")
>>> list(permutations(items))

It's neither QUITE equality nor identity you are looking for, I think, in nonredundant_permutation():

>>> "aa" == "AA".lower(), "aa" is "AA".lower()
(True, False)
>>> "aa" == "a"+"a", "aa" is "a"+"a"
(True, True)
>>> D(3) == 3.0, D(3) is 3.0
(True, False)

--
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

Neil Girdhar

unread,
Oct 11, 2013, 4:15:27 PM10/11/13
to David Mertz, python...@googlegroups.com
Can you explain what's wrong with equality?  Just like dicts, sets, and so many other Python internals, equality usually does the right thing, and where it doesn't you can override __eq__ to make it so.
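
For example (a minimal sketch; the Ball class is hypothetical), a set collapses elements exactly as __eq__ and __hash__ define them, so a class can decide for itself what "the same" means:

class Ball:
    # Hypothetical example: two balls are equal if they share a colour.
    def __init__(self, colour):
        self.colour = colour
    def __eq__(self, other):
        return isinstance(other, Ball) and self.colour == other.colour
    def __hash__(self):
        return hash(self.colour)

>>> len({Ball('red'), Ball('red'), Ball('black')})
2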

The reality is that the way permutations is implemented makes it hard to solve simple problems involving permutations.  There is no good reason for these simple problems to be hard, and so it should be fixed in the standard library.

Best,

Neil


On Fri, Oct 11, 2013 at 3:57 PM, David Mertz <me...@gnosis.cx> wrote:
What would you like this hypothetical function to output here:

>>> from itertools import permutations
>>> from decimal import Decimal as D
>>> from fractions import Fraction as F
>>> items = (3, 3.0, D(3), F(3,1), "aa", "AA".lower(), "a"+"a")
>>> list(permutations(items))

It's neither QUITE equality nor identity you are looking for, I think, in nonredundant_permutation():

>>> "aa" == "AA".lower(), "aa" is "AA".lower()
(True, False)
>>> "aa" == "a"+"a", "aa" is "a"+"a"
(True, True)
>>> D(3) == 3.0, D(3) is 3.0
(True, False)



Andrew Barnert

unread,
Oct 11, 2013, 4:19:22 PM10/11/13
to David Mertz, python-ideas
I think equality is perfectly reasonable here. The fact that {3.0, 3} only has one member seems like the obvious precedent to follow.

Sent from a random iPhone

Jonathan Brandvein

unread,
Oct 11, 2013, 5:19:38 PM10/11/13
to Andrew Barnert, python-ideas
I think it's fair to use {3.0, 3} as precedent. But note that transitivity is not required by the __eq__() method. In cases of intransitive equality (A == B == C but not A == C), I imagine the result should be ill-defined in the same way that sorting is when the key function is inconsistent.
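
For instance (a contrived sketch; Approx is made up purely for illustration), tolerance-based equality is intransitive:

class Approx:
    # Hypothetical: values compare equal when within 0.6 of each other.
    def __init__(self, x):
        self.x = x
    def __eq__(self, other):
        return abs(self.x - other.x) < 0.6

>>> a, b, c = Approx(0.0), Approx(0.5), Approx(1.0)
>>> a == b, b == c, a == c
(True, True, False)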

Jon

David Mertz

unread,
Oct 11, 2013, 4:27:34 PM10/11/13
to Andrew Barnert, python-ideas
Andrew & Neil (or whoever):

Is this *really* what you want:

>>> from itertools import permutations
>>> def nonredundant_permutations(seq):
...     return list(set(permutations(seq)))
...
>>> pprint(list(permutations([F(3,1), D(3.0), 3.0])))
[(Fraction(3, 1), Decimal('3'), 3.0),
 (Fraction(3, 1), 3.0, Decimal('3')),
 (Decimal('3'), Fraction(3, 1), 3.0),
 (Decimal('3'), 3.0, Fraction(3, 1)),
 (3.0, Fraction(3, 1), Decimal('3')),
 (3.0, Decimal('3'), Fraction(3, 1))]

>>> pprint(list(nonredundant_permutations([F(3,1), D(3.0), 3.0])))
[(Fraction(3, 1), Decimal('3'), 3.0)]

It seems odd to me to want that.  On the other hand, I provide a one-line implementation of the desired behavior if anyone wants it.  Moreover, I don't think the runtime behavior of my one-liner is particularly costly... maybe not the best possible, but the best big-O possible.

MRAB

unread,
Oct 11, 2013, 5:25:56 PM10/11/13
to python-ideas
On 11/10/2013 20:29, Serhiy Storchaka wrote:
> 11.10.13 21:38, Neil Girdhar wrote:
>> Should we consider fixing itertools.permutations and to output only
>> unique permutations (if possible, although I realize that would break
>> code). It is completely non-obvious to have permutations returning
>> duplicates. For a non-breaking compromise what about adding a flag?
>
> I think this should be separated function.
>
+1

Neil Girdhar

unread,
Oct 11, 2013, 5:35:41 PM10/11/13
to python...@googlegroups.com, Andrew Barnert, python-ideas
> Moreover, I don't think the runtime behavior of my one-liner is particularly costly…

It is *extremely* costly.  There can be n! permutations, so for even, say, 12 elements, you are looking at many gigabytes of memory needlessly used.  One big motivator for itertools is not to have to do this.  I'm curious how you would solve this problem efficiently in Python: https://www.kattis.com/problems/industrialspy  I did it by using a unique-ifying generator, but ideally this would not be necessary.  Ideally, Python would do exactly what C++ does with next_permutation.
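
For reference, here is a rough Python sketch of the lexicographic-successor algorithm that C++'s next_permutation uses (my own transcription, assuming a list of orderable elements):

def next_permutation(a):
    # Find the rightmost position whose suffix can still grow.
    i = len(a) - 2
    while i >= 0 and a[i] >= a[i + 1]:
        i -= 1
    if i < 0:
        a.reverse()          # wrapped around; restore sorted order
        return False
    # Swap with the smallest suffix element greater than a[i] ...
    j = len(a) - 1
    while a[j] <= a[i]:
        j -= 1
    a[i], a[j] = a[j], a[i]
    # ... and reset the suffix to its smallest arrangement.
    a[i + 1:] = reversed(a[i + 1:])
    return True

Starting from sorted input, it visits each distinct permutation exactly once using O(n) memory:

>>> a = sorted('aab')
>>> results = [tuple(a)]
>>> while next_permutation(a):
...     results.append(tuple(a))
...
>>> results
[('a', 'a', 'b'), ('a', 'b', 'a'), ('b', 'a', 'a')]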

Best,

Neil




Neil Girdhar

unread,
Oct 11, 2013, 5:38:27 PM10/11/13
to python...@googlegroups.com, python-ideas
My code, which was the motivation for this suggestion:

import itertools as it
import math

def is_prime(n):
    for i in range(2, int(math.floor(math.sqrt(n))) + 1):
        if n % i == 0:
            return False
    return n >= 2

def unique(iterable):    # Should not be necessary in my opinion
    seen = set()
    for x in iterable:
        if x not in seen:
            seen.add(x)
            yield x

n = int(input())
for _ in range(n):
    x = input()
    print(sum(is_prime(int("".join(y)))
              for len_ in range(1, len(x) + 1)
              for y in unique(it.permutations(x, len_))
              if y[0] != '0'))

MRAB

unread,
Oct 11, 2013, 5:38:41 PM10/11/13
to python-ideas
On 11/10/2013 21:27, David Mertz wrote:
> Andrew & Neil (or whoever):
>
> Is this *really* what you want:
>
> >>> from itertools import permutations
> >>> def nonredundant_permutations(seq):
> ...     return list(set(permutations(seq)))
> ...
> >>> pprint(list(permutations([F(3,1), D(3.0), 3.0])))
> [(Fraction(3, 1), Decimal('3'), 3.0),
> (Fraction(3, 1), 3.0, Decimal('3')),
> (Decimal('3'), Fraction(3, 1), 3.0),
> (Decimal('3'), 3.0, Fraction(3, 1)),
> (3.0, Fraction(3, 1), Decimal('3')),
> (3.0, Decimal('3'), Fraction(3, 1))]
>
> >>> pprint(list(nonredundant_permutations([F(3,1), D(3.0), 3.0])))
> [(Fraction(3, 1), Decimal('3'), 3.0)]
>
> It seems odd to me to want that. On the other hand, I provide a
> one-line implementation of the desired behavior if anyone wants it.
> Moreover, I don't think the runtime behavior of my one-liner is
> particularly costly... maybe not the best possible, but the best big-O
> possible.
>
n! gets very big very fast, so that can be a very big set.

If you sort the original items first then it's much easier to yield
unique permutations without having to remember them. (Each would be >
than the previous one, although you might have to map them to orderable
keys if they're not orderable themselves, e.g. a mixture of integers
and strings.)
>
>
> On Fri, Oct 11, 2013 at 1:19 PM, Andrew Barnert <abar...@yahoo.com
> <mailto:abar...@yahoo.com>> wrote:
>
> I think equality is perfectly reasonable here. The fact that {3.0,
> 3} only has one member seems like the obvious precedent to follow here.
>
> Sent from a random iPhone
>
> On Oct 11, 2013, at 13:02, David Mertz <me...@gnosis.cx

David Mertz

unread,
Oct 11, 2013, 5:48:25 PM10/11/13
to Neil Girdhar, python-ideas, python...@googlegroups.com
OK, you're right.  Just using set() has bad worst case memory costs.  I was thinking of the case where there actually WERE lots of equalities, and hence the resulting list would be much smaller than N!.  But of course that's not general.  It takes more than one line, but here's an incremental version:

def nonredundant_permutations(seq):
    seq = sorted(seq)
    last = None
    for perm in permutations(seq):
        if perm != last:
            yield perm
            last = perm

Neil Girdhar

unread,
Oct 11, 2013, 5:51:06 PM10/11/13
to David Mertz, python-ideas, python...@googlegroups.com
Unfortunately, that doesn't quite work…

list("".join(x) for x in it.permutations('aaabb', 3))
['aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba', 'abb', 'aba', 'aba', 'abb', 'aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba', 'abb', 'aba', 'aba', 'abb', 'aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba', 'abb', 'aba', 'aba', 'abb', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'bba', 'bba', 'bba', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'bba', 'bba', 'bba']

David Mertz

unread,
Oct 11, 2013, 6:03:48 PM10/11/13
to Neil Girdhar, python-ideas
Bummer.  You are right, Neil.  I saw MRAB's suggestion about sorting, and falsely thought that would be general; but obviously it's not.

So I guess the question is whether there is ANY way to do this without having to accumulate a 'seen' set (which can grow to size N!).  The answer isn't jumping out at me, but that doesn't mean there's not a way.

I don't want itertools.permutations() to do "equality filtering", but assuming some other function in itertools were to do that, how could it do so algorithmically? Or whatever, same question if it is itertools.permutations(seq, distinct=True) as the API.

MRAB

unread,
Oct 11, 2013, 6:19:34 PM10/11/13
to python-ideas
On 11/10/2013 23:03, David Mertz wrote:
> Bummer. You are right, Neil. I saw MRAB's suggestion about sorting,
> and falsely thought that would be general; but obviously it's not.
>
> So I guess the question is whether there is ANY way to do this without
> having to accumulate a 'seen' set (which can grow to size N!). The
> answer isn't jumping out at me, but that doesn't mean there's not a way.
>
> I don't want itertools.permutations() to do "equality filtering", but
> assuming some other function in itertools were to do that, how could it
> do so algorithmically? Or whatever, same question if it is
> itertools.permutations(seq, distinct=True) as the API.
>
Here's an implementation:

def unique_permutations(iterable, count=None, key=None):
    def perm(items, count):
        if count:
            prev_item = object()

            for i, item in enumerate(items):
                if item != prev_item:
                    for p in perm(items[ : i] + items[i + 1 : ], count - 1):
                        yield [item] + p

                    prev_item = item

        else:
            yield []

    if key is None:
        key = lambda item: item

    items = sorted(iterable, key=key)

    if count is None:
        count = len(items)

    yield from perm(items, count)


And some results:

>>> print(list("".join(x) for x in unique_permutations('aaabb', 3)))
['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba']
>>> print(list(unique_permutations([0, 'a', 0], key=str)))
[[0, 0, 'a'], [0, 'a', 0], ['a', 0, 0]]

Neil Girdhar

unread,
Oct 11, 2013, 6:23:36 PM10/11/13
to python...@googlegroups.com, python-ideas
Beautiful!!



David Mertz

unread,
Oct 11, 2013, 6:45:04 PM10/11/13
to Neil Girdhar, python-ideas

I realize after reading http://stackoverflow.com/questions/6284396/permutations-with-unique-values that my version was ALMOST right:

def nonredundant_permutations(seq, r=None):
    last = ()
    for perm in permutations(sorted(seq), r):
        if perm > last:
            yield perm
            last = perm

I can't look only for inequality, but must use the actual comparison.

>>> ["".join(x) for x in nonredundant_permutations('aaabb',3)]
['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba']
>>> list(nonredundant_permutations([F(3,1), D(3.0), 3.0]))
[(Fraction(3, 1), Decimal('3'), 3.0)]

Of course, this approach DOES rely on the order in which itertools.permutations() returns values.  However, it's a bit more compact than MRAB's version.


Nick Coghlan

unread,
Oct 11, 2013, 7:49:55 PM10/11/13
to David Mertz, Neil Girdhar, python...@python.org


On 12 Oct 2013 08:45, "David Mertz" <me...@gnosis.cx> wrote:
>
>
> I realize after reading http://stackoverflow.com/questions/6284396/permutations-with-unique-values that my version was ALMOST right:
>
> def nonredundant_permutations(seq, r=None):
>     last = ()
>     for perm in permutations(sorted(seq), r):
>         if perm > last:
>             yield perm
>             last = perm
>
> I can't look only for inequality, but must use the actual comparison.
>
> >>> ["".join(x) for x in nonredundant_permutations('aaabb',3)]
> ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba']
> >>> list(nonredundant_permutations([F(3,1), D(3.0), 3.0]))
> [(Fraction(3, 1), Decimal('3'), 3.0)]
>
> Of course, this approach DOES rely on the order in which itertools.permutations() returns values.  However, it's a bit more compact than MRAB's version.

As there is no requirement that entries in a sequence handled by itertools.permutations be sortable, the original question of why this isn't done by default has been answered (the general solution risks consuming too much memory, while the memory-efficient solution constrains the domain to only sortable sequences).

Cheers,
Nick.


Neil Girdhar

unread,
Oct 11, 2013, 7:53:31 PM10/11/13
to Nick Coghlan, python-ideas
Yes, that's all true.  I want to suggest that the efficient unique permutations solution is very important to have.  Sortable sequences are very common. There are itertools routines that only work with =-comparable elements (e.g. groupby), so it's not a stretch to have a permutations that is restricted to <-comparable elements.

Best,
Neil

Andrew Barnert

unread,
Oct 11, 2013, 8:20:17 PM10/11/13
to Neil Girdhar, python-ideas
I think this is worth having even for 3.3 and 2.x, so I'd suggest sending a patch to more-itertools (https://github.com/erikrose/more-itertools) as well as here.


Sent from a random iPhone

MRAB

unread,
Oct 11, 2013, 9:55:23 PM10/11/13
to python-ideas
On 12/10/2013 00:49, Nick Coghlan wrote:
>
> On 12 Oct 2013 08:45, "David Mertz" <me...@gnosis.cx
> <mailto:me...@gnosis.cx>> wrote:
> >
> >
> > I realize after reading
> http://stackoverflow.com/questions/6284396/permutations-with-unique-values
> that my version was ALMOST right:
> >
> > def nonredundant_permutations(seq, r=None):
> >     last = ()
> >     for perm in permutations(sorted(seq), r):
> >         if perm > last:
> >             yield perm
> >             last = perm
> >
> > I can't look only for inequality, but must use the actual comparison.
> >
> > >>> ["".join(x) for x in nonredundant_permutations('aaabb',3)]
> > ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba']
> > >>> list(nonredundant_permutations([F(3,1), D(3.0), 3.0]))
> > [(Fraction(3, 1), Decimal('3'), 3.0)]
> >
> > Of course, this approach DOES rely on the order in which
> itertools.permutations() returns values. However, it's a bit more
> compact than MRAB's version.
>
> As there is no requirement that entries in a sequence handled by
> itertools.permutations be sortable, so the original question of why this
> isn't done by default has been answered (the general solution risks
> consuming too much memory, while the memory efficient solution
> constrains the domain to only sortable sequences).
>
OK, here's a new implementation:

def unique_permutations(iterable, count=None):
    def perm(items, count):
        if count:
            prev_item = object()

            for i, item in enumerate(items):
                if item != prev_item:
                    for p in perm(items[ : i] + items[i + 1 : ], count - 1):
                        yield [item] + p

                    prev_item = item

        else:
            yield []

    items = list(iterable)

    keys = {}

    for item in items:
        keys.setdefault(item, len(keys))

    items.sort(key=keys.get)

    if count is None:
        count = len(items)

    yield from perm(items, count)

Steven D'Aprano

unread,
Oct 11, 2013, 10:06:48 PM10/11/13
to python...@python.org
On Fri, Oct 11, 2013 at 11:38:33AM -0700, Neil Girdhar wrote:
> "It is universally agreed that a list of n distinct symbols has n!
> permutations. However, when the symbols are not distinct, the most common
> convention, in mathematics and elsewhere, seems to be to count only
> distinct permutations." —

I dispute this entire premise. Take a simple (and stereotypical)
example, picking balls from an urn.

Say that you have three red and two black balls, and randomly select
without replacement. If you count only unique permutations, you get only
four possibilities:

py> set(''.join(t) for t in itertools.permutations('RRRBB', 2))
{'BR', 'RB', 'RR', 'BB'}

which implies that drawing RR is no more likely than drawing BB, which
is incorrect. The right way to model this experiment is not to count
distinct permutations, but actual permutations:

py> list(''.join(t) for t in itertools.permutations('RRRBB', 2))
['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB',
'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB']

which makes it clear that there are two ways of drawing BB compared to
six ways of drawing RR. If that's not obvious enough, consider the case
where you have two thousand red balls and two black balls -- do you
really conclude that there are the same number of ways to pick RR as BB?

So I disagree that counting only distinct permutations is the most
useful or common convention. If you're permuting a collection of
non-distinct values, you should expect non-distinct permutations.

I'm trying to think of a realistic, physical situation where you would
only want distinct permutations, and I can't.


> Should we consider fixing itertools.permutations and to output only unique
> permutations (if possible, although I realize that would break code).

Absolutely not. Even if you were right that it should return unique
permutations, and I strongly disagree that you were, the fact that it
would break code is a deal-breaker.

--
Steven

MRAB

unread,
Oct 11, 2013, 10:34:33 PM10/11/13
to python-ideas
[snip]
I've just realised that I don't need to sort them at all.

Here's a new improved implementation:

def unique_permutations(iterable, count=None):
    def perm(items, count):
        if count:
            seen = set()

            for i, item in enumerate(items):
                if item not in seen:
                    for p in perm(items[ : i] + items[i + 1 : ], count - 1):
                        yield [item] + p

                    seen.add(item)
        else:
            yield []

    items = list(iterable)

    if count is None:
        count = len(items)

    yield from perm(items, count)

David Mertz

unread,
Oct 11, 2013, 10:36:26 PM10/11/13
to python-ideas
Hi MRAB,

I'm confused by your implementation.  In particular, what do these lines do?

    # [...]
    items = list(iterable)
    keys = {}
    for item in items:
        keys.setdefault(item, len(keys))
    items.sort(key=keys.get)

I cannot understand how these can possibly have any effect (other than the first line that makes a concrete list out of an iterable).

We loop through the list in its natural order.  E.g. say the list is '[a, b, c]' (where those names are any types of objects whatsoever).  The loop gives us:

    keys == {a:0, b:1, c:2}

When we do a sort on 'key=keys.get()' how can that ever possibly change the order of 'items'?

There's also a bit of a flaw in that your implementation blows up if anything yielded by iterable isn't hashable:

    >>> list(unique_permutations([ [1,2],[3,4],[5,6] ]))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'list'

There's no problem doing this with itertools.permutations:

    >>> list(permutations([[1,2],[3,4],[5,6]]))
    [([1, 2], [3, 4], [5, 6]), ([1, 2], [5, 6], [3, 4]), ([3, 4], [1, 2], [5, 6]), 
    ([3, 4], [5, 6], [1, 2]), ([5, 6], [1, 2], [3, 4]), ([5, 6], [3, 4], [1, 2])]

This particular one also succeeds with my nonredundant_permutations:

    >>> list(nonredundant_permutations([[1,2],[3,4],[5,6]]))
    [([1, 2], [3, 4], [5, 6]), ([1, 2], [5, 6], [3, 4]), ([3, 4], [1, 2], [5, 6]), 
    ([3, 4], [5, 6], [1, 2]), ([5, 6], [1, 2], [3, 4]), ([5, 6], [3, 4], [1, 2])]

However, my version *DOES* fail when things cannot be compared under inequality:

    >>> list(nonredundant_permutations([[1,2],3,4]))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 3, in nonredundant_permutations
    TypeError: unorderable types: int() < list()

This also doesn't afflict itertools.permutations.

Neil Girdhar

unread,
Oct 11, 2013, 10:37:02 PM10/11/13
to python...@googlegroups.com, python-ideas
I think it's pretty indisputable that permutations are formally defined this way (and I challenge you to find a source that doesn't agree with that).  I'm sure you know that your idea of using permutations to evaluate a multinomial distribution is not efficient.  A nicer way to evaluate probabilities is to pass your set through a collections.Counter, and then use the resulting dictionary with scipy.stats.multinomial (if it exists yet).
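
For the counting side, a minimal stdlib-only sketch (the function name is made up) gives the number of distinct permutations of a multiset directly from a Counter:

from collections import Counter
from math import factorial

def count_distinct_permutations(iterable):
    # Multinomial coefficient: n! / (k1! * k2! * ...)
    counts = Counter(iterable)
    total = factorial(sum(counts.values()))
    for k in counts.values():
        total //= factorial(k)
    return total

>>> count_distinct_permutations('RRRBB')
10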

I believe most people will be surprised that len(list(permutations(iterable))) counts duplicate permutations rather than only unique ones.

Best,

Neil


David Mertz

unread,
Oct 11, 2013, 10:48:23 PM10/11/13
to Neil Girdhar, python-ideas
Related to, but not quite the same as Steven D'Aprano's point, I would find it very strange for itertools.permutations() to return a list that was narrowed to equal-but-not-identical items.

This is why I've raised the example of 'items=[Fraction(3,1), Decimal(3.0), 3.0]' several times.  I've created the Fraction, Decimal, and float for distinct reasons to get different behaviors and available methods.  When I want to look for the permutations of those I don't want "any old random choice of equal values" since presumably I've given them a type for a reason.

On the other hand, I can see a little bit of sense that 'itertools.permutations([3,3,3,3,3,3,3])' doesn't *really* need to tell me a list of 7!==5040 things that are exactly the same as each other.  Then again, I don't know how to generalize that, since my feeling is far less clear for 'itertools.permutations([1,2,3,4,5,6,6])' ... there's redundancy, but there's also important information in the probability and count of specific sequences.

My feeling, however, is that if one were to trim down the results from a permutations-related function, it is more interesting to me to only eliminate IDENTICAL items, not to eliminate merely EQUAL ones.

Neil Girdhar

unread,
Oct 11, 2013, 10:55:06 PM10/11/13
to David Mertz, python-ideas
I honestly think that Python should stick to the mathematical definition of permutations rather than some kind of consensus of the tiny minority of people here.  When next_permutation was added to C++, I believe the whole standards committee discussed it and they came up with the thing that makes the most sense.  The fact that dict and set use equality is I think the reason that permutations should use equality.

Neil

Andrew Barnert

unread,
Oct 11, 2013, 10:57:08 PM10/11/13
to David Mertz, Neil Girdhar, python-ideas
On Oct 11, 2013, at 19:48, David Mertz <me...@gnosis.cx> wrote:

> My feeling, however, is that if one were to trim down the results from a permutations-related function, it is more interesting to me to only eliminate IDENTICAL items, not to eliminate merely EQUAL ones.

I agree with the rest of your message, but I still think you're wrong here. Anyone who is surprised by distinct_permutations((3.0, 3)) treating the two values the same would be equally surprised by {3.0, 3} having only one member. Or by groupby([3.0, 3]) only having one group. And so on.

In Python, sets, dict keys, groups, etc. work by ==. That was a choice that could have been made differently, but Python made that choice long ago, and has applied it completely consistently, and it would be very strange to choose differently in this case.

Nick Coghlan

unread,
Oct 12, 2013, 12:35:13 AM10/12/13
to Neil Girdhar, python...@python.org


On 12 Oct 2013 12:56, "Neil Girdhar" <miste...@gmail.com> wrote:
>
> I honestly think that Python should stick to the mathematical definition of permutations rather than some kind of consensus of the tiny minority of people here.  When next_permutation was added to C++, I believe the whole standards committee discussed it and they came up with the thing that makes the most sense.  The fact that dict and set use equality is I think the reason that permutations should use equality.

Why should the behaviour of hash based containers limit the behaviour of itertools?

Python required a permutation solution that is memory efficient and works with arbitrary objects, so that's what itertools provides.

However, you'd also like a memory efficient iterator for *mathematical* permutations that pays attention to object values and filters out equivalent results.

I *believe* the request is equivalent to giving a name to the following genexp:

    (k for k, grp in groupby(permutations(sorted(input))))

That's a reasonable enough request (although perhaps more suited to the recipes section in the itertools docs), but conflating it with complaints about the way the existing iterator works is a good way to get people to ignore you (especially now the language specific reasons for the current behaviour have been pointed out, along with confirmation of the fact that backwards compatibility requirements would prohibit changing it even if we wanted to).

Cheers,
Nick.

Stephen J. Turnbull

unread,
Oct 12, 2013, 1:10:21 AM10/12/13
to Neil Girdhar, python-ideas
Neil Girdhar writes:

> I honestly think that Python should stick to the mathematical
> definition of permutations rather than some kind of consensus of
> the tiny minority of people here.

Is there an agreed mathematical definition of permutations of
*sequences*? Every definition I can find refers to permutations of
*sets*. I think any categorist would agree that there are a large
number of maps of _Sequence_ to _Set_, in particular the two obviously
useful ones[1]: the one that takes each element of the sequence to a
*different* element of the corresponding set, and the one that takes
equal elements of the sequence to the *same* element of the
corresponding set. The corresponding set need not be the underlying
set of the sequence, and which one is appropriate presumably depends
on applications.

> When next_permutation was added to C++, I believe the whole
> standards committee discussed it and they came up with the thing
> that makes the most sense.

To the negligible (in several senses of the word) fraction of humanity
that participates in C++ standardization. Python is not C++ (thanking
all the Roman and Greek gods, and refusing to identify Zeus with
Jupiter, nor Aphrodite with Venus<wink/>).

> The fact that dict and set use equality is I think the reason that
> permutations should use equality.

Sequences are not sets, and dict is precisely the wrong example for
you to use, since it makes exactly the point that values that are
identical may be bound to several different keys. We don't unify keys
in a dict just because the values are identical (or equal). Similar,
in representing a sequence as a set, we use a set of ordered pairs,
with the first component an unique integer indicating position, and
the second the sequence element.

Since there are several useful mathematical ways to convert sequences
to sets, and in particular one very similar, if not identical, to the
one you like is enshrined in the very convenient constructor set(), I
think it's useful to leave it as it is.

> It is universally agreed that a list of n distinct symbols has n!
> permutations.

But that's because there's really no sensible definition of
"underlying set" for such a list except the set containing exactly the
same elements as the list.[2] But there is no universal agreement
that "permutations of a list" is a sensible phrase.

For example, although the Wikipedia article Permutation refers to
lists of permutations, linked list representations of data, to the
"list of objects" for use in Cauchy's notation, and to the cycle
representation as a list of sequences, it doesn't once refer to
permutation of a list.

They're obviously not averse to discussing lists, but the word used for
the entity being permuted is invariably "set".


Footnotes:
[1] And some maps not terribly useful for our purposes, such as one
that maps all sequences to a singleton.

[2] A categorist would disagree, but that's not interesting.

David Mertz

unread,
Oct 12, 2013, 1:26:07 AM10/12/13
to Nick Coghlan, Neil Girdhar, python-ideas
What you propose, Nick, is definitely different from the several functions that have been bandied about here.  I.e.

>>> def nick_permutations(items, r=None):
...     return (k for k, grp in groupby(permutations(sorted(items),r)))

>>> ["".join(p) for p in nonredundant_permutations('aaabb', 3)]
['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba']

>>> ["".join(p) for p in nick_permutations('aaabb', 3)]
['aaa', 'aab', 'aaa', 'aab', 'aba', 'abb', 'aba', 'abb', 'aaa', 'aab', 'aaa', 'aab', 'aba', 'abb', 'aba', 'abb', 'aaa', 'aab', 'aaa', 'aab', 'aba', 'abb', 'aba', 'abb', 'baa', 'bab', 'baa', 'bab', 'baa', 'bab', 'bba', 'baa', 'bab', 'baa', 'bab', 'baa', 'bab', 'bba']

>>> ["".join(p) for p in permutations('aaabb', 3)]
['aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba', 'abb', 'aba', 'aba', 'abb', 'aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba', 'abb', 'aba', 'aba', 'abb', 'aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba', 'abb', 'aba', 'aba', 'abb', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'bba', 'bba', 'bba', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'bba', 'bba', 'bba']

If I'm thinking of this right, what you give is equivalent to the initial flawed version of 'nonredundant_permutations()' that I suggested, which used '!=' rather than the correct '>' in comparing to the 'last' tuple.

FWIW, I deliberately chose the name 'nonredundant_permutations' rather than MRAB's choice of 'unique_permutations' because I think what the filtering does is precisely NOT to give unique ones. Or rather, not to give ALL unique ones, but only those defined by equivalence (i.e. rather than identity).  

My name is ugly, and if there were to be a function like it in itertools, a better name should be found. But such a name should emphasize that it is "filter by equivalence classes" ... actually, maybe this suggests another function which is instead "filter by identity of tuples", potentially also added to itertools.

David Mertz

unread,
Oct 12, 2013, 1:38:19 AM10/12/13
to Andrew Barnert, Neil Girdhar, python-ideas
Hi Andrew,

I've sort of said as much in my last reply to Nick.  But maybe I can clarify further.  I can imagine *someone* wanting a filtering of permutations by either identity or equality.  Maybe, in fact, by other comparisons also for generality.

This might suggest an API like the following:

  equal_perms = distinct_permutations(items, r, filter_by=operator.eq)
  ident_perms = distinct_permutations(items, r, filter_by=operator.is_)

Or even perhaps, in some use-case that isn't clear to me, e.g.

  start_same_perms = distinct_permutations(items, r, filter_by=lambda a,b: a[0]==b[0])

Or perhaps more plausibly, some predicate that, e.g. tests if two returned tuples are the same under case normalization of the strings within them.

I guess the argument then would be what the default value of 'filter_by' might be... but that seems less important to me if there were an option to pass a predicate as you liked.



On Fri, Oct 11, 2013 at 7:57 PM, Andrew Barnert <abar...@yahoo.com> wrote:
On Oct 11, 2013, at 19:48, David Mertz <me...@gnosis.cx> wrote:

> My feeling, however, is that if one were to trim down the results from a permutations-related function, it is more interesting to me to only eliminate IDENTICAL items, not to eliminate merely EQUAL ones.

I agree with the rest of your message, but I still think you're wrong here. Anyone who is surprised by distinct_permutations((3.0, 3)) treating the two values the same would be equally surprised by {3.0, 3} having only one member. Or by groupby([3.0, 3]) only having one group. And so on.

In Python, sets, dict keys, groups, etc. work by ==. That was a choice that could have been made differently, but Python made that choice long ago, and has applied it completely consistently, and it would be very strange to choose differently in this case.



David Mertz

unread,
Oct 12, 2013, 1:48:25 AM10/12/13
to Andrew Barnert, Neil Girdhar, python-ideas
Btw. My implementation of nonredundant_permutations *IS* guaranteed to work by the docs for Python 3.4.  Actually for Python 2.7+.  That is, it's not just an implementation accident (as I thought before I checked), but a promised API of itertools.permutations that:

  Permutations are emitted in lexicographic sort order. So, if the
  input iterable is sorted, the permutation tuples will be produced in
  sorted order.

As long as that holds, my function will indeed behave correctly (but of course, with the limitation that it blows up if different items in the argument iterable cannot be compared using operator.lt()).

Steven D'Aprano

unread,
Oct 12, 2013, 2:34:46 AM10/12/13
to python...@python.org
On Fri, Oct 11, 2013 at 10:55:06PM -0400, Neil Girdhar wrote:
> I honestly think that Python should stick to the mathematical definition of
> permutations rather than some kind of consensus of the tiny minority of
> people here.

So do I. And that is exactly what itertools.permutations already does.

Neil Girdhar

unread,
Oct 12, 2013, 2:55:25 AM10/12/13
to Nick Coghlan, python-ideas
Hi Nick,

Rereading my messages, I feel like I haven't been as diplomatic as I wanted.  Like everyone here, I care a lot about Python and I want to see it become as perfect as it can be made.  If my wording has been too strong, it's only out of passion for Python.

I acknowledged in my initial request that it would be impossible to change the default behaviour of itertools.permutations.  I understand that that ship has sailed.  I think my best proposal is to have an efficient distinct_permutations function in itertools.  It should be in itertools so that it is discoverable.  It should be a function rather than one of the proposed recipes, to make it as efficient as possible.  (Correct me if I'm wrong, but like the set solution, groupby is also not so efficient.)

I welcome the discussion and hope that the most efficient implementation someone here comes up with will be added one day to itertools.

Best,

Neil

Neil Girdhar

unread,
Oct 12, 2013, 3:02:47 AM10/12/13
to python-ideas, David Mertz
Why not just use the standard Python way to generalize this: "key" rather than the nonstandard "filter_by"?

David Mertz

unread,
Oct 12, 2013, 3:09:32 AM10/12/13
to Neil Girdhar, python-ideas
On Sat, Oct 12, 2013 at 12:02 AM, Neil Girdhar <miste...@gmail.com> wrote:
Why not just use the standard python way to generalize this: "key" rather than the nonstandard "filter_by".

Yes, 'key' is a much better name than what I suggested. 

I'm not quite sure how best to implement this still.  I guess MRAB's recursive approach should work, even though I like the simplicity of my style that takes full advantage of the existing itertools.permutations() (and uses 1/3 as many lines of--I think clearer--code).  His has the advantage, however, that it doesn't require operator.lt() to work... however, without benchmarking, I have a pretty strong feeling that my suggestion will be faster since it avoids all that recursive call overhead.  Maybe I'm wrong about that though.

Neil Girdhar

unread,
Oct 12, 2013, 3:17:43 AM10/12/13
to python...@googlegroups.com, python-ideas
I'm sorry, but I can't find a reference supporting the statement that the current permutations function is consistent with the mathematical definition.  Perhaps you would like to find a reference? A quick search yielded the book "the Combinatorics of Permutations": http://books.google.ca/books?id=Op-nF-mBR7YC&lpg=PP1   Please look in the chapter "Permutation of multisets".

Best,

Neil


Steven D'Aprano

unread,
Oct 12, 2013, 3:35:31 AM10/12/13
to python...@python.org
On Fri, Oct 11, 2013 at 10:37:02PM -0400, Neil Girdhar wrote:
> I think it's pretty indisputable that permutations are formally defined
> this way (and I challenge you to find a source that doesn't agree with
> that).

If by "this way" you mean "unique permutations only", then yes, it is
*completely* disputable, and I am disputing it right now. I'm not arguing
one way or the other for a separate "unique_permutations" generator,
just that the existing permutations generator does the right thing. If
you're satisfied with that answer, you can stop reading now, because the
rest of my post is going to be rather long:

TL;DR:

If you want a unique_permutations generator, that's a reasonable
request. If you insist on changing permutations, that's unreasonable,
firstly because the current behaviour is correct, and secondly because
backwards compatibility would constrain it to keep the existing
behaviour even if it were wrong.

.
.
.

Still here? Okay then, let me justify why I say the current behaviour is
correct.

Speaking as a math tutor who has taught High School level combinatorics
for 20+ years, I've never come across any text book or source that
defines permutations in terms of unique permutations only. In every case
that I can remember, or that I still have access to, unique permutations
is considered a different kind of operation ("permutations ignoring
duplicates", if you like) rather than the default. E.g. "Modern
Mathematics 6" by Fitzpatrick and Galbraith has a separate section for
permutations with repetition, gives the example of taking permutations
from the word "MAMMAL", and explicitly contrasts situations where you
consider the three letters M as "different" from when you consider them
"the same". But in all such cases, such a situation is discussed as a
restriction on permutations, not an expansion, that is:

* there are permutations;
* sometimes you want to only consider unique permutations;

rather than:

* there are permutations, which are always unique;
* sometimes you want to consider things which are like permutations
except they're not necessarily unique.


I'd even turn this around and challenge you to find a source that *does*
define them as always unique. Here's a typical example, from the Collins
Dictionary of Mathematics:


[quote]
**permutation** or **ordered arrangement** n. 1 an ordered arrangement
of a specified number of objects selected from a set. The number of
distinct permutations of r objects from n is

n!/(n-r)!

usually written as nPr, with the n as a leading sub- or superscript and
the r as a trailing subscript. For example there are six distinct permutations of two
objects selected out of three: <1,2>, <1,3>, <2,1>, <2,3>, <3,1>, <3,2>.
Compare COMBINATION.

2. any rearrangement of all the elements of a finite sequence, such as
(1,3,2) and (3,1,2). It is *odd* or *even* according as the number of
exchanges of position yielding it from the original order is odd or
even. It is a *cyclic permutation* if it merely advances all the
elements a fixed number of places; that is, if it is a CYCLE of maximal
LENGTH. A *transposition* is a cycle of degree two, and all permutations
factor as products of transpositions. See also SIGNATURE.

3. any BIJECTION of a set to itself, where the set may be finite or
infinite.
[end quote]



The definition makes no comment about how to handle duplicate elements,
but we can derive an answer for that:

1) We're told how many permutations there are. Picking r elements out of
n gives us n!/(n-r)!. If you throw away duplicate permutations, you will
fall short.

2) The number of permutations shouldn't depend on the specific
entities being permuted. Permutations of (1, 2, 3, 4) and (A, B, C, D)
should be identical. If your set of elements contains duplicates, such
as (Red ball, Red ball, Red ball, Black ball, Black ball), we can put
the balls into 1:1 correspondence with integers (1, 2, 3, 4, 5), permute
the integers, then reverse the mapping to get balls again. If we do
this, we ought to get the same result as just permuting the balls
directly.

(That's not to say that there are never cases where we don't care to
distinguish between one red ball and another. But in general we do
distinguish between them.)
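
That correspondence is easy to check directly (a quick sketch):

py> balls = ('R', 'R', 'R', 'B', 'B')
py> relabelled = [tuple(balls[i] for i in p)
...               for p in itertools.permutations(range(len(balls)), 2)]
py> relabelled == list(itertools.permutations(balls, 2))
True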

I think this argument may hinge on what you consider *distinct*. In this
context, if I permute the string "RRRBB", I consider all three
characters to be distinct. Object identity is an implementation detail
(not all programming languages have "objects"); even equality is an
irrelevant detail. If I'm choosing to permute "RRRBB" rather than
"RB", then clearly *to me* there must be some distinguishing factor
between the three Rs and two Bs.

Another source is Wolfram Mathworld:

http://mathworld.wolfram.com/Permutation.html

which likewise says nothing about discarding repeated permutations when
there are repeated elements. See also their page on "Ball Picking":

http://mathworld.wolfram.com/BallPicking.html

Last but not least, here's a source which clearly distinguishes
permutations from "permutations with duplicates":

http://mathcentral.uregina.ca/QQ/database/QQ.09.07/h/beth3.html

and even gives a distinct formula for calculating the number of
permutations. Neither Wolfram Mathworld nor the Collins Dictionary of
Maths consider this formula important enough to mention, which suggests
strongly that it should be considered separate from the default
permutations.

(A little like cyclic permutations, which are different again.)

Steven D'Aprano

unread,
Oct 12, 2013, 3:39:30 AM10/12/13
to python...@python.org
On Sat, Oct 12, 2013 at 06:35:31PM +1100, Steven D'Aprano wrote:

> I think this argument may hinge on what you consider *distinct*. In this
> context, if I permute the string "RRRBB", I consider all three
> characters to be distinct.

/s/three/five/

Neil Girdhar

unread,
Oct 12, 2013, 4:16:33 AM10/12/13
to python...@googlegroups.com
I just posted a reference that has a whole chapter on permutations of multisets.  Doing a search in Google scholar yields many other references:

etc.

In his index to Art of Computer Programming II, Knuth has "Permutation: Ordered arrangement of a multiset" although I can't seem to find the corresponding material in the text.

Your only reference which talks about duplicates makes it sound to me like "that's the *only* way you take permutations when there are duplicates" and that the algorithm for finding permutations when there aren't duplicates no longer applies in this case. 

The other references don't talk about duplicate elements.  (Ball picking doesn't; without replacement, the n balls are unique.)


TB

unread,
Oct 12, 2013, 4:18:35 AM10/12/13
to Steven D'Aprano, python...@python.org
On 10/12/2013 10:35 AM, Steven D'Aprano wrote:
> If you want a unique_permutations generator, that's a reasonable
> request. If you insist on changing permutations, that's unreasonable,
> firstly because the current behaviour is correct, and secondly because
> backwards compatibility would constrain it to keep the existing
> behaviour even if it were wrong.
>
I agree that backwards compatibility should be kept, but the current
behaviour of itertools.permutations is (IMHO) surprising.

So here are my 2c: Until I tried it myself, I was sure that it would be
like the corresponding permutations functions in Sage:

sage: list(Permutations("aba"))
[['a', 'a', 'b'], ['a', 'b', 'a'], ['b', 'a', 'a']]

or Mathematica:
http://www.wolframalpha.com/input/?i=permutations+of+{a%2C+b%2C+a}

Currently the docstring of itertools.permutations just says "Return
successive r-length permutations of elements in the iterable", without
telling what happens with input of repeated elements. The full doc in
the reference manual is better in that regard, but I think at least one
example with repeated elements would be nice.

Regards,
TB

Neil Girdhar

unread,
Oct 12, 2013, 4:19:47 AM10/12/13
to python...@googlegroups.com
A very interesting paper:  An O(1) Time Algorithm for Generating Multiset Permutations. I don't have access to it, but I wonder whether they generate unique permutations or not…

Neil Girdhar

unread,
Oct 12, 2013, 4:20:24 AM10/12/13
to python...@googlegroups.com, python-ideas
+1



Andrew Barnert

unread,
Oct 12, 2013, 4:22:59 AM10/12/13
to Neil Girdhar, python-ideas
On Oct 11, 2013, at 23:55, Neil Girdhar <miste...@gmail.com> wrote:

> I think my best proposal is to have an efficient distinct_permutations function in itertools. It should be in itertools so that it is discoverable. It should be a function rather one of the recipes proposed to make it as efficient as possible. (Correct me if I'm wrong, but like the set solution, groupby is also not so efficient.)
>
> I welcome the discussion and hope that the most efficient implementation someone here comes up with will be added one day to itertools.

I think getting something onto PyPI (whether as part of more-itertools or elsewhere) and/or the ActiveState recipes (and maybe StackOverflow and CodeReview) is the best way to get from here to there. Continuing to discuss it here, you've only got the half dozen or so people who are on this list and haven't tuned out this thread to come up with the most efficient implementation. Put it out in the world and people will begin giving you comments/bug reports/rants calling you an idiot for missing the obvious more efficient way to do it, and then you can use their code. And then, when you're satisfied with it, you have a concrete proposal for something to add to itertools in python X.Y+1 instead of some implementation to be named later to add one day.

I was also going to suggest that you drop the argument about whether this is the one true definition of sequence permutation and just focus on whether it's a useful thing to have, but it looks like you're way ahead of me there, so never mind.

Mark Lawrence

unread,
Oct 12, 2013, 4:28:55 AM10/12/13
to python...@python.org
On 12/10/2013 09:18, TB wrote:
> Currently the docstring of itertools.permutations just says "Return
> successive r-length permutations of elements in the iterable", without
> telling what happens with input of repeated elements. The full doc in
> the reference manual is better in that regard, but I think at least one
> example with repeated elements would be nice.
>
> Regards,
> TB

I look forward to seeing your suggested doc patch on the Python bug tracker.

--
Roses are red,
Violets are blue,
Most poems rhyme,
But this one doesn't.

Mark Lawrence

Terry Reedy

unread,
Oct 12, 2013, 4:41:48 AM10/12/13
to python...@python.org
On 10/12/2013 3:35 AM, Steven D'Aprano wrote:

> I'd even turn this around and challenge you to find a source that *does*
> define them as always unique. Here's a typical example, from the Collins
> Dictionary of Mathematics:
>
>
> [quote]
> **permutation** or **ordered arrangement** n. 1 an ordered arrangement
> of a specified number of objects selected from a set. The number of
> distinct permutations of r objects from n is
>
> n!/(n-r)!
>
> usually written <subscript>n P <subscript>r or <superscript>n P
> <subscript>r. For example there are six distinct permutations of two
> objects selected out of three: <1,2>, <1,3>, <2,1>, <2,3>, <3,1>, <3,2>.
> Compare COMBINATION.

The items of a set are, by definition of a set, distinct, so the
question of different but equal permutations does not arise.

> 2. any rearrangement of all the elements of a finite sequence, such as
> (1,3,2) and (3,1,2). It is *odd* or *even* according as the number of
> exchanges of position yielding it from the original order is odd or
> even. It is a *cyclic permutation* if it merely advances all the
> elements a fixed number of places; that is, if it is a CYCLE of maximal
> LENGTH. A *transposition* is a cycle of degree two, and all permutations
> factor as products of transpositions. See also SIGNATURE.

The items of a sequence may be duplicates. But in the treatments of
permutations I have seen (admittedly not all of them), they are
considered to be distinguished by position, so that one may replace the
item by counts 1 to n and vice versa.

> 3. any BIJECTION of a set to itself, where the set may be finite or
> infinite.
> [end quote]

Back to a set of distinct items again.

You are correct that itertools.permutations does the right thing by
standard definition.

> Last but not least, here's a source which clearly distinguishes
> permutations from "permutations with duplicates":
>
> http://mathcentral.uregina.ca/QQ/database/QQ.09.07/h/beth3.html
>
> and even gives a distinct formula for calculating the number of
> permutations. Neither Wolfram Mathworld nor the Collins Dictionary of
> Maths consider this formula important enough to mention, which suggests
> strongly that it should be considered separate from the default
> permutations.

The question is whether this particular variation is important
enough to put in itertools. It is not a combinatorics module and did not
start with permutations.

--
Terry Jan Reedy

Nick Coghlan

unread,
Oct 12, 2013, 11:07:58 AM10/12/13
to Neil Girdhar, python...@python.org, python...@googlegroups.com


On 12 Oct 2013 17:18, "Neil Girdhar" <miste...@gmail.com> wrote:
>
> I'm sorry, but I can't find a reference supporting the statement that the current permutations function is consistent with the mathematical definition.  Perhaps you would like to find a reference? A quick search yielded the book "the Combinatorics of Permutations": http://books.google.ca/books?id=Op-nF-mBR7YC&lpg=PP1   Please look in the chapter "Permutation of multisets".

Itertools effectively produces the permutation of (index, value) pairs. Hence Steven's point about the permutations of a list not being mathematically defined, so you have to decide what set to map it to in order to decide what counts as a unique value. The mapping itertools uses considers position in the iterable relevant so exchanging two values that are themselves equivalent is still considered a distinct permutation since their original position is taken into account. Like a lot of mathematics, it's a matter of paying close attention to which entities are actually being manipulated and how the equivalence classes are being defined :)
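
That equivalence is easy to demonstrate (a quick sketch):

>>> from itertools import permutations
>>> pairs = list(permutations(enumerate('aab'), 2))
>>> [tuple(v for i, v in p) for p in pairs] == list(permutations('aab', 2))
True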

Hence the current proposal amounts to adding another variant that provides the permutations of an unordered multiset instead of those of a set of (index, value) 2-tuples (with the indices stripped from the results).

One interesting point is that combining collections.Counter.elements() with itertools.permutations() currently does the wrong thing, since itertools.permutations() *always* considers iterable order significant, while for collections.Counter.elements() it's explicitly arbitrary.

Cheers,
Nick.

MRAB

unread,
Oct 12, 2013, 12:55:31 PM10/12/13
to python-ideas
On 12/10/2013 03:36, David Mertz wrote:
> Hi MRAB,
>
> I'm confused by your implementation. In particular, what do these lines do?
>
>     # [...]
>     items = list(iterable)
>     keys = {}
>     for item in items:
>         keys.setdefault(item, len(keys))
>     items.sort(key=keys.get)
>
> I cannot understand how these can possibly have any effect (other than
> the first line that makes a concrete list out of an iterable).
>
> We loop through the list in its natural order. E.g. say the list is
> '[a, b, c]' (where those names are any types of objects whatsoever).
> The loop gives us:
>
> keys == {a:0, b:1, c:2}
>
> When we do a sort on 'key=keys.get()' how can that ever possibly change
> the order of 'items'?
>
You're assuming that no item is equal to any other.

Try this:

keys = {}
for item in [1, 2, 2.0]:
    keys.setdefault(item, len(keys))

You'll get:

keys == {1: 0, 2: 1}

because 2 == 2.0.

> There's also a bit of a flaw in that your implementation blows up if
> anything yielded by iterable isn't hashable:
>
> >>> list(unique_permutations([ [1,2],[3,4],[5,6] ]))
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> TypeError: unhashable type: 'list'
>
That is true, so here is yet another implementation:

----8<----------------------------------------8<----

def distinct_permutations(iterable, count=None):
    def perm(items, count):
        if count:
            prev_item = object()

            for i, item in enumerate(items):
                if item != prev_item:
                    for p in perm(items[ : i] + items[i + 1 : ], count - 1):
                        yield [item] + p

                    prev_item = item
        else:
            yield []

    hashable_items = {}
    unhashable_items = []

    for item in iterable:
        try:
            hashable_items[item].append(item)
        except KeyError:
            hashable_items[item] = [item]
        except TypeError:
            for key, values in unhashable_items:
                if key == item:
                    values.append(item)
                    break
            else:
                unhashable_items.append((item, [item]))

    items = []

    for values in hashable_items.values():
        items.extend(values)

    for key, values in unhashable_items:
        items.extend(values)

    if count is None:
        count = len(items)

    yield from perm(items, count)

----8<----------------------------------------8<----

It uses a dict for speed, with the fallback of a list for unhashable
items.

>
> >>> list(permutations([[1,2],[3,4],[5,6]]))
> [([1, 2], [3, 4], [5, 6]), ([1, 2], [5, 6], [3, 4]), ([3, 4], [1,
> 2], [5, 6]),
> ([3, 4], [5, 6], [1, 2]), ([5, 6], [1, 2], [3, 4]), ([5, 6], [3,
> 4], [1, 2])]
>
> This particular one also succeeds with my nonredundant_permutations:
>
> >>> list(nonredundant_permutations([[1,2],[3,4],[5,6]]))
> [([1, 2], [3, 4], [5, 6]), ([1, 2], [5, 6], [3, 4]), ([3, 4], [1,
> 2], [5, 6]),
> ([3, 4], [5, 6], [1, 2]), ([5, 6], [1, 2], [3, 4]), ([5, 6], [3,
> 4], [1, 2])]
>
My result is:

>>> list(distinct_permutations([[1,2],[3,4],[5,6]]))
[[[1, 2], [3, 4], [5, 6]], [[1, 2], [5, 6], [3, 4]], [[3, 4], [1, 2],
[5, 6]], [[3, 4], [5, 6], [1, 2]], [[5, 6], [1, 2], [3, 4]], [[5, 6],
[3, 4], [1, 2]]]

> However, my version *DOES* fail when things cannot be compared under
> inequality:
>
> >>> list(nonredundant_permutations([[1,2],3,4]))
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "<stdin>", line 3, in nonredundant_permutations
> TypeError: unorderable types: int() < list()
>
> This also doesn't afflict itertools.permutations.
>
My result is:

>>> list(distinct_permutations([[1,2],3,4]))
[[3, 4, [1, 2]], [3, [1, 2], 4], [4, 3, [1, 2]], [4, [1, 2], 3], [[1,
2], 3, 4], [[1, 2], 4, 3]]

David Mertz

unread,
Oct 12, 2013, 12:56:13 PM10/12/13
to Nick Coghlan, Neil Girdhar, python...@googlegroups.com, python-ideas
On Sat, Oct 12, 2013 at 8:07 AM, Nick Coghlan <ncog...@gmail.com> wrote:

One interesting point is that combining collections.Counter.elements() with itertools.permutations() currently does the wrong thing, since itertools.permutations() *always* considers iterable order significant, while for collections.Counter.elements() it's explicitly arbitrary.

I hadn't thought about it, but as I read the docs for 3.4 (and it's the same back through 2.7), not only would both of these be permissible in a Python implementation:

  >>> list(collections.Counter({'a':2,'b':1}).elements())
  ['a', 'a', 'b']

Or:

  >>> list(collections.Counter({'a':2,'b':1}).elements())
  ['b', 'a', 'a']

But even this would be per documentation (although really unlikely as an implementation):

  >>> list(collections.Counter({'a':2,'b':1}).elements())
  ['a', 'b', 'a']

Neil Girdhar

unread,
Oct 12, 2013, 2:30:03 PM10/12/13
to Raymond Hettinger, python...@googlegroups.com
Thanks, but did you see the problem that I was solving?  Your solution doesn't work for it…

Best,

Neil



On Sat, Oct 12, 2013 at 1:08 PM, Raymond Hettinger <raymond....@gmail.com> wrote:

On Oct 11, 2013, at 11:38 AM, Neil Girdhar <miste...@gmail.com> wrote:

Should we consider fixing itertools.permutations and to output only unique permutations (if possible, although I realize that would break code). It is completely non-obvious to have permutations returning duplicates.

No thanks.  The current design was done on purpose.  It is simple, fast, flexible and fits neatly with the other combinatoric iterators.  

The docs are clear about what permutations() does:
"Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values in each permutation."

So, if you want to "output only unique permutations", then use set() to make the inputs unique:  permutations(set(population), n).

It isn't hard.


Raymond




Neil Girdhar

unread,
Oct 12, 2013, 2:56:55 PM10/12/13
to Nick Coghlan, python-ideas, python...@googlegroups.com
Yes, you're right and I understand what's been done, although, like the 30 upvoters to the linked stackoverflow question, I find the current behaviour surprising and would like to see a distinct_permutations function.  How do I start to submit a patch?

Neil

Raymond Hettinger

unread,
Oct 12, 2013, 8:44:38 PM10/12/13
to Neil Girdhar, python...@googlegroups.com, python-ideas

On Oct 12, 2013, at 11:56 AM, Neil Girdhar <miste...@gmail.com> wrote:

, I find the current behaviour surprising and would like to see a distinct_permutations function.
 How do I start to submit a patch?

You can submit your patch at http://bugs.python.org and assign it to me (the module designer and maintainer).

That said, the odds of it being accepted are slim.
There are many ways to write combinatoric functions
(Knuth has a whole book on the subject) and I don't
aspire to include multiple variants unless there are
strong motivating use cases.

In general, if someone wants to eliminate duplicates
from the population, they can do so easily with:

   permutations(set(population), n)

The current design solves the most common use cases
and it has some nice properties such as:
 * permutations is a subsequence of product
 * no assumptions are made about the comparability
   or orderability of members of the population
 * len(list(permutations(range(n), r))) == n! / (n-r)! 
   just like you were taught in school
 * it is fast
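
As a quick check of the counting identity in the third point:

>>> from math import factorial
>>> from itertools import permutations
>>> n, r = 5, 2
>>> len(list(permutations(range(n), r))) == factorial(n) // factorial(n - r)
True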

For more exotic needs, I think it is appropriate to look
outside the standard library to more full-featured
combinatoric libraries (there are several listed at

  
Raymond

Neil Girdhar

unread,
Oct 12, 2013, 9:24:36 PM10/12/13
to Raymond Hettinger, python...@googlegroups.com, python-ideas
Hi Raymond,

I agree with you on the consistency point with itertools.product.  That's a great point.

However, permutations(set(population)) is not the correct way to take the permutations of a multiset.  Please take a look at how permutations are taken from a multiset in any of the papers I linked, or in any paper that you can find on the internet.  The number of permutations of a multiset is n! / \prod_i a_i!, where the a_i are the element counts — just like I was taught in school.

There is currently no fast way to find these permutations of a multiset, and it is a common operation for solving problems.  What is needed, I think, is a function multiset_permutations that accepts an iterable.
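
For example, the multiset "RRRBB" has 5! / (3! * 2!) = 10 distinct
permutations rather than 5! = 120:

>>> from math import factorial
>>> factorial(5) // (factorial(3) * factorial(2))
10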

Best,

Neil

Ethan Furman

unread,
Oct 12, 2013, 9:11:18 PM10/12/13
to python...@python.org
On 10/12/2013 05:44 PM, Raymond Hettinger wrote:
>
> On Oct 12, 2013, at 11:56 AM, Neil Girdhar <miste...@gmail.com> wrote:
>
>> , I find the current behaviour surprising and would like to see a distinct_permutations function.
>> How do I start to submit a patch?
>
> You can submit your patch at http://bugs.python.org and assign it to me (the module designer and maintainer).
>
> That said, the odds of it being accepted are slim.

+1

About the only improvement I can see would be a footnote in the itertools doc table that lists the different
combinatorics. Being a naive permutations user myself I would have made the mistake of thinking that "r-length tuples,
all possible orderings, no repeated elements" meant no repeated values. The longer text for permutations makes it clear
how it works.

My rst-foo is not good enough to link from the table down into the permutation text where the distinction is made clear.
If no one beats me to a proposed patch I'll see if I can figure it out.

--
~Ethan~

Steven D'Aprano

unread,
Oct 12, 2013, 9:47:42 PM10/12/13
to python...@python.org
On Sat, Oct 12, 2013 at 05:44:38PM -0700, Raymond Hettinger wrote:

> In general, if someone wants to eliminate duplicates
> from the population, they can do so easily with:
>
> permutations(set(population), n)

In fairness Raymond, the proposal is not to eliminate duplicates from
the population, but from the permutations themselves. Consider the
example I gave earlier, where you're permuting "RRRBB" two items at a
time. There are 20 permutations including duplicates, but sixteen of
them are repeated:

py> list(''.join(t) for t in permutations("RRRBB", 2))
['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB',
'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB']
py> set(''.join(t) for t in permutations("RRRBB", 2))
{'BR', 'RB', 'RR', 'BB'}

But if you eliminate duplicates from the population first, you get
only two permutations:

py> list(''.join(t) for t in permutations(set("RRRBB"), 2))
['BR', 'RB']


If it were just a matter of calling set() on the output of permutations,
that would be trivial enough. But, you might care about order, or
elements might not be hashable, or you might have a LOT of permutations
to generate before discarding:

population = "R"*1000 + "B"*500
set(''.join(t) for t in permutations(population, 2)) # takes a while...

In my opinion, if unique_permutations is no more efficient than calling
set on the output of permutations, it's not worth it. But if somebody
can come up with an implementation which is significantly more
efficient, without making unreasonable assumptions about orderability,
hashability or even comparability, then in my opinion that might be
worthwhile.

Raymond Hettinger

unread,
Oct 12, 2013, 11:03:43 PM10/12/13
to Steven D'Aprano, python...@python.org

On Oct 12, 2013, at 6:47 PM, Steven D'Aprano <st...@pearwood.info> wrote:

the proposal is not to eliminate duplicates from 
the population, but from the permutations themselves.

I'm curious about the use cases for this.
Other than red/blue marble examples and some puzzle problems,
does this come-up in any real problems?  Do we actually need this?


Raymond

Neil Girdhar

unread,
Oct 13, 2013, 3:38:40 AM10/13/13
to python...@googlegroups.com, python-ideas
My intuition is that we want Python to be "complete".  Many other languages can find the permutations of a multiset.  Python has a permutations function.  Many people on stackoverflow expected that function to be able to find those permutations.

One suggestion: Why not make it so that itertools.permutations checks if its argument is an instance of collections.Mapping?  If it is, we could interpret the items as a mapping from elements to positive integers, which is a compact representation of a multiset.  Then, it could do the right thing for that case.

Best,
Neil





Nick Coghlan

unread,
Oct 13, 2013, 5:27:54 AM10/13/13
to Neil Girdhar, python-ideas
On 13 October 2013 17:38, Neil Girdhar <miste...@gmail.com> wrote:
> My intuition is that we want Python to be "complete". Many other languages
> can find the permutations of a multiset. Python has a permutations
> function. Many people on stackoverflow expected that function to be able to
> find those permutations.

Nope, we expressly *don't* want the standard library to be "complete",
because that would mean growing to the size of PyPI (or larger).
There's always going to be scope for applications to adopt new domain
specific dependencies with more in-depth support than that provided by
the standard library.

Many standard library modules are in fact deliberately designed as
"stepping stone" modules that will meet the needs of code which have
an incidental relationship to that task, but will need to be replaced
with something more sophisticated for code directly related to that
domain. Many times, that means they will ignore as irrelevant
distinctions that are critical in certain contexts, simply because
they don't come up all that often outside those specific domains, and
addressing them involves making the core module more complicated to
use for more typical cases.

In this case, the proposed alternate permutations mechanism only makes
a difference when:

1. The data set contains equivalent values
2. Input order is not considered significant, so exchanging equivalent
values should *not* create a new permutation (i.e. multiset
permutations rather than sequence permutations).

If users aren't likely to encounter situations where that makes a
difference, then providing both in the standard library isn't being
helpful, it's being actively user hostile by asking them to make a
decision they're not yet qualified to make for the sake of the few
experts that specifically need it. Hence Raymond's request for data
modelling problems outside the "learning or studying combinatorics"
context to make the case for standard library inclusion.

Interestingly, I just found another language which has the equivalent
of the current behaviour of itertools.permutations: Haskell has it as
Data.List.permutations. As far as I can tell, Haskell doesn't offer
support for multiset permutations in the core, you need an additional
package like Math.Combinatorics (see:
http://hackage.haskell.org/package/multiset-comb-0.2.3/docs/Math-Combinatorics-Multiset.html#g:4).

Since iterator based programming in Python is heavily inspired by
Haskell, this suggests that the current behaviour of
itertools.permutations is appropriate and that Raymond is right to be
dubious about including multiset permutations support directly in the
standard library.

Those interested in improving the experience of writing combinatorics
code in Python may wish to look into helping out with the
combinatorics package on PyPI:
http://phillipmfeldman.org/Python/for_developers.html (For example,
politely approach Phillip to see if he is interested in hosting it on
GitHub or BitBucket, providing Sphinx docs on ReadTheDocs, improving
the PyPI metadata, etc - note I have no experience with this package,
it's just the first hit for "python combinatorics")

> One suggestion: Why not make it so that itertools.permutations checks if its
> argument is an instance of collections.Mapping? If it is, we could
> interpret the items as a mapping from elements to positive integers, which
> is a compact representation of a multiset. Then, it could do the right
> thing for that case.

If you want to go down the path of only caring about hashable values,
you may want to argue for a permutations method on collections.Counter
(it's conceivable that approach has the potential to be even faster
than an approach based on accepting and processing an arbitrary
iterable, since it can avoid generating repeated values in the first
place).

A Counter based multiset permutation algorithm was actually posted to
python-list back in 2009, just after collections.Counter was
introduced: https://mail.python.org/pipermail/python-list/2009-January/521685.html

I just created an updated version of that recipe and posted it as
https://bitbucket.org/ncoghlan/misc/src/default/multiset_permutations.py
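
The core idea is simple enough to sketch; the following illustrative
recursion (not the recipe itself) counts elements down and back up
while yielding each distinct ordering exactly once:

from collections import Counter

def multiset_permutations(counts):
    # counts: a mapping from element -> positive multiplicity,
    # e.g. Counter('aab')
    if not counts:
        yield ()
        return
    for elt in list(counts):
        # use elt once, recurse on what remains, then restore it
        counts[elt] -= 1
        if not counts[elt]:
            del counts[elt]
        for rest in multiset_permutations(counts):
            yield (elt,) + rest
        counts[elt] = counts.get(elt, 0) + 1

list(multiset_permutations(Counter('aab'))) then gives the 3 distinct
orderings instead of the 3! = 6 positional ones, and no repeated value
is ever generated.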

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Mark Lawrence

unread,
Oct 13, 2013, 7:05:30 AM10/13/13
to python...@python.org
On 13/10/2013 08:38, Neil Girdhar wrote:
> My intuition is that we want Python to be "complete".

No thank you. I much prefer "Python in a Nutshell" the size it is now,
I'm not interested in competing with (say) "Java in a Nutshell".

--
Roses are red,
Violets are blue,
Most poems rhyme,
But this one doesn't.

Mark Lawrence

Oscar Benjamin

unread,
Oct 13, 2013, 11:54:16 AM10/13/13
to Neil Girdhar, python-ideas
On 11 October 2013 22:38, Neil Girdhar <miste...@gmail.com> wrote:
> My code, which was the motivation for this suggestion:
>
> import itertools as it
> import math
>
> def is_prime(n):
>     for i in range(2, int(math.floor(math.sqrt(n))) + 1):
>         if n % i == 0:
>             return False
>     return n >= 2

I don't really understand what your code is doing but I just wanted to
point out that the above will fail for large integers (maybe not
relevant in your case):

>>> is_prime(2**19937-1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "tmp.py", line 3, in is_prime
for i in range(2, int(math.floor(math.sqrt(n))) + 1):
OverflowError: long int too large to convert to float

Even without the OverflowError I suspect that there are primes p >
~1e16 such that is_prime(p**2) would incorrectly return True. This is
a consequence of depending on FP arithmetic in what should be exact
computation. The easy fix is to break when i**2 > n avoiding the
tricky sqrt operation. Alternatively you can use an exact integer sqrt
function to fix this:

def sqrt_floor(y):
    try:
        x = int(math.sqrt(y))
    except OverflowError:
        x = y
    while not (x ** 2 <= y < (x+1) ** 2):
        x = (x + y // x) // 2
    return x
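
For example (a hypothetical session), with p = 10**20 + 1 the float
path cannot represent p exactly, while the integer fallback stays exact:

>>> p = 10**20 + 1
>>> int(math.sqrt(p ** 2)) == p   # FP rounding loses the +1
False
>>> sqrt_floor(p ** 2) == p
True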


Oscar

Neil Girdhar

unread,
Oct 13, 2013, 2:29:38 PM10/13/13
to Oscar Benjamin, python-ideas
Did you read the problem?   Anyway, let's not get off topic (permutations).

Neil

Tim Peters

unread,
Oct 13, 2013, 3:02:56 PM10/13/13
to python-ideas
[MRAB, posts a beautiful solution]

I don't really have a use for this, but it was a lovely programming
puzzle, so I'll include an elaborate elaboration of MRAB's algorithm
below. And that's the end of my interest in this ;-)

It doesn't require that elements be orderable or even hashable. It
does require that they can be compared for equality, but it's pretty
clear that if we _do_ include something like this, "equality" has to
be pluggable. By default, this uses `operator.__eq__`, but any
2-argument function can be used. E.g., use `operator.is_` to make it
believe that only identical objects are equal. Or pass a lambda to
distinguish by type too (e.g., if you don't want 3 and 3.0 to be
considered equal). Etc.

The code is much lower-level, to make it closer to an efficient C
implementation. No dicts, no sets, no slicing or concatenation of
lists, etc. It sticks to using little integers (indices) as far as
possible, which can be native in C (avoiding mounds of increfs and
decrefs).

Also, because "equality" is pluggable, it may be a slow operation.
The `equal()` function is only called here during initial setup, to
partition the elements into equivalence classes. Where N is
len(iterables), at best `equal()` is called N-1 times (if all elements
happen to be equal), and at worst N*(N-1)/2 times (if no elements
happen to be equal), all independent of `count`. It assumes `equal()`
is transitive.

It doesn't always return permutations in the same order as MRAB's
function, because - to avoid any searching - it iterates over
equivalence classes instead of over the original iterable. This is
the simplest differing example I can think of:

>>> list(unique_permutations("aba", 2))
[('a', 'b'), ('a', 'a'), ('b', 'a')]

For the first result, MRAB's function first picks the first 'a', then
removes it from the iterable and recurses on ("ba", 1). So it finds
'b' next, and yields ('a', 'b') (note: this is the modified
unique_permutations() below - MRAB's original actually yielded lists,
not tuples).

But:

>>> list(up("aba", 2))
[('a', 'a'), ('a', 'b'), ('b', 'a')]

Different order! That's because "up" is iterating over (conceptually)

[EquivClass(first 'a', second 'a'), EquivClass('b')]

It first picks the first `a`, then adjusts list pointers (always a
fast, constant-time operation) so that it recurses on

[EquivClass(second 'a'), EquivClass('b')]

So it next finds the second 'a', and yields (first 'a', second 'a') as
its first result. Maybe this will make it clearer:

>>> list(up(["a1", "b", "a2"], 2, lambda x, y: x[0]==y[0]))
[('a1', 'a2'), ('a1', 'b'), ('b', 'a1')]

No, I guess that didn't make it clearer - LOL ;-) Do I care? No.

Anyway, here's the code. Have fun :-)

# MRAB's beautiful solution, modified in two ways to be
# more like itertools.permutations:
# 1. Yield tuples instead of lists.
# 2. When count > len(iterable), don't yield anything.

def unique_permutations(iterable, count=None):
    def perm(items, count):
        if count:
            seen = set()
            for i, item in enumerate(items):
                if item not in seen:
                    for p in perm(items[:i] + items[i+1:], count - 1):
                        yield [item] + p
                    seen.add(item)
        else:
            yield []

    items = list(iterable)
    if count is None:
        count = len(items)
    if count > len(items):
        return
    for p in perm(items, count):
        yield tuple(p)

# New code, ending in generator `up()`.
import operator

# In C, this would be a struct of native C types,
# and the brief methods would be coded inline.
class ENode:
    def __init__(self, initial_index=None):
        self.indices = [initial_index]  # list of equivalent indices
        self.current = 0
        self.prev = self.next = self

    def index(self):
        "Return current index."
        return self.indices[self.current]

    def unlink(self):
        "Remove self from list."
        self.prev.next = self.next
        self.next.prev = self.prev

    def insert_after(self, x):
        "Insert node x after self."
        x.prev = self
        x.next = self.next
        self.next.prev = x
        self.next = x

    def advance(self):
        """Advance the current index.

        If we're already at the end, remove self from list.

        .restore() undoes everything .advance() did."""

        assert self.current < len(self.indices)
        self.current += 1
        if self.current == len(self.indices):
            self.unlink()

    def restore(self):
        "Undo what .advance() did."
        assert self.current <= len(self.indices)
        if self.current == len(self.indices):
            self.prev.insert_after(self)
        self.current -= 1

def build_equivalence_classes(items, equal):
    ehead = ENode()  # headed, doubly-linked circular list of equiv classes
    for i, elt in enumerate(items):
        e = ehead.next
        while e is not ehead:
            if equal(elt, items[e.indices[0]]):
                # Add (index of) elt to this equivalence class.
                e.indices.append(i)
                break
            e = e.next
        else:
            # elt not equal to anything seen so far: append
            # new equivalence class.
            e = ENode(i)
            ehead.prev.insert_after(e)
    return ehead

def up(iterable, count=None, equal=operator.__eq__):
    def perm(i):
        if i:
            e = ehead.next
            assert e is not ehead
            while e is not ehead:
                result[count - i] = e.index()
                e.advance()
                yield from perm(i-1)
                e.restore()
                e = e.next
        else:
            yield tuple(items[j] for j in result)

    items = tuple(iterable)
    if count is None:
        count = len(items)
    if count > len(items):
        return

    ehead = build_equivalence_classes(items, equal)
    result = [None] * count
    yield from perm(count)

MRAB

unread,
Oct 13, 2013, 3:30:42 PM10/13/13
to python-ideas
On 13/10/2013 20:02, Tim Peters wrote:
> [MRAB, posts a beautiful solution]
>
> I don't really have a use for this, but it was a lovely programming
> puzzle, so I'll include an elaborate elaboration of MRAB's algorithm
> below. And that's the end of my interest in this ;-)
>
> It doesn't require that elements be orderable or even hashable. It
> does require that they can be compared for equality, but it's pretty
> clear that if we _do_ include something like this, "equality" has to
> be pluggable. By default, this uses `operator.__eq__`, but any
> 2-argument function can be used. E.g., use `operator.is_` to make it
> believe that only identical objects are equal. Or pass a lambda to
> distinguish by type too (e.g., if you don't want 3 and 3.0 to be
> considered equal). Etc.
>
[snip]
I posted yet another implementation after that one.

Oscar Benjamin

unread,
Oct 13, 2013, 3:34:09 PM10/13/13
to Neil Girdhar, python-ideas
On 13 October 2013 19:29, Neil Girdhar <miste...@gmail.com> wrote:
> Did you read the problem?

I did but since you showed some code that you said you were working on
I thought you'd be interested to know that it could be improved.

> Anyway, let's not get off topic (permutations).

Getting back to your proposal, I disagree that permutations should be
"fixed". The current behaviour is correct. If I was asked to define a
permutation I would have given definition #3 from Steven's list: a
bijection from a set to itself. Formally a permutation of a collection
of non-unique elements is not defined.

They may also be uses for a function like the one that you proposed
but I've never needed it (and I have used permutations a few times)
and no one in this thread (including you) has given a use-case for
this.

Neil Girdhar

unread,
Oct 13, 2013, 3:39:19 PM10/13/13
to Oscar Benjamin, python-ideas
On Sun, Oct 13, 2013 at 3:34 PM, Oscar Benjamin <oscar.j....@gmail.com> wrote:
On 13 October 2013 19:29, Neil Girdhar <miste...@gmail.com> wrote:
> Did you read the problem?

I did but since you showed some code that you said you were working on
I thought you'd be interested to know that it could be improved.

The code solves the problem according to its specification :)  (The numbers are less than 1e8.)


> Anyway, let's not get off topic (permutations).

Getting back to your proposal, I disagree that permutations should be
"fixed". The current behaviour is correct. If I was asked to define a
permutation I would have given definition #3 from Steven's list: a
bijection from a set to itself. Formally a permutation of a collection
of non-unique elements is not defined.

They may also be uses for a function like the one that you proposed
but I've never needed it (and I have used permutations a few times)
and no one in this thread (including you) has given a use-case for
this.


Oscar

The problem is a use-case.  Did you read it?  Did you try solving it?

MRAB

unread,
Oct 13, 2013, 4:04:21 PM10/13/13
to python-ideas
On 13/10/2013 20:34, Oscar Benjamin wrote:
> On 13 October 2013 19:29, Neil Girdhar <miste...@gmail.com> wrote:
>> Did you read the problem?
>
> I did but since you showed some code that you said you were working on
> I thought you'd be interested to know that it could be improved.
>
>> Anyway, let's not get off topic (permutations).
>
> Getting back to your proposal, I disagree that permutations should be
> "fixed". The current behaviour is correct. If I was asked to define a
> permutation I would have given definition #3 from Steven's list: a
> bijection from a set to itself. Formally a permutation of a collection
> of non-unique elements is not defined.
>
> They may also be uses for a function like the one that you proposed
> but I've never needed it (and I have used permutations a few times)
> and no one in this thread (including you) has given a use-case for
> this.
>
Here's a use case: anagrams.

Neil Girdhar

unread,
Oct 13, 2013, 4:56:55 PM10/13/13
to Nick Coghlan, python-ideas
Executive summary:  Thanks for discussion everyone.  I'm now convinced that itertools.permutations is fine as it is.  I am not totally convinced that multiset_permutations doesn't belong in itertools, or else there should be a standard combinatorics library.

On Sun, Oct 13, 2013 at 5:27 AM, Nick Coghlan <ncog...@gmail.com> wrote:
On 13 October 2013 17:38, Neil Girdhar <miste...@gmail.com> wrote:
> My intuition is that we want Python to be "complete".  Many other languages
> can find the permutations of a multiset.  Python has a permutations
> function.  Many people on stackoverflow expected that function to be able to
> find those permutations.

Nope, we expressly *don't* want the standard library to be "complete",
because that would mean growing to the size of PyPI (or larger).
There's always going to be scope for applications to adopt new domain
specific dependencies with more in-depth support than that provided by
the standard library.

By "complete" I meant that, just as if you were to add the error function erf to math, you would want to add an equivalent version to cmath.  When I saw the permutation function in itertools, I expected that it would work on both sets and multisets, or else that there would be another function that did.

Many standard library modules are in fact deliberately designed as
"stepping stone" modules that will meet the needs of code which have
an incidental relationship to that task, but will need to be replaced
with something more sophisticated for code directly related to that
domain. Many times, that means they will ignore as irrelevant
distinctions that are critical in certain contexts, simply because
they don't come up all that often outside those specific domains, and
addressing them involves making the core module more complicated to
use for more typical cases.

Good point. 

In this case, the proposed alternate permutations mechanism only makes
a difference when:

1. The data set contains equivalent values
2. Input order is not considered significant, so exchanging equivalent
values should *not* create a new permutation (i.e. multiset
permutations rather than sequence permutations).

If users aren't likely to encounter situations where that makes a
difference, then providing both in the standard library isn't being
helpful, it's being actively user hostile by asking them to make a
decision they're not yet qualified to make for the sake of the few
experts that specifically need . Hence Raymond's request for data
modelling problems outside the "learning or studying combinatorics"
context to make the case for standard library inclusion.

Interestingly, I just found another language which has the equivalent
of the currrent behaviour of itertools.permutations: Haskell has it as
Data.List.permutations. As far as I can tell, Haskell doesn't offer
support for multiset permutations in the core, you need an additional
package like Math.Combinatorics (see:
http://hackage.haskell.org/package/multiset-comb-0.2.3/docs/Math-Combinatorics-Multiset.html#g:4).

Since iterator based programming in Python is heavily inspired by
Haskell, this suggests that the current behaviour of
itertools.permutations is appropriate and that Raymond is right to be
dubious about including multiset permutations support directly in the
standard library.


You've convinced me that itertools permutations is doing the right thing :)  I'm not sure if multiset permutations should be in the standard library or not.  It is very useful.
 
Those interested in improving the experience of writing combinatorics
code in Python may wish to look into helping out with the
combinatorics package on PyPI:
http://phillipmfeldman.org/Python/for_developers.html (For example,
politely approach Phillip to see if he is interested in hosting it on
GitHub or BitBucket, providing Sphinx docs on ReadTheDocs, improving
the PyPI metadata, etc - note I have no experience with this package,
it's just the first hit for "python combinatorics")

> One suggestion: Why not make it so that itertools.permutations checks if its
> argument is an instance of collections.Mapping?  If it is, we could
> interpret the items as a mapping from elements to positive integers, which
> is a compact representation of a multiset.  Then, it could do the right
> thing for that case.

If you want to go down the path of only caring about hashable values,
you may want to argue for a permutations method on collections.Counter
(it's conceivable that approach has the potential to be even faster
than an approach based on accepting and processing an arbitrary
iterable, since it can avoid generating repeated values in the first
place).

A Counter based multiset permutation algorithm was actually posted to
python-list back in 2009, just after collections.Counter was
introduced: https://mail.python.org/pipermail/python-list/2009-January/521685.html


Nice find!  
 
I just created an updated version of that recipe and posted it as
https://bitbucket.org/ncoghlan/misc/src/default/multiset_permutations.py


Why not just define multiset_permutations to accept a dict (dict is a base class of Counter)?  Since you're going to convert from an iterable (with duplicates) to a dict (via Counter) anyway, why not accept it as such?  Users who want an interface similar to itertools.permutations can pass their iterable through Counter first.

Tim Peters

unread,
Oct 13, 2013, 5:22:06 PM10/13/13
to python-ideas
[Tim]
>> [MRAB, posts a beautiful solution]
>>
>> I don't really have a use for this, but it was a lovely programming
>> puzzle, so I'll include an elaborate elaboration of MRAB's algorithm
>> below. And that's the end of my interest in this ;-)
>>
>> It doesn't require that elements be orderable or even hashable. It
>> does require that they can be compared for equality, but it's pretty
>> clear that if we _do_ include something like this, "equality" has to
>> be pluggable.
>> ...

[MRAB]
> I posted yet another implementation after that one.

I know. I was talking about the beautiful one ;-) The later one
could build equivalence classes faster (than mine) in many cases, but
I don't care much about the startup costs. I care a lot more about:

1. Avoiding searches in the recursive function; i.e., this:

    for i, item in enumerate(items):
        if item != prev_item:

Making such tests millions (billions ...) of times adds up - and
equality testing may not be cheap. The algorithm I posted does no
item testing after the setup is done (none in its recursive function).

2. Making "equality" pluggable. Your later algorithm bought "find
equivalence classes" speed for hashable elements by using a dict, but
a dict's notion of equality can't be changed. So, make equality
pluggable, and that startup-time speed advantage vanishes for all but
operator.__eq__'s idea of equality.

Oscar Benjamin

unread,
Oct 13, 2013, 7:10:51 PM10/13/13
to Tim Peters, python-ideas
On 13 October 2013 22:22, Tim Peters <tim.p...@gmail.com> wrote:
> 2. Making "equality" pluggable. Your later algorithm bought "find
> equivalence classes" speed for hashable elements by using a dict, but
> a dict's notion of equality can't be changed. So, make equality
> pluggable, and that startup-time speed advantage vanishes for all but
> operator.__eq__'s idea of equality.

It sounds like you want Antoine's TransformDict:
http://www.python.org/dev/peps/pep-0455/


Oscar

Oscar Benjamin

unread,
Oct 13, 2013, 7:32:34 PM10/13/13
to Tim Peters, python-ideas
On 14 October 2013 00:23, Tim Peters <tim.p...@gmail.com> wrote:
> [Tim]
>>> 2. Making "equality" pluggable. Your later algorithm bought "find
>>> equivalence classes" speed for hashable elements by using a dict, but
>>> a dict's notion of equality can't be changed. So, make equality
>>> pluggable, and that startup-time speed advantage vanishes for all but
>>> operator.__eq__'s idea of equality.
>
> [Oscar Benjamin]
>> It sounds like you want Antoine's TransformDict:
>> http://www.python.org/dev/peps/pep-0455/
>
> Not really in this case - I want a two-argument function ("are A and B
> equal?"). Not all plausible cases of that can be mapped to a
> canonical hashable key. For example, consider permutations of a list
> of lists, where the user doesn't want int and float elements of the
> lists to be considered equal when they happen to have the same value.
> Is that a stretch? Oh ya ;-)

Will this do?
d = TransformDict(lambda x: (type(x), x))

Tim Peters

unread,
Oct 13, 2013, 7:44:14 PM10/13/13
to Oscar Benjamin, python-ideas
[Oscar Benjamin]
> Will this do?
> d = TransformDict(lambda x: (type(x), x))

No. In the example I gave, *lists* will be passed as x (it was a list
of lists: the lists are the elements of the permutations, and they
happen to have internal structure of their own). So the `type(x)`
there is useless (it will always be the list type), while the lists
themselves would still be compared by operator.__eq__.

Not to mention that the constructed tuple isn't hashable anyway (x is
a list), so can't be used by TransformDict.

So that idea doesn't work several times over ;-)

Oscar Benjamin

unread,
Oct 13, 2013, 7:55:29 PM10/13/13
to Tim Peters, python-ideas
On 14 October 2013 00:44, Tim Peters <tim.p...@gmail.com> wrote:
> [Oscar Benjamin]
>> Will this do?
>> d = TransformDict(lambda x: (type(x), x))
>
> No. In the example I gave, *lists* will be passed as x (it was a list
> of lists: the lists are the elements of the permutations, and they
> happen to have internal structure of their own). So the `type(x)`
> there is useless (it will always be the list type), while the lists
> themselves would still be compared by operator.__eq__.
>
> Not to mention that the constructed tuple isn't hashable anyway (x is
> a list), so can't be used by TransformDict.
>
> So that idea doesn't work several times over ;-)

Damn, you're right. I obviously didn't think that one through hard
enough. Okay how about this?
d = TransformDict(lambda x: (tuple(map(type, x)), tuple(x)))


Oscar

Tim Peters

unread,
Oct 13, 2013, 7:23:51 PM10/13/13
to Oscar Benjamin, python-ideas
[Tim]
>> 2. Making "equality" pluggable. Your later algorithm bought "find
>> equivalence classes" speed for hashable elements by using a dict, but
>> a dict's notion of equality can't be changed. So, make equality
>> pluggable, and that startup-time speed advantage vanishes for all but
>> operator.__eq__'s idea of equality.

[Oscar Benjamin]
> It sounds like you want Antoine's TransformDict:
> http://www.python.org/dev/peps/pep-0455/

Not really in this case - I want a two-argument function ("are A and B
equal?"). Not all plausible cases of that can be mapped to a
canonical hashable key. For example, consider permutations of a list
of lists, where the user doesn't want int and float elements of the
lists to be considered equal when they happen to have the same value.
Is that a stretch? Oh ya ;-)

Oscar Benjamin

unread,
Oct 13, 2013, 8:20:19 PM10/13/13
to Tim Peters, python-ideas
On 14 October 2013 01:15, Tim Peters <tim.p...@gmail.com> wrote:
> [Oscar Benjamin]
>> ...
>> Damn, you're right. I obviously didn't think that one through hard
>> enough. Okay how about this?
>> d = TransformDict(lambda x: (tuple(map(type, x)), tuple(x)))
>
> Oscar, please give this up - it's not going to work. `x` can be any
> object whatsoever, with arbitrarily complex internal structure, and
> the user can have an arbitrarily convoluted idea of what "equal"
> means. Did I mention that these lists don't *only* have ints and
> floats as elements, but also nested sublists? Oh ya - they also want
> a float and a singleton list containing the same float to be
> considered equal ;-) Etc.

That does seem contrived, but then I guess the whole problem is too....

> Besides, you're trying to solve a problem I didn't have to begin with
> ;-) That is, I don't care much about the cost of building equivalence
> classes - it's a startup cost for the generator, not an "inner loop"
> cost. Even if you could bash every case into a different convoluted
> hashable tuple, in general it's going to be - in this specific problem
> - far easier for the user to define an equal() function they like,
> working directly on the two objects. That doesn't require an endless
> sequence of tricks.

okay I see what you mean.

Tim Peters

unread,
Oct 13, 2013, 8:15:00 PM10/13/13
to Oscar Benjamin, python-ideas
[Oscar Benjamin]
> ...
> Damn, you're right. I obviously didn't think that one through hard
> enough. Okay how about this?
> d = TransformDict(lambda x: (tuple(map(type, x)), tuple(x)))

Oscar, please give this up - it's not going to work. `x` can be any
object whatsoever, with arbitrarily complex internal structure, and
the user can have an arbitrarily convoluted idea of what "equal"
means. Did I mention that these lists don't *only* have ints and
floats as elements, but also nested sublists? Oh ya - they also want
a float and a singleton list containing the same float to be
considered equal ;-) Etc.

Besides, you're trying to solve a problem I didn't have to begin with
;-) That is, I don't care much about the cost of building equivalence
classes - it's a startup cost for the generator, not an "inner loop"
cost. Even if you could bash every case into a different convoluted
hashable tuple, in general it's going to be - in this specific problem
- far easier for the user to define an equal() function they like,
working directly on the two objects. That doesn't require an endless
sequence of tricks.

Mark Dickinson

unread,
Oct 14, 2013, 8:29:04 AM10/14/13
to Neil Girdhar, python-ideas
On Sun, Oct 13, 2013 at 9:56 PM, Neil Girdhar <miste...@gmail.com> wrote:

By complete I meant that just as if you were to add the "error function, erf" to math, you would want to add an equivalent version to cmath.

An interesting choice of example.  *Why* would you want to do so?

Since you bring this up, I assume you're already aware that math.erf exists but cmath.erf does not.  I believe there are good, practical reasons *not* to add cmath.erf, in spite of the existence of math.erf.  Not least of these is that cmath.erf would be significantly more complicated to implement and of significantly less interest to users.  And perhaps there's a parallel with itertools.permutations and the proposed itertools.multiset_permutations here...

-- 
Mark

Neil Girdhar

unread,
Oct 14, 2013, 8:37:59 AM10/14/13
to Mark Dickinson, python-ideas
Actually I didn't notice that.  It seems weird to find erf in math, but erf for complex numbers in scipy.special.  It's just about organization and user discovery.  I realize that from the developer's point of view, erf for complex numbers is complicated, but why does the user care?


Oscar Benjamin

unread,
Oct 14, 2013, 9:11:42 AM10/14/13
to Neil Girdhar, python-ideas
On 14 October 2013 13:37, Neil Girdhar <miste...@gmail.com> wrote:
>
> Actually I didn't notice that. It seems weird to find erf in math, but erf
> for complex numbers in scipy.special. It's just about organization and user
> discovery. I realize that from the developer's point of view, erf for
> complex numbers is complicated, but why does the user care?

This is the first time I've seen a suggestion that there should be
cmath.erf. So I would say that most users don't care about having a
complex error function. Whoever would take the time to implement the
complex error function might instead spend that time implementing and
maintaining something that users do care about.


Oscar

Neil Girdhar

unread,
Oct 14, 2013, 9:15:06 AM10/14/13
to Oscar Benjamin, python-ideas
Look, I don't want it, and anyway it's already in scipy.special.  I just meant organizational symmetry: I expected to find complex versions of math functions in cmath, not in scipy.special.

Mark Lawrence

unread,
Oct 14, 2013, 9:26:59 AM10/14/13
to python...@python.org
On 14/10/2013 14:15, Neil Girdhar wrote:
> Look I don't want it, and anyway it's already in scipy.special. I just
> organizational symmetry. I expected to find complex versions of math
> functions in cmath —not in scipy special.
>

Why are you comparing core Python modules with third party ones?

--
Roses are red,
Violets are blue,
Most poems rhyme,
But this one doesn't.

Mark Lawrence

Raymond Hettinger

unread,
Oct 14, 2013, 1:56:23 PM10/14/13
to Neil Girdhar, python-ideas

On Oct 13, 2013, at 1:56 PM, Neil Girdhar <miste...@gmail.com> wrote:

 I'm now convinced that itertools.permutations is fine as it is.  I am not totally convinced that multiset_permutations doesn't belong in itertools,

Now that we have a good algorithm,  I'm open to adding this to itertools,
but it would need to have a name that didn't create any confusion
with respect to the existing tools, perhaps something like:

    anagrams(population, r)

    Return an iterator over all distinct r-length permutations
    of the population.

    Unlike permutations(), element uniqueness is determined
    by value rather than by position.  Also, anagrams() makes
    no guarantees about the order the tuples are generated.
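
    For illustration, a (hypothetical) session with such a function
    might look like:

    >>> sorted(anagrams('aab', 2))
    [('a', 'a'), ('a', 'b'), ('b', 'a')]

    (sorted() only because anagrams() would make no ordering guarantee.)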



Raymond

Neil Girdhar

unread,
Oct 14, 2013, 4:28:44 PM10/14/13
to Raymond Hettinger, python-ideas
Excellent!

My top two names are
1. multiset_permutations (reflects the mathematical name)
2. anagrams

Note that we may also want to add multiset_combinations.  It hasn't been part of this discussion, but it may be part of another discussion and I wanted to point this out as I know many of you are future-conscious.

We seem to be all agreed that we want to accept "r", the length of the permutation desired.

With permutations, the *set* is passed in as a iterable representing distinct elements.  With multiset_permutations, there are three ways to pass in the *multiset*:
- 1. an iterable whose elements (or an optional key function applied to which) are compared using __eq__
- 2. a dict (of which collections.Counter) is a subclass
- 3. an iterable whose elements are key-value pairs and whose values are counts

Example uses:
1. multiset_permutations(word)
2. multiset_permutations(Counter(word))
3. multiset_permutations(Counter(word).items())

From a dictionary:
1. multiset_permutations(itertools.chain.from_iterable(itertools.repeat(k, v) for k, v in d.items()))
2. multiset_permutations(d)
3. multiset_permutations(d.items())

From an iterable of key-value pairs:
1. multiset_permutations(itertools.chain.from_iterable(itertools.repeat(k, v) for k, v in it))
2. multiset_permutations({k: v for k, v in it})
3. multiset_permutations(it)

The advantage of 2 is that no elements are compared by multiset_permutations (so it is simpler and faster).
The advantage of 3 is that no elements are compared, and they need not be comparable or hashable.  This version is truly a generalization of the "permutations" function.  This way, for any input "it" you could pass to permutations, you could equivalently pass zip(it, itertools.repeat(1)) to multiset_permutations.

Comments?

Neil

Bruce Leban

unread,
Oct 14, 2013, 4:52:40 PM10/14/13
to python-ideas

On Sun, Oct 13, 2013 at 1:04 PM, MRAB <pyt...@mrabarnett.plus.com> wrote:
Here's a use case: anagrams.

For what it's worth, I've written anagram-finding code, and I didn't do it with permutations. The faster approach is to create a dictionary mapping a canonical form of each word to a list of words, e.g.,

{
  'ACT': ['ACT', 'CAT'],
  'AET': ['ATE', 'EAT', 'ETA', 'TEA']
}

This requires extra work to build the map but you do that just once when you read the dictionary and then every lookup is O(1) not O(len(word)). This canonical form approach is useful for other word transformations that are used in puzzles, e.g., words that are have the same consonants (ignoring vowels).
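
A minimal sketch of that construction (the names here are illustrative):

from collections import defaultdict

def build_anagram_map(words):
    table = defaultdict(list)
    for word in words:
        # canonical form: the word's letters in sorted order
        table[''.join(sorted(word))].append(word)
    return table

table = build_anagram_map(['ACT', 'CAT', 'ATE', 'EAT', 'ETA', 'TEA'])
# table[''.join(sorted('TAC'))] == ['ACT', 'CAT']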


--- Bruce
I'm hiring: http://www.cadencemd.com/info/jobs
Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
Learn how hackers think: http://j.mp/gruyere-security

P.S. Yes, I know: if you play Scrabble, TAE is also a word.

Terry Reedy

unread,
Oct 14, 2013, 6:59:54 PM10/14/13
to python...@python.org
On 10/14/2013 4:28 PM, Neil Girdhar wrote:
> Excellent!
>
> My top two names are
> 1. multiset_permutations (reflects the mathematical name)
> 2. anagrams

I like anagrams. I did not completely get what this issue was about
until someone finally mentioned anagrams as a use case.

--
Terry Jan Reedy

Tim Peters

unread,
Oct 14, 2013, 8:48:17 PM10/14/13
to Raymond Hettinger, Neil Girdhar, python-ideas
[Raymond Hettinger]
> Now that we have a good algorithm, I'm open to adding this to itertools,

I remain reluctant, because I still haven't seen a compelling use
case. Yes, it generates all distinct r-letter anagrams - but so what?
LOL ;-) Seriously, I've written anagram programs several times in my
life, and generating "all possible" never occurred to me because it's
so crushingly inefficient.


> but it would need to have a name that didn't create any confusion
> with respect to the existing tools, perhaps something like:
>
> anagrams(population, r)

"anagrams" is great! Inspired :-)

What about an optional argument to define what the _user_ means by
"equality"? The algorithm I posted had an optional
`equal=operator.__eq__` argument. Else you're going to be pushed to
add a clumsy `TransformAnagrams` later <0.4 wink>.

> Return an iterator over a all distinct r-length permutations
> of the population.
>
> Unlike permutations(), element uniqueness is determined
> by value rather than by position. Also, anagrams() makes
> no guarantees about the order the tuples are generated.

Well, MRAB's algorithm (and my rewrite) guarantees that _if_ the
elements support a total order, and appear in the iterable in
non-decreasing order, then the anagrams are generated in
non-decreasing lexicographic order. And that may be a useful
guarantee (hard to tell without a real use case, though!).

There's another ambiguity I haven't seen addressed explicitly. Consider this:

>>> from fractions import Fraction
>>> for a in anagrams([3, 3.0, Fraction(3)], 3):
... print(a)

(3, 3.0, Fraction(3, 1))

All the algorithms posted here work to show all 3 elements in this
case. But why? If the elements all equal, then other outputs "should
be" acceptable too. Like

(3, 3, 3)

or

(3.0, Fraction(3, 1), 3.0)

etc. All those outputs compare equal!

This isn't visible if, e.g., the iterable's elements are letters
(where a == b if and only if str(a) == str(b), so the output looks the
same no matter what).

At least "my" algorithm could be simplified materially if it only
saved (and iterated over) a (single) canonical representative for each
equivalence class, instead of saving entire equivalence classes and
then jumping through hoops to cycle through each equivalence class's
elements.

But, for some reason, output (3, 3, 3) just "looks wrong" above. I'm
not sure why.

MRAB

unread,
Oct 14, 2013, 9:17:56 PM10/14/13
to python-ideas
[snip]
I can see that one disadvantage of my algorithm is that the worst-case
storage requirement is O(n^2) (I think). This is because the set of
first items could have N members, the set of second items could have
N-1 members, etc. On the other hand, IMHO, the sheer number of
permutations will become a problem long before the memory requirement
does! :-)

Steven D'Aprano

unread,
Oct 14, 2013, 9:27:18 PM10/14/13
to python...@python.org
On Mon, Oct 14, 2013 at 08:37:59AM -0400, Neil Girdhar wrote:
> Actually I didn't notice that. It seems weird to find erf in math, but erf
> for complex numbers in scipy.special. It's just about organization and
> user discovery. I realize that from the developer's point of view, erf for
> complex numbers is complicated, but why does the user care?

99% of users don't care about math.erf at all. Of those who do, 99%
don't care about cmath.erf. I'd like to see cmath.erf because I'm a
maths junkie, but if I were responsible for *actually doing the work*
I'd make the same decision to leave cmath.erf out and leave it for a
larger, more complete library like scipy.

There are an infinitely large number of potential programs which could
in principle be added to Python's std lib, and only a finite number of
person-hours to do the work. And there are costs to adding software to
the std lib, not just benefits.


--
Steven

Neil Girdhar

unread,
Oct 14, 2013, 9:29:24 PM10/14/13
to python...@googlegroups.com, python-ideas
You make a good point.  It was just a random example to illustrate that desire for completeness.


Nick Coghlan

unread,
Oct 14, 2013, 9:39:30 PM10/14/13
to Neil Girdhar, python-ideas, python...@googlegroups.com
On 15 October 2013 11:29, Neil Girdhar <miste...@gmail.com> wrote:
> You make a good point. It was just a random example to illustrate that
> desire for completeness.

The desire for conceptual purity and consistency is a good one, it
just needs to be balanced against the practical constraints of
writing, maintaining, documenting, teaching and learning the standard
library.

"It isn't worth the hassle" is the answer to a whole lot of "Why not
X?" questions in software development :)

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Tim Peters

unread,
Oct 14, 2013, 9:40:00 PM10/14/13
to python-ideas
[MRAB]
> I can see that one disadvantage of my algorithm is that the worst-case
> storage requirement is O(n^2) (I think). This is because the set of
> first items could have N members, the set of second items could have
> N-1 members, etc. On the other hand, IMHO, the sheer number of
> permutations will become a problem long before the memory requirement
> does! :-)

My rewrite is O(N) space (best and worst cases). I _think_ yours is
too, but I understand my rewrite better by now ;-)

Each element of the iterable appears in exactly one ENode: the
`ehead` list is a partitioning of the input iterable.

Neil Girdhar

unread,
Oct 14, 2013, 9:40:52 PM10/14/13
to Nick Coghlan, python-ideas, python...@googlegroups.com

On Mon, Oct 14, 2013 at 9:39 PM, Nick Coghlan <ncog...@gmail.com> wrote:


Totally agree.

Tim Peters

unread,
Oct 14, 2013, 10:45:33 PM10/14/13
to Python-Ideas
One example of prior art: Maxima, which I use in its wxMaxima incarnation.

"""
Function: permutations(a)

Returns a set of all distinct permutations of the members of the list
or set a. Each permutation is a list, not a set.

When a is a list, duplicate members of a are included in the permutations
"""

Examples from a Maxima shell:

> permutations([1, 2, 3]);
{[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]}

> permutations([[1, 2], [1, 2], [2, 3]])
{[[1,2],[1,2],[2,3]],
[[1,2],[2,3],[1,2]],
[[2,3],[1,2],[1,2]]}

> permutations({1, 1.0, 1, 1.0})
{[1,1.0],[1.0,1]}

That last one may be surprising at first, but note that it's the first
example where I passed a _set_ (instead of a list). And:

> {1, 1.0, 1, 1.0}
{1,1.0}

Best I can tell, Maxima has no builtin function akin to our
permutations(it, r) when r < len(it). But Maxima has a huge number of
builtin functions, and I often struggle to find ones I _want_ in its
docs ;-)

Mark Lawrence

unread,
Oct 15, 2013, 3:30:21 AM10/15/13
to python...@python.org
On 15/10/2013 02:39, Nick Coghlan wrote:
>
> The desire for conceptual purity and consistency is a good one, it
> just needs to be balanced against the practical constraints of
> writing, maintaining, documenting, teaching and learning the standard
> library.
>
> "It isn't worth the hassle" is the answer to a whole lot of "Why not
> X?" questions in software development :)
>
> Cheers,
> Nick.
>

Would our volunteers be more inclined to take on the hassle if they got
double time on Saturdays and triple time on Sundays? :)

--
Roses are red,
Violets are blue,
Most poems rhyme,
But this one doesn't.

Mark Lawrence

Bruce Leban

unread,
Oct 15, 2013, 4:34:46 PM10/15/13
to Nick Coghlan, Neil Girdhar, python...@googlegroups.com, python-ideas

On Mon, Oct 14, 2013 at 6:39 PM, Nick Coghlan <ncog...@gmail.com> wrote:
"It isn't worth the hassle" is the answer to a whole lot of "Why not
X?" questions in software development :)

Sometimes it's not worth the hassle on either side. If this is added to the standard library and I write code that uses it, my code won't be backwards compatible with older versions of Python. So I'll either have to not support older Python versions or use an alternative implementation. If this is on pypi then that's not an issue. Not everything useful should be in the standard library.

If I had come across a need for this, I'd have just used unique_everseen(permuations(...)) until performance became an issue.
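
For reference, unique_everseen() is the recipe from the itertools
documentation; stripped of its key argument it is roughly:

from itertools import permutations

def unique_everseen(iterable):
    # yield each hashable element the first time it is seen
    seen = set()
    for element in iterable:
        if element not in seen:
            seen.add(element)
            yield element

list(unique_everseen(permutations('aab', 2))) gives the three distinct
pairs, but only after generating (and hashing) all 3!/1! = 6 positional
permutations, which is exactly the performance cliff mentioned above.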

Ron Adam

unread,
Oct 17, 2013, 12:10:22 PM10/17/13
to python...@python.org


On 10/12/2013 02:09 AM, David Mertz wrote:
> On Sat, Oct 12, 2013 at 12:02 AM, Neil Girdhar <miste...@gmail.com> wrote:
>
> Why not just use the standard python way to generalize this: "key"
> rather than the nonstandard "filter_by".
>
>
> Yes, 'key' is a much better name than what I suggested.
>
> I'm not quite sure how best to implement this still. I guess MRAB's
> recursive approach should work, even though I like the simplicity of my
> style that takes full advantage of the existing itertools.permutations()
> (and uses 1/3 as many lines of--I think clearer--code). His has the
> advantage, however, that it doesn't require operator.lt() to work...
> however, without benchmarking, I have a
> pretty strong feeling that my suggestion will be faster since it avoids all
> that recursive call overhead. Maybe I'm wrong about that though.


I'd like to see some nice examples of how it's used. It seems to me there
is some mixing of combination/permutation concepts.


import itertools as it

def filter_repeats(itr):
    seen = set()
    for i in itr:
        if i in seen:
            continue
        seen.add(i)
        yield i

def unique_combinations(itr, length=None):
    if length is None:
        length = len(itr)
    return it.product(filter_repeats(itr), repeat=length)


This one isn't the same as yours, but it's an example of filter_repeats.
While that isn't the most efficient way in some cases, filtering repeat
items out of things is a fairly common problem.
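
For instance, a hypothetical run of the sketch above:

>>> list(unique_combinations('aab', 2))
[('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]

Note that ('b', 'b') appears even though 'aab' contains only one 'b':
this is a product over the distinct values, not a permutation of the
multiset.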

Cheers,
Ron