Issue 3501 in sympy: Missing partitions in sympy.utilities.iterables.multiset_partitions

78 views
Skip to first unread message

sy...@googlecode.com

unread,
Nov 11, 2012, 6:51:15 PM11/11/12
to sympy-...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 3501 by pkrathma...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

multiset_partitions does not seem to be generating all possible partitions.

In particular, if I use this function to enumerate factorings of 672 it
does not find 2*2*8*21

Also, some of the partitions of [2,2,2,2,2,2] include the empty set as an
element.

I am aware of bug 2485 for multiset_partitions, but that seems to be
describing different behavior. (If this seems close enough, please go
ahead and merge this into 2485.)

Found this on 0.7.1, and it seems unchanged with a top-of-tree version I
just pulled with git. (Both tests on Linux, with 64 bit Python 2.7.3)

I am attaching a code snippet which shows the behavior.

Attachments:
sympy_multiset_bugs.py 791 bytes

sy...@googlecode.com

unread,
Nov 11, 2012, 6:57:55 PM11/11/12
to sympy-...@googlegroups.com
Updates:
Status: Valid
Labels: WrongResult Combinatorics

Comment #1 on issue 3501 by asme...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

(No comment was entered for this change.)

sy...@googlecode.com

unread,
Nov 12, 2012, 9:22:33 PM11/12/12
to sympy-...@googlegroups.com

Comment #2 on issue 3501 by smi...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

This is very pertinent issue since I just committed kbins which depends on
this. I think this is different, because an empty set rather than None is
being returned, and as you point out, there are some partitions missing. We
probably just need to check the Knuth reference to see where we went wrong
in implementing this. That, or we might be able to file a bug report to
Knuth!

sy...@googlecode.com

unread,
Nov 13, 2012, 6:28:09 AM11/13/12
to sympy-...@googlegroups.com

Comment #3 on issue 3501 by smi...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

Does this look better?

[2, 2, 2, 84]
[2, 2, 3, 56]
[2, 2, 4, 42]
[2, 2, 4, 42]
[2, 2, 4, 42]
[2, 2, 6, 28]
[2, 2, 6, 28]
[2, 2, 6, 28]
[2, 2, 7, 24]
[2, 2, 8, 21]
[2, 2, 12, 14]
[2, 2, 12, 14]
[2, 2, 12, 14]
[2, 2, 168]
[2, 2, 168]
[2, 3, 4, 28]
[2, 3, 4, 28]
[2, 3, 4, 28]
[2, 3, 4, 28]
[2, 3, 8, 14]
[2, 3, 8, 14]
[2, 3, 112]
[2, 4, 4, 21]
[2, 4, 4, 21]
[2, 4, 6, 14]
[2, 4, 6, 14]
[2, 4, 6, 14]
[2, 4, 6, 14]
[2, 4, 6, 14]
[2, 4, 6, 14]
[2, 4, 7, 12]
[2, 4, 7, 12]
[2, 4, 7, 12]
[2, 4, 7, 12]
[2, 4, 84]
[2, 4, 84]
[2, 4, 84]
[2, 4, 84]
[2, 4, 84]
[2, 6, 7, 8]
[2, 6, 7, 8]
[2, 6, 56]
[2, 6, 56]
[2, 7, 48]
[2, 8, 42]
[2, 8, 42]
[2, 12, 28]
[2, 12, 28]
[2, 12, 28]
[2, 14, 24]
[2, 14, 24]
[2, 16, 21]
[2, 16, 21]
[2, 16, 21]
[3, 4, 4, 14]
[3, 4, 4, 14]
[3, 4, 56]
[3, 4, 56]
[3, 7, 32]
[3, 8, 28]
[3, 8, 28]
[4, 4, 6, 7]
[4, 4, 6, 7]
[4, 4, 42]
[4, 4, 42]
[4, 6, 28]
[4, 6, 28]
[4, 6, 28]
[4, 6, 28]
[4, 7, 24]
[4, 7, 24]
[4, 8, 21]
[4, 8, 21]
[4, 8, 21]
[4, 8, 21]
[4, 12, 14]
[4, 12, 14]
[4, 12, 14]
[6, 7, 16]
[6, 7, 16]
[6, 8, 14]
[6, 8, 14]
[7, 8, 12]
[7, 8, 12]
[14, 48]
[24, 28]


sy...@googlecode.com

unread,
Nov 13, 2012, 1:48:16 PM11/13/12
to sympy-...@googlegroups.com

Comment #4 on issue 3501 by pkrathma...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

The relevant Knuth reference is Algorithm M, (Multipartitions in decreasing
lexicographic order), page 429 of volume 4, TAOCP, section 7.2.1.5. While
it is of course *possible* that there is an error in Knuth's presentation
of the algorithm, that wouldn't be my first bet.

WRT the list of factorings in comment 3 -- yes, it looks like an
improvement, although my feel for multiset partitions is not good enough
that I can just eyeball it. It is still visiting the same factoring
multiple times, but that is a known issue.

It looks like Sage implements multiset partitions
(sage/combinat/subset.py). I'll see if I can use that to generate some
test cases.

sy...@googlecode.com

unread,
Nov 13, 2012, 2:12:03 PM11/13/12
to sympy-...@googlegroups.com

Comment #5 on issue 3501 by asme...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

Is there a combinatorial formula to count these? That could also serve as
a check.

sy...@googlecode.com

unread,
Nov 14, 2012, 1:31:23 AM11/14/12
to sympy-...@googlegroups.com

Comment #6 on issue 3501 by pkrathma...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

Oops - scratch the Sage reference I gave at the end of comment #4. The
class subset.py in sage enumerates the *subsets* of a multiset, not the
partitions.

sy...@googlecode.com

unread,
Nov 14, 2012, 2:27:30 AM11/14/12
to sympy-...@googlegroups.com

Comment #7 on issue 3501 by smi...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

Here is a failing test for the partitions of range(5) into 2 groups:


Should be

>>> for ai in a:
... print ai
...
((0, 1, 3, 4), (2,))
((0, 3), (1, 2, 4))
((0, 1), (2, 3, 4))
((0, 2), (1, 3, 4))
((0, 1, 2, 4), (3,))
((0, 1, 4), (2, 3))
((0, 2, 3), (1, 4))
((0, 2, 3, 4), (1,))
((0, 1, 2, 3), (4,))
((0, 1, 3), (2, 4))
((0, 4), (1, 2, 3))
((0, 1, 2), (3, 4))
((0, 3, 4), (1, 2))
((0, 2, 4), (1, 3))
((0,), (1, 2, 3, 4))


But gave (when sorted)

>>> for ai in b:
... print ai
...
((0, 1, 4), (2, 3))
((0, 1), (2, 3, 4))
((0, 2, 3), (1, 4))
((0, 2, 3, 4), (1,))
((0, 1, 2, 3), (4,))
((0, 2), (1, 3, 4))
((0, 4), (1, 2, 3))
((0, 1, 3, 4), (2,))
((0, 3, 4), (1, 2))
((0,), (1, 2, 3, 4))


It is missing these:

>>> for ai in a:
... if ai not in b: print ai
...
((0, 3), (1, 2, 4))
((0, 1, 2, 4), (3,))
((0, 1, 3), (2, 4))
((0, 1, 2), (3, 4))
((0, 2, 4), (1, 3))

My hunch is that it is an error made when dealing with 0-based or 1-based
lists. (And I, too, doubt that the error is in Knuth.)

sy...@googlecode.com

unread,
Nov 15, 2012, 5:58:33 AM11/15/12
to sympy-...@googlegroups.com

Comment #8 on issue 3501 by smi...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

Could you post a picture/scan of the pertinent page? I don't have access to
it here.

sy...@googlecode.com

unread,
Nov 15, 2012, 1:45:02 PM11/15/12
to sympy-...@googlegroups.com
Updates:
Labels: NeedsReview smichr

Comment #9 on issue 3501 by smi...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

I added a different routine. Perhaps the Knuth routine will find its way
back, but for now there is a little gem of a routine that seems to work
well:

l = len(list(multiset_partitions(n, 4)))

n l time(s)
5 10 0.0
6 65 0.0
7 350 0.01
8 1701 0.03
9 7770 0.11
10 34105 0.62
11 145750 3.82
12 611501 18.28

I also made it generate only unique partitions if a true multiset (not set)
is given, e.g.

```
>>> list(multiset_partitions([1,1,1,2], 2))
[[[1, 1, 1], [2]], [[1, 1, 2], [1]], [[1, 1], [1, 2]]]
```

https://github.com/sympy/sympy/pull/1658


sy...@googlecode.com

unread,
Nov 15, 2012, 1:46:02 PM11/15/12
to sympy-...@googlegroups.com
Updates:
Labels: -Priority-Medium Priority-Critical

Comment #10 on issue 3501 by smi...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

sy...@googlecode.com

unread,
Nov 15, 2012, 2:27:42 PM11/15/12
to sympy-...@googlegroups.com

Comment #11 on issue 3501 by pkrathma...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

Re:posting 8.

I was going to point you to the oracle himself. Knuth still has the
fasicles on his web page. See <
www-cs-faculty.stanford.edu/~uno/fasc3b.ps.gz >.

The discussion of multiset partitions is on pages 38-40 of this file
(corresponding to pp 428-430 in the real book).

But ... there are some differences between the fascicle and the printed
version. First of all, algorithm M goes from being "simple" in the fasicle
to "fairly simple" in print :). More pertinently, he made some changes in
the code itself. Perhaps saptman (GSOC 11) was working from the fascile
and lead astray?

WRT post 9. Great! I will take a look, once I figure out how to use GIT a
bit better.

Meanwhile, for testing, here are some possibilities:

1) Multiset partitions are generalizations of integer partitions and set
partitions. As a sanity check we can verify that the multiset
implementation gives the same results on these simpler cases.

2) Multiset partitions can be implemented as a wrapper on top of set
partitions, and then filtering out the duplicates. I have attached a file
which does the wrapping, but not the filtering (yet). I can try it against
pull 1658 to verify that it gets the same answers.

3) Counting multiset partitions (comment 5) is possible, but I didn't see
an easy-to-understand formula or anything.

Thanks a lot for your attention on this bug.

Attachments:
slow_multiset_partitions.py 1.3 KB

sy...@googlecode.com

unread,
Nov 15, 2012, 3:00:29 PM11/15/12
to sympy-...@googlegroups.com

Comment #12 on issue 3501 by smi...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

If 'ans' is the list of permutations that you want to make canonical, try

set( [tuple(sorted([tuple(sorted(j)) for j in i])) for i in ans] )

e.g.

>>> ans
[[[1, 1, 1, 2]], [[1, 1, 1], [2]], [[1, 1, 2], [1]], [[1, 1], [1, 2]], [[1,
1], [1], [2]], [[1, 1, 2], [1]], [[1, 1], [1, 2]], [[1, 1], [1], [2]], [[1,
2], [1, 1]], [[1], [1, 1, 2]], [[1], [1, 1], [2]], [[1, 2], [1], [1]],
[[1], [1, 2], [1]], [[1], [1], [1, 2]], [[1], [1], [1], [2]]]

>>> set( [tuple(sorted([tuple(sorted(j)) for j in i])) for i in ans] )
set([((1, 1, 1, 2),), ((1,), (1,), (1,), (2,)), ((1,), (1, 1), (2,)), ((1,
1), (1, 2)), ((1, 1, 1), (2,)), ((1,), (1, 1, 2)), ((1,), (1,), (1, 2))])
>>> for i in _:
... if len(i)==2: print i
...
((1, 1), (1, 2))
((1, 1, 1), (2,))
((1,), (1, 1, 2))

compare to new version:

>>> for i in multiset_partitions([1,1,1,2],2): print i
...
[[1, 1, 1], [2]]
[[1, 1, 2], [1]]

sy...@googlecode.com

unread,
Nov 15, 2012, 3:02:49 PM11/15/12
to sympy-...@googlegroups.com

Comment #13 on issue 3501 by smi...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

btw, thanks for the reference to the Knuth fascicle. I used a different
source for the new algorithm:

http://www.math.upenn.edu/~wilf/website/CombAlgDownld.html

sy...@googlecode.com

unread,
Nov 16, 2012, 8:56:22 PM11/16/12
to sympy-...@googlegroups.com

Comment #14 on issue 3501 by pkrathma...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

Thanks for your implementation of https://github.com/sympy/sympy/pull/1658,
and for pointing me to the Wilf reference.

I haven't done an exhaustive regression but it is providing the correct
answer on everything I have checked so far.

Using your implementation to enumerate all the possible factorings of
numbers up to 12000 takes 624 seconds on my laptop. (Well, not counting
the powers of primes. I had code which used integer partitions for that
special case.) This is a *lot* faster than the filter method I posted in
comment 11. Enumerating factoring of numbers up to 1000 takes 292 seconds
with my code, and only 2 seconds with yours. The large discrepancy is
surprising, since in each case we look at all the set partitions and
collapse them down as needed. If the performance difference is real, it
might be worth exposing the core of your code as a set partitions
implementation.

Some specific comments (and sorry, if I should be commenting on the pull
request itself, let me know).

1) You still have the second parameter, m, but the comment explaining what
it is for has gone away.

2) Integer partitions could be special cased to use
combinatorics.partitions. The difference is pretty dramatic. For example,
for n=14 there are 190 million set partitions, but only 135 integer
partitions.

3) Integer versus set partitions is the extreme case, but any time the
multiplicity of elements is high, a proper implementation of Knuth's
7.1.2.5m is going to be much faster than finding equivalence classes of set
partitions. (And to be fair, Knuth himself would not refer to 7.1.2.5m as
his algorithm. He traces it back to work by Wallis in 1685.)

4) After much procrastination, I think I now have my head around 7.1.2.5m.
I will see if I can produce an implementation of it, if only for my own
edification. One thing that might have to be sacrificed would be the
second parameter. Knuth provides, in an exercise, a version which produces
partitions of at most r parts, but not exactly r parts.

sy...@googlecode.com

unread,
Nov 16, 2012, 10:37:36 PM11/16/12
to sympy-...@googlegroups.com

Comment #15 on issue 3501 by smi...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

Can you explain what you mean by "exposing the core of your code as a set
partitions implementation"? I think I might get it but let me explain the
current situation:

iterables contains partitions and multiset_partitions. `partitions` is
actually a very fast integer partition algorithm. multiset_partition
handles both set and multiset partitions. So perhaps you are suggesting
that partitions should be renamed integer_partitions, I expose the core of
multiset_partitions as set_partitions and then use that in
multiset_partitions. Yes? Perhaps a better thing to do would be to make
partitions itself the doorway to all the functions. 1) if input is an
integer you get integer partitions; 2) if you pass a list (or set) you get
set partitions; 3) if you pass a dictionary you get multiset partitions.

> Some specific comments (and sorry, if I should be commenting on the pull
> request itself, let me know).


On the PR is a the preferred place, but the two refer to each other so this
is ok, too.

> 1) You still have the second parameter, m, but the comment explaining
> what it is for has gone away.

I'll add more description.

> 2) Integer partitions could be special cased to use
> combinatorics.partitions. The difference is pretty dramatic. For
> example, for n=14 there are 190 million set partitions, but only 135
> integer partitions.


I'm not sure this is necessary since Partitions only accepts set-like input:

```
>>> Partition([[1,2,3,1]])
Traceback (most recent call last):
...
ValueError: Partition contained duplicated elements.
```

> 3) Integer versus set partitions is the extreme case, but any time the
> multiplicity of elements is high, a proper implementation of Knuth's
> 7.1.2.5m is going to be much faster than finding equivalence classes of
> set partitions. (And to be fair, Knuth himself would not refer to
> 7.1.2.5m as his algorithm. He traces it back to work by Wallis in 1685.)


Agreed. If you can get the routine running again it would be great to see
it working. I think Knuth said something about the enduring quality of an
algorithm like "you just have to see it". Well the nextequ has that quality
for me...it's so simple!

> 4) After much procrastination, I think I now have my head around
> 7.1.2.5m. I will see if I can produce an implementation of it, if only
> for my own edification. One thing that might have to be sacrificed would
> be the second parameter. Knuth provides, in an exercise, a version which
> produces partitions of at most r parts, but not exactly r parts.


Great!

sy...@googlecode.com

unread,
Nov 17, 2012, 3:10:06 AM11/17/12
to sympy-...@googlegroups.com

Comment #16 on issue 3501 by pkrathma...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

Sorry for any confusion. I am not using the terminology consistently and am
probably (still) a bit confused by the relationship betweeen the code in
utilities.iterables and combinatorics.partitions.

For the "expose the core of" remark, I didn't realize that yes, your
multiset_partitions code is a perfectly good implementation of set
partitions. It even has a provision for avoiding creation of the cache in
this case. BTW, I think there is a bug in this bit -- you set canon = None
(if no dups are found) but later test against canon == 0. Shouldn't one be
changed?

What I meant with my point number 2 is that if I feed multiset_partitions
input in which is just the same element repeated n times, the output is
isomorphic (except for the m parameter) to the partitions of the integer n.
(And in my earlier comment, I was confusing combinatorics and utilities
implementations of integer partitions.)

In code:

from sympy.utilities.iterables import partitions

for m in xrange(1, 6):
print [p for p in multiset_partitions('aaaaa', m)]

for p in partitions(5):
print p

Except that for large n, the second expression is using a much faster
algorithm.



sy...@googlecode.com

unread,
Nov 17, 2012, 4:35:57 AM11/17/12
to sympy-...@googlegroups.com

Comment #17 on issue 3501 by smi...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

Thanks for your clarifying comments. multiset_partitions is now a clearing
house for any enumerations of sets (with or without repeats). It uses the
partitions() routine if there is only 1 element in the multiset. I also
exposed (as a private method) the nexequ routine under the name
_set_partitions.

(Changing the routine to be 1-based was a delicate procedure! I left the
original version in case some bug is discovered in what I did.)

sy...@googlecode.com

unread,
Nov 17, 2012, 10:50:02 PM11/17/12
to sympy-...@googlegroups.com

Comment #18 on issue 3501 by smi...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

I would like to have multiset working even if less than optimally. Is it ok
with you to commit what is there, pkrathmann2?

sy...@googlecode.com

unread,
Nov 18, 2012, 6:18:09 AM11/18/12
to sympy-...@googlegroups.com

Comment #19 on issue 3501 by smi...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

btw, the old multiset code didn't look at the values of the multiset.
Unless a routine does, I don't know how anything could be faster than the
replacement code that I put it when you consider the simple output that it
is generating. Here is the output of _set_partitions for n=5


(1, [0, 0, 0, 0, 0])
(2, [0, 0, 0, 0, 1])
(2, [0, 0, 0, 1, 0])
(2, [0, 0, 0, 1, 1])
(3, [0, 0, 0, 1, 2])
(2, [0, 0, 1, 0, 0])
(2, [0, 0, 1, 0, 1])
(3, [0, 0, 1, 0, 2])
(2, [0, 0, 1, 1, 0])
(2, [0, 0, 1, 1, 1])
(3, [0, 0, 1, 1, 2])
(3, [0, 0, 1, 2, 0])
(3, [0, 0, 1, 2, 1])
(3, [0, 0, 1, 2, 2])
(4, [0, 0, 1, 2, 3])
(2, [0, 1, 0, 0, 0])
(2, [0, 1, 0, 0, 1])
(3, [0, 1, 0, 0, 2])
(2, [0, 1, 0, 1, 0])
(2, [0, 1, 0, 1, 1])
(3, [0, 1, 0, 1, 2])
(3, [0, 1, 0, 2, 0])
(3, [0, 1, 0, 2, 1])
(3, [0, 1, 0, 2, 2])
(4, [0, 1, 0, 2, 3])
(2, [0, 1, 1, 0, 0])
(2, [0, 1, 1, 0, 1])
(3, [0, 1, 1, 0, 2])
(2, [0, 1, 1, 1, 0])
(2, [0, 1, 1, 1, 1])
(3, [0, 1, 1, 1, 2])
(3, [0, 1, 1, 2, 0])
(3, [0, 1, 1, 2, 1])
(3, [0, 1, 1, 2, 2])
(4, [0, 1, 1, 2, 3])
(3, [0, 1, 2, 0, 0])
(3, [0, 1, 2, 0, 1])
(3, [0, 1, 2, 0, 2])
(4, [0, 1, 2, 0, 3])
(3, [0, 1, 2, 1, 0])
(3, [0, 1, 2, 1, 1])
(3, [0, 1, 2, 1, 2])
(4, [0, 1, 2, 1, 3])
(3, [0, 1, 2, 2, 0])
(3, [0, 1, 2, 2, 1])
(3, [0, 1, 2, 2, 2])
(4, [0, 1, 2, 2, 3])
(4, [0, 1, 2, 3, 0])
(4, [0, 1, 2, 3, 1])
(4, [0, 1, 2, 3, 2])
(4, [0, 1, 2, 3, 3])
(5, [0, 1, 2, 3, 4])

It's all very orderly and I suspect the 20+ lines of code that are
generating it are about as compact as you can get. So, again, unless any
replacement code actually considers the multiplicity of the elements I
think this is as good as we can get. The challenge for any code that
considers the multiplicity is to produce a pattern like the following:

>>> L=[1,1,1,2,2]
>>> s=set()
>>> for n, p in _set_partitions(len(L)):
... t = canon(p, L)
... if t not in s:
... print p
... s.add(t)
...
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 1]
[0, 0, 0, 1, 1]
[0, 0, 0, 1, 2]
[0, 0, 1, 0, 0]
[0, 0, 1, 0, 1]
[0, 0, 1, 0, 2]
[0, 0, 1, 1, 1]
[0, 0, 1, 1, 2]
[0, 0, 1, 2, 2]
[0, 0, 1, 2, 3]
[0, 1, 2, 0, 0]
[0, 1, 2, 0, 1]
[0, 1, 2, 0, 3]
[0, 1, 2, 3, 3]
[0, 1, 2, 3, 4]

where


def canon(q, m):
rv = [[] for i in range(max(q) + 1)]
for i, qi in enumerate(q):
rv[qi].append(m[i])
return tuple(sorted([tuple(i) for i in rv]))


sy...@googlecode.com

unread,
Nov 19, 2012, 3:18:45 AM11/19/12
to sympy-...@googlegroups.com

Comment #20 on issue 3501 by pkrathma...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

Re comment 17 - Great! With a general dispatcher routine, if/when
7.2.1.5m working, can be later integrated, for those cases where it
provides a speedup.

re: comment 18 / checking in pull request 1658.

I agree that the existing code isn't functioning as anything better
than a (buggy) set partitions routine, since it doesn't look at its
elements. I have to admit that I have not been able to understand the
code in that is currently checked in -- is the original author
available to shed some light on the intent? It doesn't look anything
like algorithm M (the data structures are different), and it also
doesn't look like restricted growth strings, which is the main
algorithm Knuth presents for set partitions. So, I am puzzled.

At any rate, the choice between broken and mysterious and working and
understood is pretty clearcut, so checking this in will be a big
improvement.

I should have some time tomorrow to review your code more carefully.
(Of course, any feedback I provide comes with the proviso that I am a
complete newbie wrt sympy.)

One thing I noticed when reading it and trying examples -- if the
input multiset has no dups, you carefully set canon to None. But, in
this case, can't you also skip creating canonical, and checking the
cache?

As a test, I switched the code at the tail end of multiset_partitions to

if canon:
canonical = tuple(
sorted([tuple([canon[i] for i in j]) for j in rv]))
if canonical not in cache:
cache.add(canonical)
yield [[multiset[j] for j in i] for i in rv]
else:
yield [[multiset[j] for j in i] for i in rv]

and noticed a pleasant speedup. (624 to 550 seconds on my factorings
to 12000 code.) Of course, I am not claiming that this is *the* fix,
but you get the idea.


sy...@googlecode.com

unread,
Nov 20, 2012, 4:08:48 PM11/20/12
to sympy-...@googlegroups.com

Comment #21 on issue 3501 by pkrathma...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

Hello,

I have had a chance to look at the pull request, and perform some
testing. (I see that you are adding more commits, so this is only
up-to-date as of 2c11617.)

Thanks for the commit which responds to comment 20 (and incorporates the
optimization).

First, since I am new to sympy and git, here is what I did to get
access to your changed files:

git remote add smichr git://github.com/smichr/sympy.git
git fetch smichr
git checkout -b kbins smichr/kbins

If this is wrong, that might explain inconsistencies in the comments
below (and please let me know, so I can fix my process).

Overall, the code looks very clean.

I paid particular attention to _set_partitions. I agree that this
implementation is about as good as possible for set
partitions in the m=None case. After all, it is amortized constant
time per partition vector returned, and it is hard to do better than
constant time! As used in multiset_partitions, in the the m=<number>
case, there may well be a faster approach than generating all the
patitions and filtering to get only those of the desired size. But, I
am certainly not suggesting we try for that case -- this code is
already quite elaborate, and a huge improvement over what is currently
in master.

The code for _set_partitions is short, and well commented, but the the
algorithm (and especially the data structure) is going to be
non-obvious to someone looking at it for the first time.

You *might* consider adding something like the following to the
comments/doc:

This algorithm is similar to, and solves the same problem as,
Algorithm 7.2.1.5H, from volume 4A of Knuth's The Art of Computer
Programming. Knuth uses the term "restricted growth string" where
this code refers a "partition vector". In each case, the meaning is
the same: the value in the ith element of the vector specifies to
which part the ith set element is to be assigned.

At the lowest level, this code implements an n-digit big-endian
counter (stored in the array q) which is incremented (with carries) to
get the next partition in the sequence. A special twist is that a
digit is constrained to be at most one greater than the maximum of all
the digits to the left of it. The array b maintains this maximum, so
that the code can efficiently decide when a digit can be incremented
in place or whether it needs to be reset to 0 and trigger a carry to
the next digit. The enumeration starts with all the digits 0 (which
corresponds to all the set elements being assigned to the same 0th
part), and ends with 0123...n, which corresponds to each set element
being assigned to a different, singleton, part.

---------------------------

The helper function _partition provides similar, but lighter-weight
functionality to that provided by the classmethod
combinatorics.partitions.Partition.from_rgs(). Perhaps worth a
cross-reference?

I also wrote a tester to sanity check. I am attaching it to this
post. Perhaps something like test_partitions() is worth incorporating
into the tests? The one-off tests at the bottom of this most
likely duplicate tests you already have.

-----------------
multiset_partitions

--> docstring

The analytical discussion under "Counting"

These formulas only apply in the set case. Multisets are more
complicated... perhaps:

If the input is a set (no repeated elements), the number of partitions
returned is given by the bell number ...

----------------

I looked briefly at the new multiset_combinations and
multiset_permutations and didn't see anything to argue with. (Except
of course the TODO comment in multiset_combinations).

Misc - combinatorics.RGS_enum() seems to be computing the same thing
as bell()? Worth a cleanup or cross reference?


Attachments:
tester_partitions.py 1.7 KB

sy...@googlecode.com

unread,
Nov 20, 2012, 4:30:14 PM11/20/12
to sympy-...@googlegroups.com

Comment #22 on issue 3501 by asme...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

Re your Git workflow: I don't know if git checkout sets up the branch to
track by default. See if git pull updates the branch.

Usually, I just do "git checkout smichr/branch", which doesn't create a
local branch. To update with that model, just do git fetch and git checkout
again.

sy...@googlecode.com

unread,
Nov 20, 2012, 5:54:04 PM11/20/12
to sympy-...@googlegroups.com

Comment #23 on issue 3501 by smi...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

> git checkout -b kbins smichr/kbins


That's what I would do...looks good.


> You *might* consider adding something like the following to the
> comments/doc:


I added the excellent comments (with very minor modification) to the notes
of `_set_partitions`.


> The helper function _partition provides similar, but lighter-weight
> functionality to that provided by the classmethod
> combinatorics.partitions.**Partition.from_rgs(). Perhaps worth a
> cross-reference?

> I added it as a See Also


> I also wrote a tester to sanity check. I am attaching it to this
> post. Perhaps something like test_partitions() is worth incorporating
> into the tests? The one-off tests at the bottom of this most
> likely duplicate tests you already have.

> -----------------
> multiset_partitions

> --> docstring

> The analytical discussion under "Counting"

> These formulas only apply in the set case. Multisets are more
> complicated... perhaps:


done


> If the input is a set (no repeated elements), the number of partitions
> returned is given by the bell number ...

> ----------------

> I looked briefly at the new multiset_combinations and
> multiset_permutations and didn't see anything to argue with. (Except
> of course the TODO comment in multiset_combinations).


The TODO is done, too, and now my brain is mush.


> Misc - combinatorics.RGS_enum() seems to be computing the same thing
> as bell()? Worth a cleanup or cross reference?


> Other than the calculation for numbers < 1, it appears to be the same and
much faster, so I imported and used bell there:

>>> RGS_enum(16)
10480142147
>>> bell(16)
10480142147
>>> from timeit import timeit
>>> timeit('RGS_enum(23)','from sympy.combinatorics import *', number=100)
0.46608495712280273
>>> timeit('RGS_enum(23)','from sympy import bell as RGS_enum', number=100)

sy...@googlecode.com

unread,
Nov 20, 2012, 6:45:03 PM11/20/12
to sympy-...@googlegroups.com

Comment #24 on issue 3501 by asme...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

Your last line got cut off there.

sy...@googlecode.com

unread,
Nov 20, 2012, 7:42:44 PM11/20/12
to sympy-...@googlegroups.com

Comment #25 on issue 3501 by smi...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

sorry...that was


>>> timeit('RGS_enum(23)','from sympy import bell as RGS_enum', number=100)
0.0007009506225585938


sy...@googlecode.com

unread,
Nov 20, 2012, 8:47:56 PM11/20/12
to sympy-...@googlegroups.com

Comment #26 on issue 3501 by pkrathma...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

Great! This looks good to commit. (With caveat that I am just
getting my feet wet with Sympy.)

Nits (which I don't need to see again)

iterables.py:
line 992:
This routine was rewritten form the reference to use 0-based lists.
form ->from

line 1181: Paramaters->Parameters


sy...@googlecode.com

unread,
Nov 20, 2012, 9:30:46 PM11/20/12
to sympy-...@googlegroups.com

Comment #27 on issue 3501 by smi...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

OK, that's done now, too. Your feedback has been very helpful. Thank you.
Now if I can figure out why some of the tests are failing this will be
ready to go.

sy...@googlecode.com

unread,
Nov 21, 2012, 10:18:11 AM11/21/12
to sympy-...@googlegroups.com
Updates:
Status: Fixed

Comment #28 on issue 3501 by smi...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

It's in...all partitions are now generated.


sy...@googlecode.com

unread,
Nov 22, 2012, 2:35:32 AM11/22/12
to sympy-...@googlegroups.com

Comment #29 on issue 3501 by smi...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

pkrathmann2: is there a chance you could look over PR
https://github.com/sympy/sympy/pull/1666 ? It fixes (with pretty simple
code) 3 more routines.

sy...@googlecode.com

unread,
Nov 23, 2012, 6:33:47 PM11/23/12
to sympy-...@googlegroups.com

Comment #30 on issue 3501 by pkrathma...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

> It's in...all partitions are now generated.
Awesome. Congratulations, this has been fun.

> ... look over PR https://github.com/sympy/sympy/pull/1666

Sure I will take a look -- I see you have committed this already, but
better late than never, (I wasn't paying attention over Thanksgiving, and
probably won't have much time until Monday.)

I am still working on the 7.2.1.5m implementation. That would be nice to
get in as well.

sy...@googlecode.com

unread,
Nov 24, 2012, 12:51:45 AM11/24/12
to sympy-...@googlegroups.com
Updates:
Status: Valid

Comment #31 on issue 3501 by smi...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

Good news: round 2! If you could carefully look over
https://github.com/sympy/sympy/pull/1671 that would be great. Since I've
increased coverage to essentially 100% (of coverable lines other than
interactive traversal and errors) then checking to see that the tests look
right is a good first step.

Perhaps your reading of recursive code will be better than mine to know if
I'm always doing the right thing in the multiset-* routines.

sy...@googlecode.com

unread,
Nov 29, 2012, 4:30:42 AM11/29/12
to sympy-...@googlegroups.com
Updates:
Status: Fixed

Comment #32 on issue 3501 by smi...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

sy...@googlecode.com

unread,
Dec 4, 2012, 3:54:48 PM12/4/12
to sympy-...@googlegroups.com

Comment #33 on issue 3501 by pkrathma...@gmail.com: Missing partitions in
sympy.utilities.iterables.multiset_partitions
http://code.google.com/p/sympy/issues/detail?id=3501

Entered comments (on the github pages) for pull requests 1671 and 1666.

Reply all
Reply to author
Forward
0 new messages