exact cover problem


boo...@u.washington.edu

Feb 22, 2008, 2:55:52 PM
to sage-devel
I've found a nice implementation of the DLX algorithm, which "quickly" solves the NP-Complete exact cover problem. For those who aren't in Seattle and haven't heard me blathering on and on and on and on about how awesome DLX is...

Let M be a binary matrix. An exact cover is a subset of the rows of the matrix which sum (exactly) to the vector 1,1,...,1.

For example, an exact cover of

1010
1100
1011
0101
0111

is

1010
0101
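
A quick way to check this is brute force. The following plain-Python sketch (illustrative only, not the DLX code under discussion) enumerates row subsets and keeps those whose columnwise sums are exactly 1 everywhere:

```python
from itertools import combinations

def exact_covers(rows):
    """Yield every subset of rows whose columnwise sums (over the
    integers) are exactly 1 in each position."""
    ncols = len(rows[0])
    for k in range(1, len(rows) + 1):
        for subset in combinations(range(len(rows)), k):
            sums = [sum(rows[i][j] for i in subset) for j in range(ncols)]
            if all(s == 1 for s in sums):
                yield [rows[i] for i in subset]

M = [[1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 1, 1],
     [0, 1, 0, 1],
     [0, 1, 1, 1]]

for cover in exact_covers(M):
    print(cover)   # the rows 1010 and 0101, as above
```

Of course DLX does far better than this exponential enumeration in practice, but the brute force pins down exactly what is being asked for.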

Just now, I've completed wrapping Antti Ajanki's implementation of DLX so that one can easily compute the exact cover of a Sage matrix, and I've constructed a mapping from the graph coloring problem into the exact cover problem. Further, I've applied that to compute the chromatic number and chromatic polynomial of graphs. I think there is sufficient interest in this (ref #1313, #1311).

Arguments for including Ajanki's code:
1) It's the only Python implementation of DLX I've seen.
2) I emailed the author, who happily added the "or later" bit after the GPL2
3) With my Sage Matrix -> DLXMatrix code (plus docstrings to everything I added), the file dlx.py is only 8kB!
4) It will resolve tickets #1311 and #1313

So - I'm thinking this should be added to the combinat directory (of course, the graph-oriented stuff will go into graphs). I'll file a ticket & some patches today if nobody objects.

Jason Grout

Feb 22, 2008, 3:04:58 PM
to sage-...@googlegroups.com

*Wow*

I'd be really happy with getting chromatic number and chromatic
polynomial into the graph theory stuff. I've been sort of embarrassed
we didn't have it :)

Jason

William Stein

Feb 22, 2008, 3:49:41 PM
to sage-...@googlegroups.com, Gregory Bard
On Fri, Feb 22, 2008 at 12:04 PM, Jason Grout
<jason...@creativetrax.com> wrote:
>
> boo...@u.washington.edu wrote:
> > I've found a nice implementation of the DLX algorithm, which "quickly" solves the NP-Complete exact cover problem. For those who aren't in Seattle and haven't heard me blathering on and on and on and on about how awesome DLX is...
> >
> > Let M be a binary matrix. An exact cover is a subset of the rows of the matrix which sum (exactly) to the vector 1,1,...,1.
> >

Isn't that exactly the same thing as solving

M*x = [1,...,1],

over GF(2)? I.e., the cost should be the same as 1
(or maybe 2) reduced echelon forms over GF(2)?
If so, is "your nice implementation of the DLX algorithm"
better than Gregory Bard's existing implementation
of computation of reduced row echelon form of
matrices over GF(2)?

If the number of bits in your rows is 1000 how long does
it take? On sage.math Gregory Bard's code computes
the rref of a 1000x1000 random matrix over GF(2) in 0.03 seconds:

sage: a = random_matrix(GF(2),1000)
sage: time a.echelonize()
CPU times: user 0.03 s, sys: 0.00 s, total: 0.03 s
Wall time: 0.03

A 4000x4000 takes 1.46 seconds:

sage: a = random_matrix(GF(2),4000)
sage: time a.echelonize()
CPU times: user 1.46 s, sys: 0.00 s, total: 1.46 s
Wall time: 1.46

Here's an example of this relating to your example:

sage: a = matrix(GF(2),4,[1,0,1,0, 1,1,0,0, 1,0,1,1, 0,1,0,1])
sage: b = matrix(GF(2), a.nrows(), [1]*a.nrows())
sage: a.solve_right(b)

[0]
[1]
[1]
[0]

That solve_right uses rref behind the scenes.

--

I'm naive and just curious. And I also realize that
there is probably a big difference between the dense
and sparse cases (also for graphs).

-- William

> > For example, an exact cover of
> >
> > 1010
> > 1100
> > 1011
> > 0101
> > 0111
> >
> > is
> >
> > 1010
> > 0101
> >
> > Just now, I've completed wrapping Antti Ajanki's implementation of DLX so that one can easily compute the exact cover of a Sage matrix, and I've constructed a mapping from the graph coloring problem into the exact cover problem. Further, I've applied that to compute the chromatic number and chromatic polynomial of graphs. I think there is sufficient interest in this (ref #1313, #1311).
> >
> > Arguments for including Ajanki's code:
> > 1) It's the only Python implementation of DLX I've seen.
> > 2) I emailed the author, who happily added the "or later" bit after the GPL2
> > 3) With my Sage Matrix -> DLXMatrix code (plus docstrings to everything I added), the file dlx.py is only 8kB!
> > 4) It will resolve tickets #1311 and #1313
> >
> > So - I'm thinking this should be added to the combinat directory (of course, the graph-oriented stuff will go into graphs). I'll file a ticket & some patches today if nobody objects.
> >
>
> *Wow*
>
> I'd be really happy with getting chromatic number and chromatic
> polynomial into the graph theory stuff. I've been sort of embarrassed
> we didn't have it :)
>
> Jason

--
William Stein
Associate Professor of Mathematics
University of Washington
http://wstein.org

David Harvey

Feb 22, 2008, 3:50:57 PM
to sage-...@googlegroups.com

On Feb 22, 2008, at 3:49 PM, William Stein wrote:

>
> On Fri, Feb 22, 2008 at 12:04 PM, Jason Grout
> <jason...@creativetrax.com> wrote:
>>
>> boo...@u.washington.edu wrote:
>>> I've found a nice implementation of the DLX algorithm, which
>>> "quickly" solves the NP-Complete exact cover problem. For those
>>> who aren't in Seattle and haven't heard me blathering on and on
>>> and on and on about how awesome DLX is...
>>>
>>> Let M be a binary matrix. An exact cover is a subset of the rows
>>> of the matrix which sum (exactly) to the vector 1,1,...,1.
>>>
>
> Isn't that exactly the same thing as solving
>
> M*x = [1,...,1],
>
> over GF(2)?

I think it's probably not mod 2.

david

William Stein

Feb 22, 2008, 3:56:37 PM
to sage-...@googlegroups.com

Oh, that's a good point. I guess things become a lot easier when 1+1 = 0.

William

John Cremona

Feb 22, 2008, 4:10:20 PM
to sage-...@googlegroups.com
The original problem said "binary matrix" so surely that means mod 2?
And I would expect mod 2 matrices to come up in the graph theory
applications. Not that I know about that...

John


--
John Cremona

David Roe

Feb 22, 2008, 4:14:22 PM
to sage-...@googlegroups.com
In this context I think that binary means all the entries are ones and zeros.  But when you look for a set of rows that add up to [1,1,1,...], you don't take 1+1 = 0.  This makes sense when you want exactly one 1 to appear in each column, which is a natural requirement, and it makes the problem much harder.  In particular, you don't get to use linear algebra, because your arithmetic isn't taking place over a field.
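
A hypothetical example of the distinction, sketched in plain Python: below, the subset {row 0, row 1, row 2} "sums" to all ones over GF(2) because 1+1+1 = 1 mod 2, yet column 0 is covered three times, so it is not an exact cover. The true exact cover is {row 2, row 3}.

```python
rows = [(1, 1, 0),
        (1, 0, 1),
        (1, 0, 0),
        (0, 1, 1)]

def col_sums(subset):
    # columnwise sums over the integers, NOT mod 2
    return [sum(rows[i][j] for i in subset) for j in range(3)]

gf2_solution = [0, 1, 2]   # solves M*x = (1,1,1) over GF(2)...
print(col_sums(gf2_solution))                    # [3, 1, 1]: column 0 hit 3 times
print([s % 2 for s in col_sums(gf2_solution)])   # [1, 1, 1] only after reducing mod 2

exact_cover = [2, 3]
print(col_sums(exact_cover))                     # [1, 1, 1] over the integers, too
```
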
David

Robert Miller

Feb 22, 2008, 4:32:08 PM
to sage-devel
Just a technical note: Mod 2 matrices are not the natural way to think
about adjacency matrices (I learned this the hard way) - the entry is
actually better thought of as the number of paths of length one from
one vertex to another. That way taking nth powers of the matrices
counts the number of n-paths from one vertex to another.
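
This can be checked with a tiny sketch (plain Python; strictly speaking A^n counts walks of length n, which is what's meant here). For the triangle K_3 there are two length-2 walks from a vertex back to itself, one through each neighbor:

```python
def mat_mul(A, B):
    # naive integer matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(A, n):
    # repeated multiplication; entry (i, j) of A^n counts the
    # walks of length n from vertex i to vertex j
    result = A
    for _ in range(n - 1):
        result = mat_mul(result, A)
    return result

A = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]   # adjacency matrix of the triangle K_3

print(mat_pow(A, 2))   # diagonal entries are 2: two closed walks of length 2
```

Reducing those entries mod 2 would throw that count away, which is Robert's point.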

Let's not try to reduce NP-complete to polynomial in this thread... ;)

Sorry, my point is an emphatic ++1 for including DLX in Sage.

-- Robert M

Bill Hart

Feb 22, 2008, 6:33:33 PM
to sage-devel
If it is an NP-complete problem, presumably it asks whether such a set
of rows exists, not that you find one.

Bill.

Carlo Hamalainen

Feb 23, 2008, 2:18:31 PM
to sage-...@googlegroups.com
On Fri, Feb 22, 2008 at 8:55 PM, <boo...@u.washington.edu> wrote:
> Arguments for including Ajanki's code:
> 1) It's the only Python implementation of DLX I've seen.
> 2) I emailed the author, who happily added the "or later" bit after the GPL2
> 3) With my Sage Matrix -> DLXMatrix code (plus docstrings to everything I added), the file dlx.py is only 8kB!
> 4) It will resolve tickets #1311 and #1313

This afternoon I wrote some code to get completions of partial latin
squares using Ajanki's DLX code (this is basically the same as solving
a Sudoku puzzle, minus a few constraints because latin squares don't
have the 3x3 blocks).
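
For readers who haven't seen it, the standard encoding goes roughly like this (a hedged sketch, not Carlo's code): every candidate placement (row r, column c, symbol s) becomes a 0-1 row covering three constraint columns, and a latin square is exactly an exact cover of the resulting matrix. For order 2, a brute-force count recovers the two latin squares of that order:

```python
from itertools import combinations, product

def latin_square_matrix(n):
    """Exact-cover matrix for order-n latin squares: one row per
    candidate placement (r, c, s); the 3*n*n columns are the
    constraints 'cell (r,c) is filled', 'row r contains s', and
    'column c contains s'."""
    rows = []
    for r, c, s in product(range(n), repeat=3):
        row = [0] * (3 * n * n)
        row[r * n + c] = 1                  # cell (r, c) filled
        row[n * n + r * n + s] = 1          # row r gets symbol s
        row[2 * n * n + c * n + s] = 1      # column c gets symbol s
        rows.append(row)
    return rows

def count_exact_covers(rows):
    # brute force over row subsets; fine only for tiny instances
    ncols = len(rows[0])
    count = 0
    for k in range(1, len(rows) + 1):
        for sub in combinations(range(len(rows)), k):
            if all(sum(rows[i][j] for i in sub) == 1 for j in range(ncols)):
                count += 1
    return count

print(count_exact_covers(latin_square_matrix(2)))   # 2 latin squares of order 2
```

A partial latin square just pins some of the (r, c, s) choices in advance; Sudoku adds a fourth family of block constraints.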

I discovered that the main function, dosearch(), uses recursion, and
Python's recursion limit is reached quite easily (e.g. finding a
completion of a 16x16 partial latin square blows up). Knuth's original
implementation in C avoided recursion by using an array to simulate a
stack, and looping uses goto statements, and so it can be run on quite
large problem instances. As a side benefit there is no function call
overhead in the main part of the algorithm.

I have been using my own implementation of DLX in C++ for a while now
and I've found it very useful, so I'm all for the algorithm getting
into Sage. I can see a few ways forward:

(1) Rewrite Ajanki's code to avoid the recursion limit. The problem
here is that Python doesn't have 'goto'. Can the recursion limit be
dealt with in some other way?

(2) Write a wrapper for an implementation in C++. I'm happy to make my
code GPL and I'm also happy to write a wrapper once I get my head
around cython/pyrex.

(3) Write the algorithm from scratch in Cython. Does Cython have goto?

Cheers,

--
Carlo Hamalainen
http://carlo-hamalainen.net

boo...@u.washington.edu

Feb 23, 2008, 3:46:00 PM
to sage-...@googlegroups.com


On Sat, 23 Feb 2008, Carlo Hamalainen wrote:

>
> On Fri, Feb 22, 2008 at 8:55 PM, <boo...@u.washington.edu> wrote:
>> Arguments for including Ajanki's code:
>> 1) It's the only Python implementation of DLX I've seen.
>> 2) I emailed the author, who happily added the "or later" bit after the GPL2
>> 3) With my Sage Matrix -> DLXMatrix code (plus docstrings to everything I added), the file dlx.py is only 8kB!
>> 4) It will resolve tickets #1311 and #1313
>
> This afternoon I wrote some code to get completions of partial latin
> squares using Ajanki's DLX code (this is basically the same as solving
> a Sudoku puzzle, minus a few constraints because latin squares don't
> have the 3x3 blocks).
>
> I discovered that the main function, dosearch(), uses recursion, and
> Python's recursion limit is reached quite easily (e.g. finding a
> completion of a 16x16 partial latin square blows up). Knuth's original
> implementation in C avoided recursion by using an array to simulate a
> stack, and looping uses goto statements, and so it can be run on quite
> large problem instances. As a side benefit there is no function call
> overhead in the main part of the algorithm.
>
> I have been using my own implementation of DLX in C++ for a while now
> and I've found it very useful, so I'm all for the algorithm getting
> into Sage. I can see a few ways forward:
>
> (1) Rewrite Ajanki's code to avoid the recursion limit. The problem
> here is that Python doesn't have 'goto'. Can the recursion limit be
> dealt with in some other way?

Yes, I've devoted some thought to this -- we can remove the recursion by using a stack. This shouldn't be too hard. But why do you want a "goto"? That's like duct-taping your car door shut when the latch works fine.

> (2) Write a wrapper for an implementation in C++. I'm happy to make my
> code GPL and I'm also happy to write a wrapper once I get my head
> around cython/pyrex.

Sweet. I wanted Ajanki's code because it worked out of the box. I modified it a little so it'd yield solutions rather than using a callback function. Faster is better.

> (3) Write the algorithm from scratch in Cython. Does Cython have goto?

I'm sitting across from Robert Bradshaw. He said "Gotos are ugly. Well, gotos are ugly unless you're writing a compiler, and then you need them". So no, no gotos in Cython.

DLX should be absolutely trivial to implement using a stack instead of recursion. That should give a huge boost, and there will be no need for gotos.
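
For the curious, here is one way the conversion can be sketched in plain Python (an illustration of the stack idea only -- not the actual patch, and without DLX's linked-list machinery). Each stack frame holds the partial solution, the covered columns, and an iterator over the remaining choices; following DLX, we always branch on the first uncovered column so each solution is found exactly once:

```python
def exact_cover_iterative(rows):
    """Iterative backtracking exact-cover search using an explicit
    stack instead of recursion; yields solutions as lists of row
    indices."""
    ncols = len(rows[0])

    def choices(covered):
        # branch on the first uncovered column, as DLX does: only
        # rows hitting that column and disjoint from covered columns
        col = covered.index(0)
        return [i for i, r in enumerate(rows)
                if r[col] and all(not (r[j] and covered[j])
                                  for j in range(ncols))]

    start = [0] * ncols
    stack = [([], start, iter(choices(start)))]
    while stack:
        chosen, covered, it = stack[-1]
        i = next(it, None)
        if i is None:
            stack.pop()                      # choices exhausted: backtrack
            continue
        new_cov = [a or b for a, b in zip(covered, rows[i])]
        if all(new_cov):
            yield chosen + [i]               # every column covered exactly once
        else:
            stack.append((chosen + [i], new_cov, iter(choices(new_cov))))

M = [[1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 1, 1],
     [0, 1, 0, 1],
     [0, 1, 1, 1]]
print(list(exact_cover_iterative(M)))        # [[0, 3]]: rows 1010 and 0101
```

The stack depth is bounded by the number of columns, so Python's recursion limit never enters into it.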

William Stein

Feb 23, 2008, 4:58:04 PM
to sage-...@googlegroups.com

Here's an example of using an "evil trick" Tom just suggested
to me to make a goto directly in Cython. NEVER REALLY
DO THIS! This is for your amusement only.


cdef extern from "stdio.h":
    void goto "goto out; //"()
    void receive "out: //"()

print "1"
goto()
print "2"
receive()
print "3"

########################

This outputs:

1
3

boo...@u.washington.edu

Feb 25, 2008, 12:20:17 AM
to sage-...@googlegroups.com


Dirty, William. I can't believe you blame this on me -- that was all Robert's fault. Anyway. I've co-opted Ajanki's framework and have rewritten the core of the search algorithm to be iterative. And without a single goto! I'm going to test this some more and update my patch for #2271 in a few hours. It seems to work flawlessly, and I can color K_22 in less than 8 seconds (not impressive, no, except that previously, attempting to color K_14 exceeded the recursion limit).

boo...@u.washington.edu

Feb 25, 2008, 5:07:35 AM
to sage-...@googlegroups.com, Carlo Hamalainen
On Mon, 25 Feb 2008, Carlo Hamalainen wrote:

> On Mon, Feb 25, 2008 at 6:20 AM, <boo...@u.washington.edu> wrote:
>> Dirty, William. I can't believe you blame this on me -- that was all Robert's fault. Anyway. I've co-opted Ajanki's framework, and have rewritten the core of the search algorithm to be iterative. And, without a single goto! I'm going to test this some more, and update my patch for #2271 in a few hours. Seems to work flawlessly, and I can color K_22 in less than 8 seconds (not impressive, no, except that previously, attempting to color K_14 exceeded recursion limit).
>

> Are you still keen for me to write a wrapper for my C++ code? I got
> most of the way on the weekend...

While I put a fair amount of work into making the search iterative, my pride doesn't get in the way -- if your code is better, then it goes in.

What is "better" is a bit debatable: speed is a big part of it, but quality of code and documentation matters too... and these days, documentation is getting more attention than code. As such, I've put as much effort into documenting Ajanki's code as into making it work better. Also, a side effect of my effort is that the code is now ready to be rewritten in Cython, so it will be easy to maintain.

Your code is probably a million times faster than mine (if we convert Ajanki's code to Cython, yours will probably remain faster, but less so). How's the documentation? Doctests?

Here's what I've got so far. I'm inclined to say I should finish documenting this file (two or three more routines needed) so we can include it in the next release. Or, if your code is all documented and ready to go, we can go with that; no skin off my back.

http://sage.math.washington.edu/dlx.py


mabshoff

Feb 25, 2008, 11:37:24 AM
to sage-devel


On Feb 25, 11:07 am, boot...@u.washington.edu wrote:
> On Mon, 25 Feb 2008, Carlo Hamalainen wrote:
I would also suggest going that way, since we can then merge the ticket
that depends on it. Once we have the correct, but not blazingly fast,
version in Sage, we can always switch to the C++ version as is
convenient for the integrators. Please post a patch with the
documented code once it is ready, so we can merge it for
2.10.3.alpha0.

Cheers,

Michael

Nick Alexander

Feb 25, 2008, 1:08:04 PM
to sage-...@googlegroups.com
> I would also suggest to go that way since we can then merge the ticket
> dependent on it. Once we have the correctly, but not blazingly fast
> version in Sage we can always switch to the C++ version as it is
> convenient for the integrators.

+1 -- all those lovely doctests will not go to waste :)

Nick

Carlo Hamalainen

Mar 1, 2008, 10:53:25 AM
to sage-...@googlegroups.com

I've got my C++ solver working in Sage. Here's a short blog post about it:

http://carlo-hamalainen.net/blog/?p=18

The code itself is here: http://carlo-hamalainen.net/sage/latin-1.1/

The relevant files are dancing_links.cpp, dancing_links.sage,
dancing_links.spyx
