Get canonical form of binary matrix quickly

James Dow Allen

unread,

Dec 16, 2010, 12:56:07 AM12/16/10

to

For certain types of computer program one wants to
reduce a search space by detecting equivalences.
For example, suppose we want to analyze a game
involving 5-by-4 grids, in which permutation of
rows and/or permutations of columns have no effect
on the analysis; i.e. where these two grids are equivalent:

a b c d e
---------

X - - - - | 1
- X X X - | 2
- - - - - | 3
- X - - - | 4

e c a d b
---------

- X - X X | 2
- - - - - | 3
- - - - X | 4
- - X - - | 1

(I've shown the required permutations explicitly
by labeling rows and columns.)

There are 2^20 > 1 million possible 5-by-4 binary
matrices, but only 1053 equivalence classes.
(See, for example,
http://www.computer.org/portal/web/csdl/doi/10.1109/T-C.1973.223649
)
Obviously a computer program of the type I mention
will be much faster if it has a way to convert
a matrix to the canonical form of its equivalence
class.

This can be done in brute force by trying every one
of the 5! * 4! permutations and remembering the one
with a smallest ordering statistic, but that's slow.

Is there a fast way?

One very simple idea to reduce such a matrix to a
"canonical" form is
sort the columns in "obvious" way
sort the rows in "obvious" way
This leaves 3752 equivalence classes, much less than
1 million, and perhaps good enough in practice, but
it isn't 1053.

(I use 5-by-4 matrix as example. Larger matrices
would be even more important to handle, but to
repeat the experiments described here would be
time-consuming.)

If instead we do
sort the columns, major bit-count, minor "obvious"
sort the rows in "obvious" way
we get only 1796 classes.
Playing around almost randomly with some variations
I found a ten-step sort that leaves only 1055 classes.
Tantalizingly close to 1053; perhaps hinting that
the problem can be solved with O(n+m) sorts.

So, is this a studied problem and is there an answer?

James Dow Allen

Chip Eastham

unread,

Dec 16, 2010, 7:42:42 PM12/16/10

to

I assume from the phrase "binary matrix" in the
subject line that these are {0,1} matrices.

The problem is interesting: what is the optimal
way to exploit the symmetries of row/column
permutation to obtain unique equivalence class
representatives. As you note, a canonical way
to define class representatives is by lexico-
graphic ordering of the arrays.

I've been using such an approach to sudoku
matrices recently, and the idea of using a
lexicographic ordering to reduce the search
space has been described in the literature
on latin square and sudoku countings.

For binary matrices it seems useful to at
least impose the restriction that the first
row be least, ie. of the form --...X or
zeroes first and trailing 1's. There can
be more than one candidate for such rows,
and the choice of such a first row gives a
fairly neat restriction on any further
reordering of the columns (permuting those
with a zero in the first row separately
from those with a 1 in the first row).

Anyway it's an interesting problem and
I'll keep an eye open for references in
the literature.

regards, chip

Chip Eastham

unread,

Dec 26, 2010, 12:41:01 PM12/26/10

to

On Dec 16, 12:56 am, James Dow Allen <jdallen2...@yahoo.com> wrote:

It might be of relevance to consider the restricted
row and column sums, since apart from order these
are preserved by permutations.

Here is a quite recent paper by Alexander Barvinok:

[Matrices with Prescribed Row and Column Sums]
http://www.math.lsa.umich.edu/~barvinok/linalg.pdf

which discusses counting or estimating the number
of solutions for both binary ({0,1}) matrices and for
nonnegative integer matrices.

Unless I'm missing something important, we can
restrict attention to the permutation of rows (resp.
columns) that have equal sums.

regards, chip

Chip Eastham

unread,

Jan 6, 2011, 8:38:10 AM1/6/11

to

Some further thoughts:

It seems that these can be considered graph
isomorphism problems involving bipartite graphs
(with 2-colored nodes). So after considering
the degrees of nodes (which correspond to row
and column sums), the sizes of components
would be another good discriminant/invariant.
The binary matrix is the incidence matrix for
the the two subsets of nodes.

Some others discuss "permutation equivalence"
of matrices in connection with transposing
the matrix when it is square. This also has
a natural interpretation for swapping the
two subsets of nodes.

regards, chip

Chip Eastham

unread,

Jan 6, 2011, 10:19:18 PM1/6/11

to

Here's a paper in which the correspondance of
the graph isomorphism problem for (simple,
undirected) bipartite graphs and permutational
equivalence of (general real) matrices is
noted (compare Thm. 4.4 and 4.5):

http://www.math.uic.edu/~friedlan/graphistenpr9.1.08.pdf

So, at the least we can be sure that this is a problem
that has been studied!

regards, chip

Chip Eastham

unread,

Jan 6, 2011, 10:59:37 PM1/6/11

to

This 2009 blog post claims the "restricted bipartite graph
isomorphism" problem (RBGI) is complete relative to
graph isomorphism (GI) problems (e.g. any of the latter
are reducible to the former in polynomial time). It is
unknown whether GI is in P:

http://rjlipton.wordpress.com/2009/05/04/an-approach-to-graph-isomorphism/

Here RBGI deals precisely with the isomorphism
between bipartite graphs on two-colored node sets,
subject to equal cardinality of the colored sets (an
inessential requirement it seems, since degree 0
nodes could be added for balance) and that the
isomorphism preserves color (avoiding "transpose"
of the incidence matrix).

The blog author explicitly prefers the matrix
formulation (permutation equivalence) of RBGI.

regards, chip

Chip Eastham

unread,

Jan 7, 2011, 10:40:13 AM1/7/11

to

> http://rjlipton.wordpress.com/2009/05/04/an-approach-to-graph-isomorp...

>
> Here RBGI deals precisely with the isomorphism
> between bipartite graphs on two-colored node sets,
> subject to equal cardinality of the colored sets (an
> inessential requirement it seems, since degree 0
> nodes could be added for balance) and that the
> isomorphism preserves color (avoiding "transpose"
> of the incidence matrix).
>
> The blog author explicitly prefers the matrix
> formulation (permutation equivalence) of RBGI.
>
> regards, chip

Perhaps I should retract the glib claim that equal
cardinality of the node sets is "inessential". In
fact we can "balance" them by adding degree 0 nodes,
but this confounds the polynomial-time analysis by
inflating the size of the input (extra nodes and
non-edges).

--c

James Dow Allen

unread,

Jan 8, 2011, 6:06:02 PM1/8/11

to

On Jan 7, 10:59 am, Chip Eastham <hardm...@gmail.com> wrote:
>
> This 2009 blog post claims the "restricted bipartite graph
> isomorphism" problem (RBGI) is complete relative to
> graph isomorphism (GI) problems (e.g. any of the latter
> are reducible to the former in polynomial time). It is
> unknown whether GI is in P:
>

> http://rjlipton.wordpress.com/2009/05/04/an-approach-to-graph-isomorp...

I just saw this, skimmed the articles and your post,
and conclude that this all means the answer to my question
is "Probably not; or if possible it will be a NEW result."
Does that sound correct?

On Dec 16 2010, 12:56 pm, James Dow Allen <jdallen2...@yahoo.com>
wrote:

> Playing around almost randomly with some variations
> I found a ten-step sort that leaves only 1055 classes.
> Tantalizingly close to 1053; perhaps hinting that
> the problem can be solved with O(n+m) sorts.

So the implied conjecture here is either false or would
be an "important" new result. The former case is much more
likely, but even so, developing that sort procedure might
have value: for the target applications, missing some equivalences is
OK.

Thanks for researching this, Chip.

James Dow Allen

Chip Eastham

unread,

Jan 9, 2011, 8:11:23 AM1/9/11

to

Right, the blog article implies your conjecture, if
proven, would be an important new result. Actually
I'd guess if the polynomiality was disproven that
would be just as important...

It's only a blog post, and no references given, so
I'm a little skeptical of it on the fine points. But
the comments the post attracted were unusually
detailed and thoughtful. Worth a second look,
perhaps, if you overlooked them (below the fold).

--c

James Dow Allen

unread,

Jan 9, 2011, 8:45:13 AM1/9/11

to

On Jan 9, 8:11 pm, Chip Eastham <hardm...@gmail.com> wrote:
> Right, the blog article implies your conjecture, if

> proven, would be an important new result....

Color me pessimistic, but your report is enough
to make me disavow the conjecture. :-)

However, in a strange way, if *no* polynomial-time routine
is available to do perfect canonicalization, it makes
a pseudo-classification heuristic routine more fun!
Recall that "pseudo-classes" are *good enough*;
after all, the whole equivalencing is used only
to save on computation costs.

(Although I might have needed this function in
the past, my only present purpose is for an
example routine for my website, which contains
a few programming tips and code examples.
I'd left the "project" on the back-burner until
Chip responded, but now I've done some experiments,
reported below, and will soon post some code at
my website.)

The actual source code for the experiment reported
below is
doquiv(&tmp, icomp1, icomp2);
doquiv(&tmp, icomp1, icomp2);
doquiv(&tmp, icomp2, icomp1);
doquiv(&tmp, icomp2, icomp1);
doquiv(&tmp, icomp1, icomp2);
Each doquiv does two sorts, so the array is sorted
ten times altogether. Each doquiv() sorts vertically,
then horizontally; each sort is major by bit-count,
minor by the bits' lexicographic value. icomp1
differs from icomp2 in that the major sort is
descending rather than ascending. (I did experiment
with using more than ten sort steps; the number of
pseudo-classes was reduced, but only modestly.)

The following table shows the number of equivalence
classes for binary matrix (or bipartite graph) of
given size, and the number of "pseudo-classes"
found by the described canonicalizer.
As seen, it works pretty well for smallish
sizes. But beware: In my application the
only k-edge graphs are derived by adding an
edge to a *pseudo-canonical* (k-1)-edge graph,
and the counting program did this also.
Hence the count of pseudo-classes may be incomplete.

size # classes # pseudo-classes
3x7 734 738
3x8 1324 1338
3x9 2284 2318
3x10 3790 3866

4x4 317 317
4x5 1053 1055
4x6 3250 3268
4x7 9343 9451
4x8 25207 25689
4x9 64167 65926
4x10 155004 160552

5x3 190 190
5x4 1053 1055
5x5 5624 5659
5x6 28576 28928
5x7 136758 139479

6x3 386 387

James

James Dow Allen

unread,

Jan 9, 2011, 8:49:40 AM1/9/11

to

On Jan 9, 8:45 pm, James Dow Allen <jdallen2...@yahoo.com> wrote:
> 6x3 386 387

Oops. incomplete cut-paste:

6x3 386 387
6x4 3250 3270
6x5 28576 28929
6x6 251610 256408
>
> James