# Graph canonical form directly on matrices

42 views

### Christian Stump

Mar 6, 2018, 4:18:56 AM3/6/18
to sage-support
(This question is about speed improvements of an existing approach, not about finding a first toy solution.)

Let A, B be two ( (n+m) x n ) integer matrices. Say that these are isomorphic if they coincide up to *simultaneous* row and column permutations of the indices 0,...,n-1.

Example: [[1,0],[0,0]] and [[0,0],[0,1]] are isomorphic because interchanging rows 0 and 1 and also columns 0 and 1 interchange the two.
NonExample: [[1,0],[0,0]] and [[0,1],[0,0]] are not isomorphic.

I would now like to obtain a canonical form CF : Mat( (n+m) x n) -> Mat( (n+m) x n) for each isomorphism class. This is CF(M1) == CF(M2) if and only if M1 and M2 are isomorphic. I actually do not care if the canonical form is a matrix, a canonical hash function is enough and what I do use below.

This is almost the graph canonical form because two graphs are isomorphic if and only if their adjacency matrices are isomorphic in the above sense.

So what I do currently is

* turn my integer matrix into a directed, edge-labelled graph
* turn this edge-labelled graph into an unlabelled graph by replacing each non-one-label into a new vertex and keep track which new vertices represent the same edge label
* use the canonical form algorithm in Sage to compute a canonical form (I use the bliss version, because it allows me to not get back a graph, but only a list of edges which is good enough to compute a canonical hash).

The code is

def matrix_canonical_hash(M, n, m):
dg,partition = _matrix_to_digraph(M, n, m)
dg_canon = dg.canonical_label(partition=partition, algorithm="bliss", return_graph=False)
return hash(tuple(sorted(dg_canon)))

def _matrix_to_digraph(M, n, m):
dg = DiGraph(sparse=True)

edge_labels = []
new_vertex  = n+m

new_partition = []

for i, j in M.nonzero_positions():
if i < n:
a, b = M[i, j], M[j, i]
else:
a, b = M[i, j], -M[i, j]
if a > 0:
if a == 1 and b == -1:
else:
try:
x = edge_labels.index((a,b))
except ValueError:
x = len(edge_labels)
edge_labels.append((a,b))
new_partition.append([])
finally:
new_partition[x].append(new_vertex)
new_vertex += 1
elif i >= n:
if a == -1 and b == 1:
else:
a = -a
b = -b
try:
x = edge_labels.index((a,b))
except ValueError:
x = len(edge_labels)
edge_labels.append((a,b))
new_partition.append([])
finally:
new_partition[x].append(new_vertex)
new_vertex += 1
partition = [list(range(n))]
if m > 0:
partition.append(list(range(n,n+m)))
if new_partition:
partition.extend(new_partition)
return dg, partition

def fast_copy(M, n, m):
Mnew = zero_matrix(n+m,n)
for (i,j),val in M.dict().iteritems():
Mnew[i,j] = val
return Mnew

My matrices have a few more properties which I use in _matrix_to_digraph (namely that M[i,j] and M[j,i] have opposite signs and M[i,j] > 0 for i >= n).

Since I have to do many of such test, it needs to be fast. Currently, only 1/3 of the time is spent doing the bliss algorithm. My questions thus are:

1. Does anyone know if there is a (similarly optimized) version of the canonical form algorithms that do the canonical forms on matrices rather than on graphs?

2. Does anyone see how I can speed the construction of the canonical form input graph from a given matrix ?

Many thanks for your help -- any improvements will make their way into the cluster algebra and quiver mutation functionality in main Sage,

Christian

### Christian Stump

Mar 6, 2018, 4:33:05 AM3/6/18
to sage-support
Let me add that the situations I care about are n,m <= 20, the entries are <=5 and the matrices are sparsely filled. An random and typical example is

sage: M = matrix([(0, -1, 0, 0, 0, 0, 0, 1),
....:  (1, 0, 1, 0, 0, 0, 0, 0),
....:  (0, -1, 0, 0, 1, 0, 0, 0),
....:  (0, 0, 0, 0, 0, 1, 0, 0),
....:  (0, 0, -1, 0, 0, 0, 1, 0),
....:  (0, 0, 0, -1, 0, 0, -1, 0),
....:  (0, 0, 0, 0, -1, 1, 0, 0),
....:  (-2, 0, 0, 0, 0, 0, 0, 0),
....:  (-1, 1, 0, 0, 0, 0, 0, 0),
....:  (0, 1, 0, 0, 0, 0, 0, -1),
....:  (0, 1, 0, 1, 0, -1, 0, -1),
....:  (0, 2, -1, 1, 0, -1, 0, -1),
....:  (0, 2, -1, 0, 0, -1, 0, -1),
....:  (0, 2, 0, 0, -1, -1, 0, -1),
....:  (0, 2, 0, 0, -1, 0, -1, -1),
....:  (0, 2, 0, 0, 0, 0, -2, -1)]
....: )
sage: M
[ 0 -1  0  0  0  0  0  1]
[ 1  0  1  0  0  0  0  0]
[ 0 -1  0  0  1  0  0  0]
[ 0  0  0  0  0  1  0  0]
[ 0  0 -1  0  0  0  1  0]
[ 0  0  0 -1  0  0 -1  0]
[ 0  0  0  0 -1  1  0  0]
[-2  0  0  0  0  0  0  0]
[-1  1  0  0  0  0  0  0]
[ 0  1  0  0  0  0  0 -1]
[ 0  1  0  1  0 -1  0 -1]
[ 0  2 -1  1  0 -1  0 -1]
[ 0  2 -1  0  0 -1  0 -1]
[ 0  2  0  0 -1 -1  0 -1]
[ 0  2  0  0 -1  0 -1 -1]
[ 0  2  0  0  0  0 -2 -1]
sage: matrix_canonical_hash(M, 8, 8)
2718289463783950847

### Johan S. H. Rosenkilde

Mar 6, 2018, 5:41:50 AM3/6/18
Hi Christian,

The most inefficient part of _matrix_to_digraph seems to be the
following line:

> x = edge_labels.index((a,b))

which runs in linear time on the length of edge_labels. If all the pairs
(a,b) are different on your input matrix, that gives an n^3 running time
for this line alone. Compared to the asymptotics on Graph Isomorphism,
that's of course fine, but in practice it might be nasty.

To optimise the line above, you should replace the lists edge_labels and
new_partition with a dict from the pairs (a,b) directly to the
partitions, i.e. something like

edges_to_partitions = dict()
...
try:
p = edge_to_partitions[(a,b)]
except ValueError:
p = []
edge_to_partitions[(a,b)] = p
finally:
p.append(new_vertex)
new_vertex += 1
<etc>

Best,
Johan

### Christian Stump

Mar 6, 2018, 5:58:28 AM3/6/18
to sage-support
Dear Johan,

The most inefficient part of _matrix_to_digraph seems to be the
following line:

>                     x = edge_labels.index((a,b))

you are totally right, thanks for this suggestion! (Unfortunately, this will not change anything in practice, because the list edge_labels will usually be at most of length 2, and never be longer than 5 in my case, independent of n and m.)

Cheers,

Christian

### Simon King

Mar 6, 2018, 10:17:52 AM3/6/18
Hi Christian,

On 2018-03-06, Christian Stump <christi...@gmail.com> wrote:
> Let me add that the situations I care about are n,m <= 20, the entries are
><=5 and the matrices are sparsely filled. An random and typical example is

That matrix seems far too small to say anything substantial about
timings. However, profiling the hash computation in a loop, it seems that the
method add_edge is called quite frequently. However, I don't really see
where the time spent in the function _matrix_to_digraph (which seems to
be the bottle neck here) is spent.

I haven't really been able to work around the bottle neck, but got a
minor improvement (4%) as follows:

add_edges is frequently used, so, I tried to not create an empty graph and
then add edges to it, but instead create the list of vertices and edges and
then create the graph from these lists. However, there is no gain at all, as
DiGraph.__init__ calls add_edges basically in the same way as you do.

I changed the use of the list edge_labels.
You basically use it to tell at what point an edge label has been
created, and do this by frequently computing the index. It seems slightly
better to me to to store the same information in a dict. So, replace
edge_labels.index((i,j)) and the error catching by
edge_labels.setdefault((i,j), ...).

Also, it seems slightly faster to obtain the hash of a frozen set than
of a sorted tuple:
sage: dg_canon = dg.canonical_label(partition=partition, algorithm="bliss", return_graph=False)
sage: %timeit hash(frozenset(dg_canon))
100000 loops, best of 3: 2.79 µs per loop
sage: %timeit hash(tuple(sorted(dg_canon)))
100000 loops, best of 3: 4.57 µs per loop

And of course using cython helps as well. With the following Cython code, I am
down to 272 µs per loop, while your original Python code gave me 325 µs
per loop:

##################
def matrix_canonical_hash(M, n, m):
dg,partition = _matrix_to_digraph(M, n, m)
dg_canon = dg.canonical_label(partition=partition, algorithm="bliss", return_graph=False)
return hash(frozenset(dg_canon))

from sage.matrix.matrix0 cimport Matrix
from sage.all import matrix, DiGraph

cpdef _matrix_to_digraph(Matrix M, int n, int m):
cdef dict edge_labels = dict()
cdef int n_labels = 0
cdef int new_vertex = n+m
cdef list Edges = []

cdef list new_partition = []
cdef int i,j

for i,j in M.nonzero_positions():
a = M.get_unsafe(i,j)
if i < n:
b = M.get_unsafe(j,i)
else:
b = -M.get_unsafe(i, j)
if a > 0:
if a == 1 and b == -1:
Edges.append((i,j))
else:
x = edge_labels.setdefault((a,b), n_labels)
if n_labels < len(edge_labels):
new_partition.append([])
n_labels += 1
Edges.append((i,new_vertex))
Edges.append((new_vertex,j))
new_partition[x].append(new_vertex)
new_vertex += 1
elif i >= n:
if a == -1 and b == 1:
Edges.append((j,i))
else:
a = -a
b = -b

x = edge_labels.setdefault((a,b), n_labels)
if n_labels < len(edge_labels):
new_partition.append([])
n_labels += 1
Edges.append((j,new_vertex))
Edges.append((new_vertex,i))
new_partition[x].append(new_vertex)
new_vertex += 1
cdef list partition = [list(range(n))]
if m > 0:
partition.append(list(range(n,n+m)))
if new_partition:
partition.extend(new_partition)
return DiGraph([range(new_vertex),Edges],sparse=True), partition

M = matrix([(0, -1, 0, 0, 0, 0, 0, 1),
(1, 0, 1, 0, 0, 0, 0, 0),
(0, -1, 0, 0, 1, 0, 0, 0),
(0, 0, 0, 0, 0, 1, 0, 0),
(0, 0, -1, 0, 0, 0, 1, 0),
(0, 0, 0, -1, 0, 0, -1, 0),
(0, 0, 0, 0, -1, 1, 0, 0),
(-2, 0, 0, 0, 0, 0, 0, 0),
(-1, 1, 0, 0, 0, 0, 0, 0),
(0, 1, 0, 0, 0, 0, 0, -1),
(0, 1, 0, 1, 0, -1, 0, -1),
(0, 2, -1, 1, 0, -1, 0, -1),
(0, 2, -1, 0, 0, -1, 0, -1),
(0, 2, 0, 0, -1, -1, 0, -1),
(0, 2, 0, 0, -1, 0, -1, -1),
(0, 2, 0, 0, 0, 0, -2, -1)]
)
###############

Best regards,
Simon

### Simon King

Mar 6, 2018, 10:26:43 AM3/6/18
PS:

On 2018-03-06, Simon King <simon...@uni-jena.de> wrote:
> On 2018-03-06, Christian Stump <christi...@gmail.com> wrote:
> I haven't really been able to work around the bottle neck, but got a
> minor improvement (4%) as follows:
> ...
> And of course using cython helps as well. With the following Cython code, I am
> down to 272 µs per loop, while your original Python code gave me 325 µs
> per loop:

Sorry for having been unclear. Your original code gave 339 µs, my code in python
gave 325 µs (that's 4% improvement), and my code in cython is of course
still faster (that's 272 µs, thus almost 20% improvement in total).

Cheers,
Simon

### Christian Stump

Mar 6, 2018, 11:18:07 AM3/6/18
to sage-support
Hi Simon,

Thanks for trying! I was actually hoping for a way to completely avoid creating this sage DiGraph. But either to get the matrix directly into the algorithm (which currently seems impossible), or at least to directly construct some internal graph data structure.

Looking at the code of sage.graphs.bliss.canonical_form, you see that the input digraph is turned into a bliss_digraph (whatever that is). So I believe that there is, in principle, a way to bypass the DiGraph creation...

Thanks again, Christian

### Dima Pasechnik

Mar 7, 2018, 12:28:38 AM3/7/18
to sage-support
Ultimately, the classical canonical form/isomorphism implementations run on (di)graphs represented by 0-1 matrices, often
with bit entries. So that's how bliss_digraph is represented too.
Constructing it directly might be beneficial, especially if you have a problem of doing this on many small matrices
(20x20 would still be quite small; isomorphism complexity usually starts to kick in with, say, 100x100 matrices)

Anyhow, the main inefficiency in your approach is coming from constructing unnecessarily big graphs.
In fact, to encode an edge-coloured (say, k colours) n-vertex graph as an equivalent vertex-coloured graph
you only need O(n log k) vertices in the new graph. See Sect 14 in http://pallini.di.uniroma1.it/Guide.html
on how this can be done.

If you implemented this in Sage, it would be a great contribution!

Also, I am not sure whether bliss is faster than nauty (nauty also is a Sage standard package, but lacks a cython interface)
But this is another story.

Dima

### Simon King

Mar 7, 2018, 3:20:44 AM3/7/18
On 2018-03-07, Dima Pasechnik <dim...@gmail.com> wrote:
> Ultimately, the classical canonical form/isomorphism implementations run on
> (di)graphs represented by 0-1 matrices, often
> with bit entries. So that's how bliss_digraph is represented too.
> Constructing it directly might be beneficial, especially if you have a
> problem of doing this on many small matrices

For the record: I guess one should then at least provide a
sage/graphs/bliss.pxd header, or better: Move those parts of
sage/graphs/bliss.pyx that are merely library bindings to
sage/libs/bliss.*

Anyway, looking at sage/graphs/bliss.pyx, it seems easy to modify your
code to directly create a bliss graph.

Cheers,
Simon

### Christian Stump

Mar 7, 2018, 3:44:52 AM3/7/18
to sage-support
Dear Dima,

Ultimately, the classical canonical form/isomorphism implementations run on (di)graphs represented by 0-1 matrices, often
with bit entries. So that's how bliss_digraph is represented too.

Thanks for this info clarifying that it seems impossible to feed the algorithm any other kind of integer matrix.

Anyhow, the main inefficiency in your approach is coming from constructing unnecessarily big graphs.

Also thanks for this reference, for the general problem this is certainly the correct approach. In my example cases though (as mentioned above), I have only very few edge labels (usually only two) so I doubt this improvement would change the overall performance.

If you implemented this in Sage, it would be a great contribution!

I agree that this would be very worth having! But my cython skills are too weak to do this in the appropriate speed context.

One question here: am I correct that the best we can currently hope for is to

1. feed the algorithm a list of labelled edges
2. turn this list of labelled edges into a list of unlabelled edges representing the edge-labelled graph as in Section 13
3. feed this to either the bliss or the nauty algorithm
4. get back an unlabelled graph
5. turn this back into an edge labelled graph

Also, I am not sure whether bliss is faster than nauty (nauty also is a Sage standard package, but lacks a cython interface)

The only reason I use bliss over nauty is that it can return only a list of edges rather than creating the output DiGraph. And in the tests I did, 1/3 of the time was spend on constructing the canonical form and 2/3 were used to turn these output edges into a DiGraph (this compares bliss with and without constructing the output graph, nauty was as fast as bliss with output graph construction).

Cheers, Christian

### Christian Stump

Mar 7, 2018, 3:46:28 AM3/7/18
to sage-support
Dear Simon,

Anyway, looking at sage/graphs/bliss.pyx, it seems easy to modify your
code to directly create a bliss graph.

Help there is highly appreciated :-), I don't know how to do that appropriately...

Christian

### Simon King

Mar 7, 2018, 9:02:05 AM3/7/18
Dear Christian,
Just have a look at sage/graphs/bliss.pyx. Apparently it uses c++, so,
that needs to be taken into account when cythoning and compiling. See
module_list.py, where it says:
OptionalExtension("sage.graphs.bliss",
["sage/graphs/bliss.pyx"],
language = "c++",
libraries = ['bliss'],
package = 'bliss'),

Apparently, it is then

cdef Digraph *g = new Digraph(order)

to create a Digraph with a given number of vertices (which are always
numbered in range(order)), and then an edge from v1 to v2 is added like
this:

In your code you basically do the same. Only problem: It seems that
you need to provide the number of vertices during initialisation of g.
But in your code, you add more vertices on the fly, IIRC. But this can
certainly be solved.

Best regards,
Simon

### Christian Stump

Mar 7, 2018, 10:14:02 AM3/7/18
to sage-support
Dear Simon and Dima,

I moved this to https://trac.sagemath.org/ticket/24924 for better traceability and to show you some code snippets...

Cheers, Christian

### Dima Pasechnik

Mar 7, 2018, 10:27:25 AM3/7/18
to sage-support

On Wednesday, March 7, 2018 at 8:44:52 AM UTC, Christian Stump wrote:
Dear Dima,

Ultimately, the classical canonical form/isomorphism implementations run on (di)graphs represented by 0-1 matrices, often
with bit entries. So that's how bliss_digraph is represented too.

Thanks for this info clarifying that it seems impossible to feed the algorithm any other kind of integer matrix.

It is perfectly possible to implement  canonical form/isomorphism of edge-labelled (di)graphs in the same vein as un-labelled ones.
(I did it as a part of a project in my undergraduate years, the code (in Fortran IV :-)) is unfortunately lost. It should not be too hard to re-do.)
It's just I'm not aware of such implementations available anywhere.

Anyhow, the main inefficiency in your approach is coming from constructing unnecessarily big graphs.

Also thanks for this reference, for the general problem this is certainly the correct approach. In my example cases though (as mentioned above), I have only very few edge labels (usually only two) so I doubt this improvement would change the overall performance.

OK, in this case indeed it won't be noticeable.

If you implemented this in Sage, it would be a great contribution!

I agree that this would be very worth having! But my cython skills are too weak to do this in the appropriate speed context.

One question here: am I correct that the best we can currently hope for is to

1. feed the algorithm a list of labelled edges
2. turn this list of labelled edges into a list of unlabelled edges representing the edge-labelled graph as in Section 13
3. feed this to either the bliss or the nauty algorithm
4. get back an unlabelled graph
5. turn this back into an edge labelled graph

the only improvement I could see in your scheme is that bliss/nauty can return a permutation you need to apply to your graph to make it canonical, and it would be probably more efficient to apply it (after the dropping the extra vertices you created) directly to your original labelled (di)graph.

Still faster would be to make a Sage graph backend compatible with bliss or nauty API in the sense that they could operate directly on
the graph you created, but this is much more work, naturally.