Reading/Writing in GML format doesn't preserve the node labels


Alejandro Weinstein

May 19, 2012, 10:21:00 AM
to networkx...@googlegroups.com
Hi:

I found that, under some conditions, the process of writing and reading a graph in GML format doesn't preserve the node labels. The following code illustrates the issue:

#########################################################
import networkx as nx

n = 3
G1 = nx.grid_2d_graph(n, n)

# Rename the nodes sequentially as 0, 1, ... n**2-1
sorted_nodes = sorted(G1.nodes(), key=lambda x: x[1] + n*x[0])
inverse_mapping = dict(enumerate(sorted_nodes))
mapping = dict(kv[::-1] for kv in inverse_mapping.iteritems())
nx.relabel_nodes(G1, mapping, copy=False)

nx.write_gml(G1, 'G1.gml')
G2 = nx.read_gml('G1.gml')
nx.write_gpickle(G1, 'G1.pickle')
G3 = nx.read_gpickle('G1.pickle')

print 'Original', G1.edges(1)
print 'GML     ', G2.edges(1)
print 'gpickle ', G3.edges(1)

#########################################################

The output is

Original [(1, 0), (1, 2), (1, 4)]
GML      [(1, 4), (1, 6)]
gpickle  [(1, 0), (1, 2), (1, 4)]

So in this example, after writing and reading the graph, the node labeled 1 in G1 and the node labeled 1 in G2 are not the same node. If, on the other hand, I save G1 using gpickle, the labels are preserved. Note that although the labels are different, G1 and G2 are isomorphic.

Note also that if I rename the nodes of G1 using `nx.convert_node_labels_to_integers(G1, ordering='sorted')` instead of the relabel_nodes approach used in the code above, then the labels are preserved.
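The mapping step in the script relies on `dict.iteritems()`, which only exists in Python 2. Below is a minimal Python 3 sketch of the same relabeling, kept free of networkx so it runs on its own. The helper `grid_edges` is hypothetical and builds the 3x3 grid edges by hand (assumed to match the 4-neighbourhood edges of `grid_2d_graph`), and the hypothetical `sequential_mapping` applies the script's row-major key `x[1] + n*x[0]`. The neighbours it finds for node 1 agree with the "Original" line of the output.

```python
from itertools import product

def grid_edges(n):
    """Hypothetical helper: edges of an n x n grid with (i, j) node names."""
    edges = []
    for i, j in product(range(n), repeat=2):
        if i + 1 < n:
            edges.append(((i, j), (i + 1, j)))  # vertical neighbour
        if j + 1 < n:
            edges.append(((i, j), (i, j + 1)))  # horizontal neighbour
    return edges

def sequential_mapping(nodes, n):
    """Map (i, j) -> i*n + j using the same sort key as the script."""
    ordered = sorted(nodes, key=lambda x: x[1] + n * x[0])
    # Python 3 replacement for dict(kv[::-1] for kv in inverse_mapping.iteritems())
    return {node: k for k, node in enumerate(ordered)}

n = 3
edges = grid_edges(n)
nodes = {v for e in edges for v in e}
mapping = sequential_mapping(nodes, n)
relabeled = [(mapping[u], mapping[v]) for u, v in edges]

# Other endpoint of every edge touching node 1, i.e. the original (0, 1)
neighbours_of_1 = sorted(v if u == 1 else u for u, v in relabeled if 1 in (u, v))
print(neighbours_of_1)  # [0, 2, 4]
```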

Is this the expected behavior of nx.write_gml/nx.read_gml, or is it a bug?

I am using version 1.7.dev_20120519061446.

Alejandro 

Aric Hagberg

May 19, 2012, 12:59:32 PM
to networkx...@googlegroups.com
If you use relabel=True:
G2 = nx.read_gml('G1.gml',relabel=True)
then I think it works.

The default is relabel=False - maybe that is a "bug"?

Aric

Alejandro Weinstein

May 19, 2012, 5:20:19 PM
to networkx...@googlegroups.com
On Saturday, May 19, 2012 10:59:32 AM UTC-6, A Hagberg wrote:
> If you use relabel=True:
> G2 = nx.read_gml('G1.gml',relabel=True)
> then I think it works.

It works with the example above. However, in my code I use the 'label' attribute to store each node's original (i,j) coordinate, as in

# Add the corresponding (i,j) coordinate as an attribute of each node
for v in G1:
    G1.node[v]['label'] = str(inverse_mapping[v])

Then using relabel=True doesn't solve the original problem. Of course I can use something other than 'label' for the attribute, but I was surprised by this behavior.

> The default is relabel=False - maybe that is a "bug"?

Maybe. I was expecting write/read operations to behave like function/inverse-function pairs, but maybe that's not the right way to think about this.
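The "function/inverse function" expectation can be phrased as a round-trip property: `load(dump(obj)) == obj`. A small stdlib sketch, where the helper name `round_trips` is hypothetical, checks it for two serializers on a dict keyed by integer node names. `pickle` stores the Python objects themselves and passes; JSON only allows string object keys, so the integer keys come back as strings and the check fails. The same kind of check could be pointed at a graph writer/reader pair.

```python
import json
import pickle

def round_trips(dump, load, obj):
    """Return True if load(dump(obj)) gives back an object equal to obj."""
    return load(dump(obj)) == obj

# Part of the relabeled 3x3 grid's adjacency, keyed by integer node names.
adjacency = {1: [0, 2, 4], 0: [1, 3]}

print(round_trips(pickle.dumps, pickle.loads, adjacency))  # True
print(round_trips(json.dumps, json.loads, adjacency))      # False
print(json.loads(json.dumps(adjacency)))  # {'1': [0, 2, 4], '0': [1, 3]}
```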

Alejandro. 

Aric Hagberg

May 19, 2012, 7:19:41 PM
to networkx...@googlegroups.com
In general I'd like the functions to work that way (so maybe the
default should be relabel=True).

I guess the labeling confusion comes from the requirement that the
nodes in GML have an integer id. If there isn't a networkx node
attribute 'id', we just generate one (sequentially) and set the node
name to be the GML 'label' attribute. The use of 'label' as a GML
attribute for this purpose is consistent with the GML specification.
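A toy model of the scheme described above, using only the standard library. It is not the networkx implementation: `toy_write_gml` and `toy_read_gml` are hypothetical, and the assumption that an existing `'label'` attribute is written in place of the node name is inferred from the behaviour reported earlier in the thread, not taken from the networkx source. The writer assigns sequential GML ids in iteration order and stores the node name as `label`; the reader either keeps the ids as node names (`relabel=False`) or uses the labels (`relabel=True`).

```python
import re

def toy_write_gml(nodes, edges, attrs=None):
    """Write minimal GML text: sequential ids, node name (or 'label' attr) as label."""
    attrs = attrs or {}
    ids = {}
    lines = ["graph ["]
    for count, v in enumerate(nodes):
        ids[v] = count  # no 'id' attribute given, so ids follow iteration order
        # Assumption: an existing 'label' attribute is written instead of the name.
        label = attrs.get(v, {}).get("label", v)
        lines += ["  node [", f"    id {count}", f'    label "{label}"', "  ]"]
    for u, v in edges:
        lines += ["  edge [", f"    source {ids[u]}", f"    target {ids[v]}", "  ]"]
    lines.append("]")
    return "\n".join(lines)

def toy_read_gml(text, relabel=False):
    """Read the toy GML back; nodes are named by GML id, or by label if relabel."""
    id_to_label = {int(i): lab
                   for i, lab in re.findall(r'id (\d+)\s+label "([^"]*)"', text)}

    def name(i):
        return id_to_label[i] if relabel else i

    nodes = [name(i) for i in id_to_label]
    edges = [(name(int(s)), name(int(t)))
             for s, t in re.findall(r"source (\d+)\s+target (\d+)", text)]
    return nodes, edges

# Nodes named 0..2 but iterated out of name order.
nodes, edges = [2, 0, 1], [(0, 1), (1, 2)]
text = toy_write_gml(nodes, edges)

_, by_id = toy_read_gml(text)                   # relabel=False
_, by_label = toy_read_gml(text, relabel=True)
print(by_id)     # [(1, 2), (2, 0)]: node 1 has a different neighbourhood
print(by_label)  # [('0', '1'), ('1', '2')]: original names (strings in this toy)

# Coordinates stored under 'label' replace the names, so relabel=True cannot recover them.
coords = {0: {"label": "(0, 0)"}, 1: {"label": "(0, 1)"}, 2: {"label": "(0, 2)"}}
_, clobbered = toy_read_gml(toy_write_gml(nodes, edges, coords), relabel=True)
print(clobbered)  # [('(0, 0)', '(0, 1)'), ('(0, 1)', '(0, 2)')]
```

With `relabel=False`, the node called 1 after reading is simply whichever node was written second, which is how the names can change while the graphs stay isomorphic.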

So maybe the best approach here is to use a different attribute for
your coordinates?
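Following that suggestion, a small stdlib sketch that moves any node attribute whose key collides with a field the GML writer fills in itself (`id` and `label`, per the explanation above) to a prefixed key before writing. The helper `rename_reserved` and the `orig_` prefix are hypothetical choices, not part of networkx.

```python
RESERVED_GML_KEYS = ("id", "label")  # keys the writer fills in itself, per the thread

def rename_reserved(attrs, reserved=RESERVED_GML_KEYS, prefix="orig_"):
    """Return a copy of a node-attribute dict with reserved keys moved under a prefix."""
    renamed = {}
    for key, value in attrs.items():
        new_key = prefix + key if key in reserved else key
        if new_key != key and new_key in attrs:
            raise ValueError(f"cannot rename {key!r}: {new_key!r} already exists")
        renamed[new_key] = value
    return renamed

node_attrs = {0: {"label": "(0, 0)"}, 1: {"label": "(0, 1)", "weight": 2}}
safe_attrs = {v: rename_reserved(d) for v, d in node_attrs.items()}
print(safe_attrs[1])  # {'orig_label': '(0, 1)', 'weight': 2}
```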

Aric

Alejandro Weinstein

May 19, 2012, 9:14:49 PM
to networkx...@googlegroups.com
On Saturday, May 19, 2012 5:19:41 PM UTC-6, A Hagberg wrote:
> I guess the labeling confusion comes from the requirement that the
> nodes in GML have an integer id. If there isn't a networkx node
> attribute 'id' we just generate one (sequentially) and set the node
> name to be the GML 'label' attribute. The use of 'label' as a GML
> attribute for this purpose is consistent with the GML specification.

OK. I think it would be good to change the default to relabel=True and maybe add a note in the docs saying that one must be careful when using 'label' as a node attribute.

> So maybe the best approach here is to use a different attribute for
> your coordinates?

Sure. That will work.

In general terms, and assuming there is no need to interact with other programs, which of the supported formats is the one that works most transparently?

By the way, thanks for helping me with this.

Alejandro.