[Issue / Patch / Review Request] Handling nan and infinities when reading and writing gml files.

Jamie

Dec 28, 2020, 5:16:35 PM
to networkx-discuss
Issue: NaNs and infinities can be written to a GML file, but NetworkX does not read them back in as valid floats (the file cannot be read at all).
GitHub issue #4484: https://github.com/networkx/networkx/issues/4484
Pull request #4497: https://github.com/networkx/networkx/pull/4497
Doc changes: None; I didn't know where this would fall.

I've uploaded a copy of Himsolt's *GML: A portable Graph File Format* here: https://github.com/jml-happy/documents/blob/main/gml-technical-report.pdf.  I think this is the only version, but there may be a newer version out there.

The standard makes no mention of nan or infinite values. Looking at the BNF, NetworkX's output for nan and positive infinity falls in the sub-language for keys (since repr(value).upper() is used). NetworkX's tokenize() does match NAN and INF as keys, but rejects them as values precisely because they match the key pattern. -INF is not in the GML language and doesn't match any tokenize() regex. This is the same for Python-native and NumPy nan and infinities.
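
To illustrate the tokenizing problem, here is a minimal sketch. The patterns are simplified assumptions modeled on the GML BNF, not the actual regexes in NetworkX's gml.py, and classify() is a hypothetical helper.

```python
import re

# Simplified token patterns modeled on the GML BNF.
# These are assumptions for illustration, NOT NetworkX's real regexes.
KEY = re.compile(r"[A-Za-z][A-Za-z0-9]*")
REAL = re.compile(r"[+-]?(?:[0-9]*\.[0-9]+|[0-9]+\.[0-9]*)(?:E[+-]?[0-9]+)?")
INT = re.compile(r"[+-]?[0-9]+")


def classify(token):
    """Return the first token class whose pattern matches the whole token."""
    for name, pattern in (("key", KEY), ("real", REAL), ("int", INT)):
        if pattern.fullmatch(token):
            return name
    return None  # the token is outside the language


for token in ("NAN", "INF", "-INF", "1.5", "-3"):
    print(token, classify(token))
```

Under these assumed patterns, NAN and INF are classified as keys, and -INF matches nothing, which mirrors the behavior described above.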

I think outputting inf as +INF and nan as _NAN is a decent solution. Since _ isn't in the GML alphabet, NAN and INF would remain valid keys, and the _ prefix could be used for any future special cases. But it does extend the alphabet beyond the standard.
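
A minimal sketch of that output mapping might look like the following. The names SPECIAL_FLOAT_TOKENS and format_gml_float are hypothetical and are not taken from the pull request; the fallback for ordinary floats is only a stand-in for the existing repr(value).upper() behavior.

```python
# Hypothetical mapping from Python's float text to the proposed GML tokens.
SPECIAL_FLOAT_TOKENS = {"nan": "_NAN", "inf": "+INF", "-inf": "-INF"}


def format_gml_float(value):
    """Render a float as a GML value token (illustrative helper only)."""
    # float() normalizes NumPy float scalars to the built-in type first.
    text = str(float(value))
    # Special floats get their own tokens; others are simply upper-cased.
    return SPECIAL_FLOAT_TOKENS.get(text, text.upper())


for v in (float("nan"), float("inf"), float("-inf"), 1.5):
    print(format_gml_float(v))
```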

Doing the comparison on the string representation avoids the problem that float('nan') != np.nan. Alternatively, np.isnan(), np.isneginf(), and np.isposinf() could be used (they work with both Python-native and NumPy values), but then gml.py would have to import numpy. I don't know which is preferred. I didn't include anything to handle pandas' pd.NA, since I think there's a meaningful difference between NA/missing values in general and nan floats.
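
The comparison problem can be seen in a short sketch. The last part uses the standard library's math module as a possible third option that avoids the numpy import; that is my own assumption about what could work, not something the pull request does, and is_special_float is a hypothetical helper.

```python
import math

a = float("nan")
b = float("nan")

print(a == b)            # False: nan never compares equal, even to another nan
print(str(a) == str(b))  # True: both render as the string 'nan'
print(math.isnan(a))     # True: stdlib check, no numpy import required


def is_special_float(value):
    """Hypothetical check for nan/inf/-inf using only the standard library."""
    # np.float64 subclasses Python's float, so it passes the isinstance test.
    return isinstance(value, float) and (math.isnan(value) or math.isinf(value))
```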

So there are two parts to this issue:
1. Outputting special floats (nan, inf, -inf)
2. Reading those values and converting them to the proper Python objects
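
For the second part, a reader could map the proposed tokens back to floats roughly like this. It is only a sketch: SPECIAL_TOKENS and parse_gml_value are hypothetical names, and the real parser in gml.py is structured differently.

```python
import math

# Hypothetical reverse mapping for the proposed special-float tokens.
SPECIAL_TOKENS = {
    "_NAN": float("nan"),
    "+INF": float("inf"),
    "-INF": float("-inf"),
}


def parse_gml_value(token):
    """Convert a numeric GML value token to a Python number (illustrative)."""
    if token in SPECIAL_TOKENS:
        return SPECIAL_TOKENS[token]
    try:
        return int(token)
    except ValueError:
        # Not an integer literal, so fall back to a float.
        return float(token)


print(parse_gml_value("+INF"), parse_gml_value("-INF"), parse_gml_value("7"))
print(math.isnan(parse_gml_value("_NAN")))
```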

Current behavior:
```
>>> import networkx as nx
>>> nx.__version__
'2.6rc1.dev_20201228174925'
>>> import numpy as np
>>> np.__version__
'1.19.4'
>>>
>>> special_floats = [float('nan'), float('+inf'), float('-inf'),  np.nan, np.inf, np.inf * -1]
>>>
>>> G = nx.cycle_graph(6)
>>>
>>> # Assign special floats to attributes of G
>>> for i, e in enumerate(G.edges):
...     G.nodes[i]["ndefloat"] = special_floats[i]
...     G.edges[e]["edgfloat"] = special_floats[i]
...
>>> # Show the resulting assignments
>>> G.nodes(data=True)
NodeDataView({0: {'ndefloat': nan}, 1: {'ndefloat': inf}, 2: {'ndefloat': -inf}, 3: {'ndefloat': nan}, 4: {'ndefloat': inf}, 5: {'ndefloat': -inf}})
>>> G.edges(data=True)
EdgeDataView([(0, 1, {'edgfloat': nan}), (0, 5, {'edgfloat': inf}), (1, 2, {'edgfloat': -inf}), (2, 3, {'edgfloat': nan}), (3, 4, {'edgfloat': inf}), (4, 5, {'edgfloat': -inf})])
>>>
>>> # Write gml file
>>> nx.write_gml(G, "special_floats.as.attributes.gml")
>>>
>>> H = nx.read_gml("special_floats.as.attributes.gml")
### ERROR ###
```

Contents of special_floats.as.attributes.gml:
```
graph [
  node [
    id 0
    label "0"
    ndefloat NAN
  ]
  node [
    id 1
    label "1"
    ndefloat INF
  ]
  node [
    id 2
    label "2"
    ndefloat -INF
  ]
  node [
    id 3
    label "3"
    ndefloat NAN
  ]
  node [
    id 4
    label "4"
    ndefloat INF
  ]
  node [
    id 5
    label "5"
    ndefloat -INF
  ]
  edge [
    source 0
    target 1
    edgfloat NAN
  ]
  edge [
    source 0
    target 5
    edgfloat INF
  ]
  edge [
    source 1
    target 2
    edgfloat -INF
  ]
  edge [
    source 2
    target 3
    edgfloat NAN
  ]
  edge [
    source 3
    target 4
    edgfloat INF
  ]
  edge [
    source 4
    target 5
    edgfloat -INF
  ]
]
```

After this pull request:

Contents of special_floats.as.attributes.gml:
```
graph [
  node [
    id 0
    label "0"
    ndefloat _NAN
  ]
  node [
    id 1
    label "1"
    ndefloat +INF
  ]
  node [
    id 2
    label "2"
    ndefloat -INF
  ]
  node [
    id 3
    label "3"
    ndefloat _NAN
  ]
  node [
    id 4
    label "4"
    ndefloat +INF
  ]
  node [
    id 5
    label "5"
    ndefloat -INF
  ]
  edge [
    source 0
    target 1
  ]
  edge [
    source 0
    target 5
  ]
  edge [
    source 1
    target 2
  ]
  edge [
    source 2
    target 3
  ]
  edge [
    source 3
    target 4
  ]
  edge [
    source 4
    target 5
  ]
]
```

GML file can be read successfully:
```
>>> H = nx.read_gml("special_floats.as.attributes.gml")
>>> H.nodes(data=True)
NodeDataView({'0': {'ndefloat': nan}, '1': {'ndefloat': inf}, '2': {'ndefloat': -inf}, '3': {'ndefloat': nan}, '4': {'ndefloat': inf}, '5': {'ndefloat': -inf}})
>>>
```