Memory Error with read_gml

Travis Craddock

Apr 3, 2012, 1:11:34 PM
to networkx-discuss
Hello,

I'm dealing with a large network of ~10 million nodes, stored in a GML
file of about 1 GB. When I try to read in this network using read_gml I
get a MemoryError. I am running on a cluster with 100 GB of memory
available, and I don't understand how reading this file can exceed that.
Any explanation would be helpful, and any suggested workarounds
would be appreciated.

Thanks,
Travis

Dan Schult

Apr 3, 2012, 3:27:36 PM
to networkx...@googlegroups.com
This may depend on what kind of nodes you are storing and on what edge attributes you have.
But just running with some smaller examples shows that even with integer nodes a
path graph (about as sparse as you can get) with 100,000 nodes takes ~100MB.
You probably have more edges than just a path, so it can start to soak up memory fast.

In general it is hard to convert from GML filesize to memory needed for the NX graph.
It is possible that your graph takes up more than 100GB. The important features are
the number (and kind) of nodes, number of edges and what kind of edge data you have.

If you are using long strings for nodes, you might be able to save some memory by
using integers instead. Do you really need the whole graph at once? It depends
what you are trying to do with it...
Dan
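
A rough way to check the kind of estimate given above is to build the same nested-dict layout that NetworkX uses for its adjacency structure and measure it. The sketch below is only an approximation under that assumption: it uses plain dicts and the standard-library tracemalloc module rather than NetworkX itself, and the helper names are made up for illustration. The number it prints depends on the Python version, so it will not match the ~100MB figure exactly. Taking that figure at face value (about 1 KB per node), a path graph on 10 million nodes would already need on the order of 10 GB before counting extra edges or attributes.

```python
import tracemalloc


def build_path_graph(n):
    """Build an undirected path 0-1-...-(n-1) as nested dicts.

    node_attr maps node -> attribute dict; adj maps node -> neighbor -> edge dict.
    This mimics the dict-of-dicts layout, it is not NetworkX code.
    """
    node_attr = {i: {} for i in range(n)}
    adj = {i: {} for i in range(n)}
    for u in range(n - 1):
        data = {}              # one attribute dict shared by both directions
        adj[u][u + 1] = data
        adj[u + 1][u] = data
    return node_attr, adj


def measure(n):
    """Return (bytes currently allocated, graph) after building the graph."""
    tracemalloc.start()
    graph = build_path_graph(n)
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current, graph


n = 100_000
bytes_used, (node_attr, adj) = measure(n)
print(f"{n} nodes: about {bytes_used / 1e6:.1f} MB "
      f"({bytes_used / n:.0f} bytes per node)")
```

Running it with a few values of n should show roughly linear growth, which is what makes a per-node extrapolation reasonable.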

Travis Craddock

Apr 3, 2012, 3:27:56 PM
to networkx-discuss
My apologies. Here is the output.

Traceback (most recent call last):
  File "Merge-gml.py", line 27, in <module>
    Temp = nx.read_gml(infile)
  File "<string>", line 2, in read_gml
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/networkx-1.6-py2.7.egg/networkx/utils/decorators.py", line 193, in _open_file
    result = func(*new_args, **kwargs)
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/networkx-1.6-py2.7.egg/networkx/readwrite/gml.py", line 85, in read_gml
    G=parse_gml(lines,relabel=relabel)
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/networkx-1.6-py2.7.egg/networkx/readwrite/gml.py", line 136, in parse_gml
    tokens =gml.parseString(data)
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 1021, in parseString
    loc, tokens = self._parse( instring, 0 )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 894, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 2623, in parseImpl
    return self.expr._parse( instring, loc, doActions, callPreParse=False )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 894, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 2368, in parseImpl
    loc, exprtokens = e._parse( instring, loc, doActions )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 894, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 2742, in parseImpl
    loc, tmptokens = self.expr._parse( instring, preloc, doActions )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 894, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 2478, in parseImpl
    ret = e._parse( instring, loc, doActions )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 894, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 2623, in parseImpl
    return self.expr._parse( instring, loc, doActions, callPreParse=False )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 894, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 2368, in parseImpl
    loc, exprtokens = e._parse( instring, loc, doActions )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 894, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 2623, in parseImpl
    return self.expr._parse( instring, loc, doActions, callPreParse=False )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 894, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 2777, in parseImpl
    loc, tmptokens = self.expr._parse( instring, preloc, doActions )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 894, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 2623, in parseImpl
    return self.expr._parse( instring, loc, doActions, callPreParse=False )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 894, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 2478, in parseImpl
    ret = e._parse( instring, loc, doActions )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 898, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/usr/local/Python-2.7.2/lib/python2.7/site-packages/pyparsing-1.5.6-py2.7.egg/pyparsing.py", line 1788, in parseImpl
    result = self.re.match(instring,loc)
MemoryError


Travis

Aric Hagberg

Apr 3, 2012, 3:32:37 PM
to networkx...@googlegroups.com

Also the GML format reader is known to be slow and memory inefficient:
https://networkx.lanl.gov/trac/ticket/426

Aric

Travis Craddock

Apr 3, 2012, 3:33:16 PM
to networkx-discuss
Thanks Dan. The nodes store only the node id as well as a name
written as a string (30 char). This must be where the memory is eaten
up. I need the graph to find nodes of outdegree = 0.

Travis
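
Whether the 30-character names are really where the memory goes can be sanity-checked with a little arithmetic. The sketch below is an estimate only: the example name and id are stand-ins, and sys.getsizeof reports sizes for the running CPython build, which differ between versions. On a typical build the names alone are likely to add well under a gigabyte for 10 million nodes. The traceback earlier in the thread also ends inside pyparsing's regular-expression match, which suggests the failure happens while the file is being parsed rather than while the graph is stored.

```python
import sys

name = "n" * 30           # stand-in for a 30-character node name
node_id = 1234567         # stand-in for an integer node id

per_name = sys.getsizeof(name)    # bytes for one string object
per_id = sys.getsizeof(node_id)   # bytes for one int object
n_nodes = 10_000_000

# Extra memory if every node carries a name string instead of only an int.
extra_bytes = (per_name - per_id) * n_nodes
print(f"string: {per_name} B, int: {per_id} B, "
      f"extra for {n_nodes} names: {extra_bytes / 1e9:.2f} GB")
```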

Daπid

Apr 3, 2012, 4:26:26 PM
to networkx...@googlegroups.com
On Tue, Apr 3, 2012 at 9:33 PM, Travis Craddock <tra...@ualberta.ca> wrote:
> I need the graph to find nodes of outdegree = 0.

You don't need to load the whole graph at all. In fact, you only need
to parse the GML file and look for what you want. The nodes are saved
as:

  node
  [
    id 34
  ]

All the edges are stored in the form:

  edge
  [
    source 34
    target 32
  ]

So you just have to keep a list of the node ids of your graph, remove
each one when an outgoing link from it appears, and, if you want,
recover its properties from the node definition afterwards.

In case you want to try first with something smaller, you may find
something here: http://www-personal.umich.edu/~mejn/netdata/
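
The approach described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a full GML parser: it assumes integer ids, quoted strings that do not span lines, and a directed graph, and the function name zero_outdegree_nodes is made up for this example. It collects node ids and edge sources in two sets and takes the difference, so it does not matter whether nodes appear before edges in the file.

```python
import re

# A token is a quoted string, a bracket, or a run of non-space characters.
TOKEN = re.compile(r'"[^"]*"|\[|\]|[^\s\[\]]+')


def zero_outdegree_nodes(lines):
    """Return the ids of nodes that never appear as an edge source."""
    node_ids = set()
    sources = set()
    stack = []    # keys of the enclosing blocks, e.g. ["graph", "node"]
    key = None    # key read but still waiting for its value
    for line in lines:
        if line.lstrip().startswith("#"):
            continue                      # skip comment lines
        for tok in TOKEN.findall(line):
            if key is None:
                if tok == "]":
                    if stack:
                        stack.pop()       # close the current block
                else:
                    key = tok             # start of a key/value pair
            elif tok == "[":
                stack.append(key)         # the value is a nested block
                key = None
            else:
                block = stack[-1] if stack else None
                if block == "node" and key == "id":
                    node_ids.add(int(tok))
                elif block == "edge" and key == "source":
                    sources.add(int(tok))
                key = None
    return node_ids - sources


SAMPLE = """graph [
  directed 1
  node [
    id 1
    label "a [x]"
  ]
  node [ id 2 label "b" ]
  node [ id 3 ]
  edge [ source 1 target 2 ]
  edge [
    source 2
    target 3
  ]
]"""

result = zero_outdegree_nodes(SAMPLE.splitlines())
print(sorted(result))
```

Because the function only iterates over lines, an open file object can be passed in directly, so the file is never held in memory as a whole; only the two sets of integers are.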
