Slow graph creation in Python

36 views
Skip to first unread message

Trevor John Gale

unread,
Dec 9, 2020, 2:25:26 PM12/9/20
to SNAP Users Group
Hello,

I'm trying to use SNAP to build up a large graph in Python. The graph is about 120M nodes and 350M edges. I'm finding that creating the graph is quite slow - it takes me 2.5s to create the node/edge data in NumPy, but ~5m to add the nodes/edges to the graph. I've run some benchmarks and ~40s of this 5m is NumPy iteration overhead, but the remainder appears to be in SNAP. I've tried reserving memory up front for both the graph and node edges, but this does not seem to make a significant impact. Using the unchecked variants of AddNode/AddEdge also helped a small amount, but the process still takes over 4.5 minutes. Any guidance on the most efficient way to create a graph would be greatly appreciate!

Trevor

Rok Sosic

unread,
Dec 9, 2020, 6:02:59 PM12/9/20
to snap-d...@googlegroups.com

Hi Trevor,

Most of the time is spent in building a node adjacency list, but there might be ways around it. Which OS and python environment are you using?

Best,

Rok

--
You received this message because you are subscribed to the Google Groups "SNAP Users Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snap-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/snap-discuss/32c79d29-e475-4e5d-b044-f608265626e9n%40googlegroups.com.

Trevor Gale

unread,
Dec 9, 2020, 6:19:16 PM12/9/20
to snap-d...@googlegroups.com
I'm on ubuntu 18.04 w/ python 3.6. Currently I'm experimenting with writing the graph to a edge list file and then loading it up all at once. Do you think this is a faster approach?

Trevor

From: snap-d...@googlegroups.com <snap-d...@googlegroups.com> on behalf of Rok Sosic <r...@cs.stanford.edu>
Sent: Wednesday, December 9, 2020 6:02 PM
To: snap-d...@googlegroups.com <snap-d...@googlegroups.com>
Subject: Re: Slow graph creation in Python
 
You received this message because you are subscribed to a topic in the Google Groups "SNAP Users Group" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/snap-discuss/RaZdnyh1acA/unsubscribe.
To unsubscribe from this group and all its topics, send an email to snap-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/snap-discuss/dee01e70-5218-b046-6982-7d4701f09016%40cs.stanford.edu.

Rok Sosic

unread,
Dec 10, 2020, 5:28:47 PM12/10/20
to snap-d...@googlegroups.com

Hi,


I added a program to benchmark graph I/O to the repo, it is now in test/readgraph.py. It creates a graph using a preferential attachment and then performs graph I/O with different methods. The graph file size is about 6GB.


Here are the reading times for an undirected graph with 120M nodes and 360M edges:

- 350s, reading an edge list via LoadEdgeList()

- 512s, reading an adjacency list via LoadConnList()

- 29s, reading a binary graph via Load()

- 250s, reading a table via LoadSS() and then running ToGraph()


The best option is to build the graph once, save it in a binary format and then use that to load it at a later time.


Best,

Rok

Trevor Gale

unread,
Dec 11, 2020, 10:20:55 AM12/11/20
to snap-d...@googlegroups.com
Excellent, thank you! I wrote a program that generates the edge list quickly, so I can load it up and then write in a binary format to process more quickly in the future. Thanks!

Trevor

Sent: Thursday, December 10, 2020 5:28 PM
Reply all
Reply to author
Forward
0 new messages