Efficient way of performing an all against all neighborhood comparison in a large graph?


Dan

Aug 15, 2008, 11:13:21 AM8/15/08
to networkx-discuss
Hi,

I've got a tricky problem that I'd appreciate some help with.

I have a large weighted XGraph (26181 nodes and 2064695 edges). I am
attempting to compare the neighborhoods of all the nodes.
Specifically, I am trying to calculate the intersection of the
neighbors of each pair of nodes, and the complement of that
intersection, i.e. how many neighbors are shared between each pair of
nodes and how many are not. Some pairs might have identical
neighborhoods, some may not have any neighbors in common (or anything
in between).

This involves making a lot of comparisons, so I am trying to find ways
of speeding up the process. I tried reducing the search space by only
comparing within connected components, since nodes in different
components cannot share any neighbors (obtained using
'connected_components(G)'). Unfortunately I got one large component
with ~23000 members, so the reduction was not that significant.

I am new to NetworkX (and newish to Python) and I'm hoping there's
some shortcut I'm not aware of. If anyone has any ideas I'd be very
grateful.

Thanks

Dan

iMath

Aug 15, 2008, 12:14:53 PM8/15/08
to networkx-discuss
How about this: if A is the adjacency matrix of your graph (that is,
without weights), then the (i,j)'th entry in A^2 will count the number
of paths of length 2 from vertex i to vertex j. Each such path
corresponds to one vertex in their common neighborhood (assuming there
are no self-loops). This is of course O(n^3) if you do the
straightforward matrix multiplication, but it is quite clean to code.
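A minimal sketch of this idea with NumPy, on a small hypothetical graph (not the poster's data):

```python
import numpy as np

# Hypothetical 4-node undirected graph, no self-loops: edges 0-1, 0-2, 1-2, 1-3
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
])

A2 = A @ A
# Off-diagonal A2[i, j] counts the common neighbors of i and j;
# the diagonal A2[i, i] is simply the degree of i.
print(A2[0, 1])  # 1 -- node 2 is the only common neighbor of 0 and 1
```

The diagonal entries fall out for free as the node degrees, which is handy for computing the "not shared" count as deg(i) + deg(j) - 2*shared.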

Noel O'Boyle

Aug 15, 2008, 12:29:37 PM8/15/08
to networkx...@googlegroups.com
Represent the graph as an adjacency matrix (use a numpy.array). Then
just compare each pair of rows with AND. This will give the
intersection.
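A sketch of the row comparison, assuming a boolean 0/1 adjacency matrix for a small hypothetical graph:

```python
import numpy as np

# Hypothetical 4-node graph: edges 0-1, 0-2, 1-2, 1-3 (boolean adjacency)
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
], dtype=bool)

# Shared neighbors of nodes 0 and 1: elementwise AND of their rows.
n_shared = int((A[0] & A[1]).sum())   # size of the intersection
# Neighbors not in common: XOR gives the symmetric difference.
n_unique = int((A[0] ^ A[1]).sum())
```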

Noel

2008/8/15 Dan <danie...@gmail.com>:

Dan Schult

Aug 15, 2008, 12:37:27 PM8/15/08
to networkx...@googlegroups.com
Does something like this work for you?

G=graph you are starting with.
I=XGraph()
C=XGraph()
I.add_nodes_from(G)
C.add_nodes_from(G)
for u in G:
    unbrs=set(G[u])
    for v in G:
        if u==v:
            I.add_edge(u,v,len(unbrs))
            C.add_edge(u,v,0)
            continue
        vnbrs=set(G[v])
        intercept=unbrs & vnbrs # intersection of the two sets
        complement=unbrs^vnbrs # symmetric difference of the two sets
        I.add_edge(u,v,len(intercept))
        C.add_edge(u,v,len(complement))

Then I[n1][n2] yields the number of common neighbors
and C[n1][n2] yields the number of neighbors not in common.

(I haven't tested this code...)
Dan

Dan

Aug 15, 2008, 3:02:30 PM8/15/08
to networkx-discuss
Thanks a lot for the response.

Apologies if I'm missing something obvious, but I don't seem to be able
to convert the graph to an adjacency matrix. When I use this command:

NX.to_numpy_matrix(K)

I get the following error:

TypeError: Conversion to numpy_matrix not allowed with for graphs with
multiedges.

My graph does have multiple edges between some nodes; does this
complicate things?

One additional piece of information: I need to be able to retrieve
the weight of the edges to the shared and non-shared nodes, e.g.

A has edges to B and C

D has edges to C and E

I need to discover that A and D share one neighbor (C), what the
weights are for A<->C and D<->C, and also what the weights of their
unique neighbors are (A<->B and D<->E).

Any ideas why my conversion isn't working, and whether I would be able
to obtain the weights using your method?

Thanks a lot

Dan

Dan

Aug 15, 2008, 3:12:21 PM8/15/08
to networkx-discuss
Thanks for the response

The code you sent is similar to (although much better than) what I had
originally coded. The problem is the time it takes to run. Even when
the code contains just the following:


> G=graph you are starting with.
> I=XGraph()
> C=XGraph()
> I.add_nodes_from(G)
> C.add_nodes_from(G)
> for u in G:
>     unbrs=set(G[u])
>     for v in G:
>         if u==v:
>             continue
>         vnbrs=set(G[v])

The comparison takes too long with the size of my network. I really
need a shortcut and/or a way of reducing the search space.

Thanks again

Dan

Aric Hagberg

Aug 15, 2008, 5:42:25 PM8/15/08
to networkx...@googlegroups.com
On Fri, Aug 15, 2008 at 12:02:30PM -0700, Dan wrote:
>
> Thanks a lot for the response.
>
> Apologies if I'm missing something obvious, but I don't seem to be able
> to convert the graph to an adjacency matrix. When I use this command:
>
> NX.to_numpy_matrix(K)
>
> I get the following error :
>
> TypeError: Conversion to numpy_matrix not allowed with for graphs with
> multiedges.
>
> My nodes do have multiple edges, does this complicate things?

Potentially - there is no simple way to represent multiple edges with
an adjacency matrix. Though maybe in your case you can reduce
the graph to one without multiple edges by summing the weights
of the parallel edges to make a single edge?

Then the numpy suggestion might be the fastest.
Or maybe scipy sparse matrices.
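A hedged sketch of the collapsing step, written over plain (u, v, weight) tuples rather than the XGraph multiedge API:

```python
# Collapse parallel edges of an undirected multigraph by summing the
# weights of each (u, v) pair. Input: a list of (u, v, weight) tuples,
# possibly containing duplicates in either order.
def collapse_multiedges(edges):
    combined = {}
    for u, v, w in edges:
        key = (u, v) if u <= v else (v, u)  # normalize undirected order
        combined[key] = combined.get(key, 0) + w
    return combined

# Two parallel A-B edges become a single edge of weight 3.5.
combined = collapse_multiedges([("A", "B", 1.0), ("B", "A", 2.5), ("A", "C", 0.5)])
```

The resulting single-edge weights can then be loaded into a simple graph that to_numpy_matrix will accept.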

> One additional piece of information : I need to be able to retrieve
> the weight of the edges to the shared and non shared nodes e.g.
>
> A has edges to B and C
>
> D has edges to C and E
>
> I need to discover that A and D share 1 neighbor (C) and what the
> weights are for A<->C and D<->C and also what the weights of their
> unique neighbors are (A<->B and D<->E).
>

Maybe do it in two steps: first find the neighbor overlaps, then
look up the weights in the graph?
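A rough sketch of those two steps over a plain dict-of-dicts adjacency (hypothetical node names and weights, matching the A/B/C/D/E example above):

```python
# Step 1: find the shared neighbors with set intersection.
# Step 2: read the weights back out of the adjacency structure.
adj = {
    "A": {"B": 1.0, "C": 2.0},
    "B": {"A": 1.0},
    "C": {"A": 2.0, "D": 3.0},
    "D": {"C": 3.0, "E": 4.0},
    "E": {"D": 4.0},
}

shared = set(adj["A"]) & set(adj["D"])                     # common neighbors
weights = {n: (adj["A"][n], adj["D"][n]) for n in shared}  # their edge weights
```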

Aric


Dan

Aug 18, 2008, 7:19:03 AM8/18/08
to networkx-discuss

I've been playing around and I've tried a couple of things:

1) Iterating through edges instead of nodes

and

2) Calculating shortest paths

So for 1 - instead of doing an all-against-all node comparison, I
iterate through each edge in the graph and compare one connected node
to the neighborhood of the other connected node, and vice versa. This
way a lot of node pairs which do not share a neighbor will be skipped,
and I *think* it should be quicker.

At the moment I'm doing this:

comparedic={} # fill dictionary with pairs to be compared
count=0
for edge in K.edges():
    count+=1
    print count
    node1=edge[0] # node on one side of edge
    node2=edge[1] # node on other side of edge
    # The next few lines put each comparison pair, sorted, in a
    # dictionary so each pair will only be compared once.
    node1_neighbors=K.neighbors(node1)
    for neighbor_node in node1_neighbors:
        pair=[neighbor_node,node2]
        pair.sort()
        comparedic[pair[0],pair[1]]=1
    node2_neighbors=K.neighbors(node2) # same thing but the other way around
    for neighbor_node in node2_neighbors:
        pair=[neighbor_node,node1]
        pair.sort()
        comparedic[pair[0],pair[1]]=1


Do you have any thoughts on whether this is a better route? I was
wondering if it would be faster not to use a dictionary but to create
another graph in which edges connect nodes that share a neighbor. So
in each of the above edge iterations, I would copy the edges of each
node on either side of the edge to the other node (in the new graph).
This might be worth it if there was an efficient way of giving one
node's edges to another node (is there?); like the dictionary, the
graph would ignore multiple instances of the same edge (without me
having to do the sort).


2) Calculating shortest paths

As I am only interested in comparing nodes with a path length of 2, I
could do

NX.all_pairs_shortest_path_length(K,cutoff=2)

to return a list of eligible nodes. I think the time taken to
calculate the shortest paths might not make this worthwhile unless
the algorithm is particularly clever (I don't understand how it goes
about calculating that) - also I would want to return ONLY the paths
of length exactly 2, not a maximum of 2.
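If only the distance-exactly-2 pairs are wanted, one way to avoid a full shortest-path computation is neighbors-of-neighbors minus the direct neighbors; a sketch over a plain adjacency dict (hypothetical data, not the NetworkX API):

```python
# Nodes exactly two hops from `start`: union of the neighbors of its
# neighbors, minus the direct neighbors and the start node itself.
def at_distance_two(adj, start):
    one_hop = set(adj[start])
    two_hop = set()
    for n in one_hop:
        two_hop.update(adj[n])
    return two_hop - one_hop - {start}

adj = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}
result = at_distance_two(adj, "A")   # {"D"}
```

Each node at distance exactly 2 from u is, by definition, a node that shares at least one neighbor with u, so this enumerates exactly the pairs that need comparing.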


Thanks a lot for your attention; sorry if the explanations were a bit
confusing. I'd appreciate any thoughts you have on whether this might
work / how to make it more efficient, or if it's back to the drawing
board.

Cheers

Dan