Hi everyone.
I'm new to NetworkX (and network analysis in general) and have hit a bit of a wall with a project. I imagine the problem I'm trying to solve is not too uncommon, so I'm hoping to source some guidance here.
I'm attempting to:
1. Create a directed graph representing the flow between a network of data assets
2. Sort that graph (left to right or top to bottom) to represent flow
3. Select one or more nodes in that network and visualize subgraphs that contain those nodes (ancestors, descendants, or the full subnetwork, i.e. ancestors + descendants)
I've loosely accomplished this, but with #3, when I draw the subgraph, some edges overlap, and as a result the overlapped edge (and the relationship it represents between the nodes) gets lost visually.
I've found some good solutions for how to address this when the issue is multiple edges between the same pair of nodes (using connectionstyle to incrementally arc new edges), but I'm having trouble applying that idea to cases where the issue involves more than two nodes.
Here's an example:
import networkx as nx
from networkx import ancestors, descendants
#from networkx.drawing.nx_pydot import write_dot
#from networkx.drawing.nx_agraph import write_dot
import matplotlib.pyplot as plt


def draw_topological_dag(G_in, title_in="DAG layout in topological order", node_color_in="#8EBAD9"):
    """
    Draws a directed acyclic graph in its topologically sorted order.

    Args:
        G_in (networkx.classes.digraph.DiGraph): directed acyclic graph object
        title_in (string): title for plot
        node_color_in (string): hex color for node colors

    Returns:
        None
    """
    for layer, nodes in enumerate(nx.topological_generations(G_in)):
        # `multipartite_layout` expects the layer as a node attribute, so add the numeric layer value as a node attribute
        for node in nodes:
            G_in.nodes[node]["layer"] = layer

    # Compute the multipartite_layout using the "layer" node attribute
    pos = nx.multipartite_layout(G_in, subset_key="layer")

    fig, ax = plt.subplots()
    nx.draw_networkx(G_in, pos=pos, ax=ax, node_color=node_color_in)
    ax.set_title(title_in)
    fig.tight_layout()
    plt.show()


G = nx.DiGraph([(0, 3), (1, 3), (2, 4), (3, 5), (3, 6), (4, 6), (5, 6), (6, 7), (8, 6)])
draw_topological_dag(G)
Here's what that produces, an easy-to-interpret left-to-right DAG:
Next, I want to visualize a subgraph of that. Say I'm interested in node 3 and everything that (a) influences it and (b) it influences:
def node_subnetwork(n_in):
    """
    For a given node, produces a list of all nodes within its sub-network, including itself and all ancestors and descendants.
    Reads from the module-level graph G defined above.

    Args:
        n_in (int): node in question

    Returns:
        subnet_n (list): all nodes within sub-network
    """
    anc_n = list(ancestors(G, n_in))  # List of ancestor nodes of node n_in
    dec_n = list(descendants(G, n_in))  # List of descendant nodes of node n_in

    # Generate list containing all nodes in the sub-network of node n_in
    subnet_n = []
    subnet_n += anc_n
    subnet_n.append(n_in)
    subnet_n += dec_n
    return subnet_n


n = 3  # Node selection to generate subgraph
subnet = node_subnetwork(n)
SG = G.subgraph(subnet)
draw_topological_dag(SG, title_in=f"Subgraph of node {n} in topological order")
Here's the output and where I'm struggling:
I've lost the relationship here between nodes 3 and 6:
Something like the plot on the right is what I'd expect, and it's what I'm hoping to generate automatically for a much larger and more complex graph.
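As far as I can tell, the 3 -> 6 edge disappears because nodes 3, 5, and 6 each sit alone in their layer, so the layout puts them on one straight line and the 3 -> 6 arrow is drawn directly on top of 3 -> 5 and 5 -> 6. Edges that skip a layer seem to be the candidates for this. Below is a small library-free sketch for listing them; the helper names `topological_layers` and `layer_skipping_edges` are my own, not NetworkX API, and the layering is meant to mirror what `nx.topological_generations` produces.

```python
from collections import defaultdict


def topological_layers(edges):
    """Map each node to its layer: 0 for sources, otherwise 1 + the deepest predecessor layer."""
    preds, succs, nodes = defaultdict(set), defaultdict(set), set()
    for u, v in edges:
        preds[v].add(u)
        succs[u].add(v)
        nodes.update((u, v))

    indegree = {n: len(preds[n]) for n in nodes}
    current = [n for n in nodes if indegree[n] == 0]
    layer, depth = {}, 0
    while current:
        following = []
        for n in current:
            layer[n] = depth
            for m in succs[n]:
                indegree[m] -= 1
                if indegree[m] == 0:  # all predecessors placed, so m goes in the next layer
                    following.append(m)
        current = following
        depth += 1
    return layer


def layer_skipping_edges(edges):
    """Return the edges whose endpoints are more than one layer apart."""
    layer = topological_layers(edges)
    return sorted((u, v) for u, v in set(edges) if layer[v] - layer[u] > 1)


# Edges of the node-3 subgraph from the example above
subgraph_edges = [(0, 3), (1, 3), (3, 5), (3, 6), (5, 6), (6, 7)]
print(layer_skipping_edges(subgraph_edges))  # prints [(3, 6)]
```

Skipping a layer is only a necessary condition: in the full graph, 3 -> 6 also skips a layer but stays visible because node 5 is not on the straight line between them.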
Any thoughts on how to best do this? I'm starting to think that I'm following the wrong path and that there may be a much more straightforward way to do this.
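One direction that might work, sketched here under the assumption that the only hidden edges are those whose straight line passes through another node: compute the same layered positions, test each edge geometrically, and draw just the hidden ones with an arc via `connectionstyle`. The helpers `layered_pos`, `passes_through_node`, and `draw_dag_with_arcs` are hypothetical names of mine, and `rad=0.3` is an arbitrary curvature I picked.

```python
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np


def layered_pos(G):
    """One column per topological generation, as in draw_topological_dag."""
    H = G.copy()  # work on a copy so the "layer" attribute does not leak into G
    for layer, nodes in enumerate(nx.topological_generations(H)):
        for node in nodes:
            H.nodes[node]["layer"] = layer
    return nx.multipartite_layout(H, subset_key="layer")


def passes_through_node(pos, u, v, tol=1e-9):
    """True if some other node lies on the straight segment from u to v."""
    a = np.asarray(pos[u], dtype=float)
    ab = np.asarray(pos[v], dtype=float) - a
    for w, p in pos.items():
        if w in (u, v):
            continue
        ap = np.asarray(p, dtype=float) - a
        cross = ab[0] * ap[1] - ab[1] * ap[0]  # zero when the three points are collinear
        t = np.dot(ap, ab) / np.dot(ab, ab)  # relative position of w along the segment
        if abs(cross) < tol and 0 < t < 1:
            return True
    return False


def draw_dag_with_arcs(G, ax=None, rad=0.3, node_color="#8EBAD9"):
    """Draw G in layers, arcing only the edges that would be hidden behind a node."""
    pos = layered_pos(G)
    hidden = [e for e in G.edges if passes_through_node(pos, *e)]
    straight = [e for e in G.edges if e not in hidden]
    if ax is None:
        _, ax = plt.subplots()
    nx.draw_networkx_nodes(G, pos, ax=ax, node_color=node_color)
    nx.draw_networkx_labels(G, pos, ax=ax)
    if straight:
        nx.draw_networkx_edges(G, pos, ax=ax, edgelist=straight, arrows=True)
    if hidden:
        nx.draw_networkx_edges(G, pos, ax=ax, edgelist=hidden, arrows=True,
                               connectionstyle=f"arc3,rad={rad}")
    return ax, hidden


G = nx.DiGraph([(0, 3), (1, 3), (2, 4), (3, 5), (3, 6), (4, 6), (5, 6), (6, 7), (8, 6)])
SG = G.subgraph(nx.ancestors(G, 3) | {3} | nx.descendants(G, 3))
ax, arced = draw_dag_with_arcs(SG)
print(arced)  # edges that were drawn as arcs
```

Calling `plt.show()` afterwards displays the figure. For the node-3 subgraph I'd expect 3 -> 6 to be the only arced edge. I haven't checked how well a single fixed `rad` holds up when several arcs share endpoints in a larger graph.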
Ultimately what I'm after is being able to sort a DAG to make sense of the flow visually at a glance and then be able to narrow focus to sub-networks for nodes of particular interest.
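For the "one or more nodes" part, here is a minimal sketch of how `node_subnetwork` might generalize. `select_subnetwork` and its `mode` argument are hypothetical names of mine; the only NetworkX calls used are `nx.ancestors`, `nx.descendants`, and `G.subgraph`.

```python
import networkx as nx


def select_subnetwork(G, nodes, mode="both"):
    """Subgraph induced by `nodes` plus their ancestors and/or descendants.

    mode is one of "ancestors", "descendants", or "both".
    """
    if mode not in {"ancestors", "descendants", "both"}:
        raise ValueError(f"unknown mode: {mode!r}")
    selected = list(nodes)
    keep = set(selected)
    for n in selected:
        if mode in ("ancestors", "both"):
            keep |= nx.ancestors(G, n)
        if mode in ("descendants", "both"):
            keep |= nx.descendants(G, n)
    return G.subgraph(keep)


G = nx.DiGraph([(0, 3), (1, 3), (2, 4), (3, 5), (3, 6), (4, 6), (5, 6), (6, 7), (8, 6)])
print(sorted(select_subnetwork(G, [3])))                     # node 3 with ancestors and descendants
print(sorted(select_subnetwork(G, [4, 5], mode="ancestors")))  # nodes 4 and 5 with their ancestors
```

With "both", this keeps the ancestors and descendants of each selected node only; it does not pull in other ancestors of those descendants (node 8 is left out for node 3, for example), which matches how I defined the sub-network above.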
Thanks in advance for any help!