Graph algorithms (subset, enumerate paths) - non-professional confused by library terminology

48 views

Skip to first unread message

Paul Moore

unread,

Jan 13, 2015, 11:51:44 AM1/13/15

to networkx...@googlegroups.com

Hi,

I'm trying to use networkx for a graph analysis problem, but I'm not a specialist in graph algorithms, and I can't find the functions I'm looking for because the terminology is unfamiliar to me.

What I have is a graph of recipes - input -> output, with edges marked by input and output quantity, and time taken. The graph is directed, and *almost* acyclic (there's one loop, but I'd be willing to just remove that for algorithms that need it if I had to). There are some multiple edges (which differ in their attributes), although relatively few. The graph is not large (160 nodes, 380 edges).

I want to pick out the subset of the graph involved in (say) circuit manufacture - i.e., anything that is used to make something that is used in the manufacture of circuits, plus anything that needs circuits (either directly or indirectly) in its construction. I don't see what I should be using to solve that problem. I looked at "Clustering", "Components" and "Connectivity", but they didn't seem to have anything useful. I guess I could just do my own recursive lookup of all edges beginning or ending in circuits, and expand outward from there. But isn't that available as a packaged up algorithm?

Moving on from there, I'd like to look at "bottlenecks" - to produce circuits, I ultimately need iron and copper, but copper is converted into an intermediate, cables. If I want to produce one circuit per unit of time, I need N units of copper and M units of iron. For iron, that's easy to calculate from the edge properties. For copper, though, it's a 2-step process (and potentially there may be more than one possible route, although it's rare) so I need to know the steps so that I can calculate each part in turn. So I want to be able to ask "what are the paths from copper to circuits"? My first thought was that "all_simple_paths" was what I was looking for, but that seemed to run for a very long time. So I suspect it's not actually what I want. The shortest path algorithms may be the best approach, if I work out how to express a "weight" function in terms of the edge properties. But it's not immediately obvious how I'd do that. There's has_path, but as far as I can see nothing that tells you what that path *is*!

Can anyone help? If there's any pointers to good documents for a newcomer explaining how to take a "I want to do X" type of problem statement, and express it in terms of the standard graph algorithms, that'd be great. It feels like a real shame that networkx seems like it would be really useful in the types of problem I work with, but every time I try to use it, I get stuck because of my lack of knowledge of the field :-(

Thanks,

Paul

Moritz Beber

unread,

Jan 13, 2015, 5:39:25 PM1/13/15

to networkx...@googlegroups.com

Hey Paul,

On Tue, Jan 13, 2015 at 5:51 PM, Paul Moore <p.f....@gmail.com> wrote:

Hi,
I'm trying to use networkx for a graph analysis problem, but I'm not a specialist in graph algorithms, and I can't find the functions I'm looking for because the terminology is unfamiliar to me.

What I have is a graph of recipes - input -> output, with edges marked by input and output quantity, and time taken. The graph is directed, and *almost* acyclic (there's one loop, but I'd be willing to just remove that for algorithms that need it if I had to). There are some multiple edges (which differ in their attributes), although relatively few. The graph is not large (160 nodes, 380 edges).

I want to pick out the subset of the graph involved in (say) circuit manufacture - i.e., anything that is used to make something that is used in the manufacture of circuits, plus anything that needs circuits (either directly or indirectly) in its construction. I don't see what I should be using to solve that problem. I looked at "Clustering", "Components" and "Connectivity", but they didn't seem to have anything useful. I guess I could just do my own recursive lookup of all edges beginning or ending in circuits, and expand outward from there. But isn't that available as a packaged up algorithm?

I think what you're looking for are all predecessors and successors of circuits. In order to get all of them you need to do a breadth-first or depth-first search from the circuit node(s). Check out the documentation on this page: https://networkx.github.io/documentation/latest/reference/algorithms.traversal.html

Moving on from there, I'd like to look at "bottlenecks" - to produce circuits, I ultimately need iron and copper, but copper is converted into an intermediate, cables. If I want to produce one circuit per unit of time, I need N units of copper and M units of iron. For iron, that's easy to calculate from the edge properties. For copper, though, it's a 2-step process (and potentially there may be more than one possible route, although it's rare) so I need to know the steps so that I can calculate each part in turn. So I want to be able to ask "what are the paths from copper to circuits"? My first thought was that "all_simple_paths" was what I was looking for, but that seemed to run for a very long time. So I suspect it's not actually what I want. The shortest path algorithms may be the best approach, if I work out how to express a "weight" function in terms of the edge properties. But it's not immediately obvious how I'd do that. There's has_path, but as far as I can see nothing that tells you what that path *is*!

You can find the shortest path from node a to node b instead of all of between all pairs of nodes (which is what you did I guess), and get a list of all nodes inbetween: https://networkx.github.io/documentation/latest/reference/algorithms.shortest_paths.html

Most algorithms assume an edge weight of 1 if you don't specify an attribute. So that should work for you? You could also look at centrality measures to identify further bottlenecks. https://networkx.github.io/documentation/latest/reference/algorithms.centrality.html

Can anyone help? If there's any pointers to good documents for a newcomer explaining how to take a "I want to do X" type of problem statement, and express it in terms of the standard graph algorithms, that'd be great. It feels like a real shame that networkx seems like it would be really useful in the types of problem I work with, but every time I try to use it, I get stuck because of my lack of knowledge of the field :-(

I think the networkx documentation is really great but at some point you may just have to get a book on graph theory from the library. There's also a review by Albert and Barabasi that introduces a bunch of concepts but I'm not sure how accessible it is. You can find it here: http://www.barabasilab.com/pubs/CCNR-ALB_Publications/200201-30_RevModernPhys-StatisticalMech/200201-30_RevModernPhys-StatisticalMech.pdf

Thanks,
Paul

--
You received this message because you are subscribed to the Google Groups "networkx-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to networkx-discu...@googlegroups.com.
To post to this group, send email to networkx...@googlegroups.com.
Visit this group at http://groups.google.com/group/networkx-discuss.
For more options, visit https://groups.google.com/d/optout.

Good luck,

Moritz

Paul Moore

unread,

Jan 14, 2015, 6:57:52 AM1/14/15

to networkx...@googlegroups.com

On Tuesday, 13 January 2015 22:39:25 UTC, Moritz wrote:

On Tue, Jan 13, 2015 at 5:51 PM, Paul Moore <p.f....@gmail.com> wrote:
I want to pick out the subset of the graph involved in (say) circuit manufacture - i.e., anything that is used to make something that is used in the manufacture of circuits, plus anything that needs circuits (either directly or indirectly) in its construction. I don't see what I should be using to solve that problem. I looked at "Clustering", "Components" and "Connectivity", but they didn't seem to have anything useful. I guess I could just do my own recursive lookup of all edges beginning or ending in circuits, and expand outward from there. But isn't that available as a packaged up algorithm?

I think what you're looking for are all predecessors and successors of circuits. In order to get all of them you need to do a breadth-first or depth-first search from the circuit node(s). Check out the documentation on this page: https://networkx.github.io/documentation/latest/reference/algorithms.traversal.html

Thanks. Following on from that suggestion, and looking a little further (the terms "predecessors" and "successors" were what I was missing in my searches) I found http://networkx.github.io/documentation/networkx-1.9.1/reference/generated/networkx.algorithms.dag.ancestors.html#networkx.algorithms.dag.ancestors which looks even better, as it seems to do the traversal for me.

So to get "everything linked to node N from graph G" I think I'd do

sub_G = G.subgraph(nx.ancestors(G, N) + nx.descendants(G, N))

Moving on from there, I'd like to look at "bottlenecks" - to produce circuits, I ultimately need iron and copper, but copper is converted into an intermediate, cables. If I want to produce one circuit per unit of time, I need N units of copper and M units of iron. For iron, that's easy to calculate from the edge properties. For copper, though, it's a 2-step process (and potentially there may be more than one possible route, although it's rare) so I need to know the steps so that I can calculate each part in turn. So I want to be able to ask "what are the paths from copper to circuits"? My first thought was that "all_simple_paths" was what I was looking for, but that seemed to run for a very long time. So I suspect it's not actually what I want. The shortest path algorithms may be the best approach, if I work out how to express a "weight" function in terms of the edge properties. But it's not immediately obvious how I'd do that. There's has_path, but as far as I can see nothing that tells you what that path *is*!

You can find the shortest path from node a to node b instead of all of between all pairs of nodes (which is what you did I guess), and get a list of all nodes inbetween: https://networkx.github.io/documentation/latest/reference/algorithms.shortest_paths.html
Most algorithms assume an edge weight of 1 if you don't specify an attribute. So that should work for you?

That's where I sort of started from. But in the first instance, if there are two paths A->B->C and A->B1->B2->C, I want to see *both*. The shortest path algorithm (with weight 1) will give me the first, but that might actually be worse if building B1 and B2 is faster (combined) than building B. As a first cut, I want to see *all* paths and I'll calculate the cost manually and choose the best. Longer term, I'd like to work out how to define a proper "weight" such that the shortest-path algorithm would work directly, but that's non-trivial (see below, if you care). I guess I could do an all-shortest-paths algorithm setting all weights to zero, but that seems a bit over-complex.

As to why weighting edges is hard, each edge has 3 relevant attributes. Number of items in, number of items out and time. So A->B may take 10 seconds and 3 A's to produce 5 B's. If B->C then needs 10 B's, I need to run A->B *twice* before it can run. I'm pretty sure I can translate that into some form of normalised weight, but it's not trivial (and I'd rather have a working naive algorithm to test my weight calculations against).

You could also look at centrality measures to identify further bottlenecks. https://networkx.github.io/documentation/latest/reference/algorithms.centrality.html

That sounds interesting, for when I get the basics sorted!

Can anyone help? If there's any pointers to good documents for a newcomer explaining how to take a "I want to do X" type of problem statement, and express it in terms of the standard graph algorithms, that'd be great. It feels like a real shame that networkx seems like it would be really useful in the types of problem I work with, but every time I try to use it, I get stuck because of my lack of knowledge of the field :-(

I think the networkx documentation is really great but at some point you may just have to get a book on graph theory from the library. There's also a review by Albert and Barabasi that introduces a bunch of concepts but I'm not sure how accessible it is. You can find it here: http://www.barabasilab.com/pubs/CCNR-ALB_Publications/200201-30_RevModernPhys-StatisticalMech/200201-30_RevModernPhys-StatisticalMech.pdf

I'll take a look at that reference. I don't think there's anything *wrong* with the networkx documentation, it's just that as it's a complex and pretty deep area, there's a lot of terminology and concepts that's assumed. For the specialists the library is designed for, that's fine, but (IMO) graphs are one of the most underappreciated data structures around - they come up all over the place, but people routinely write their own naive algorithms. It's deceptively simple to set up something basic, then really hard to take it further.

Anyway, thanks for the pointers, I'll do some further digging now.

Paul

Moritz Beber

unread,

Jan 14, 2015, 7:31:13 AM1/14/15

to networkx...@googlegroups.com

Hi again,

On Wed, Jan 14, 2015 at 12:57 PM, Paul Moore <p.f....@gmail.com> wrote:

That's where I sort of started from. But in the first instance, if there are two paths A->B->C and A->B1->B2->C, I want to see *both*. The shortest path algorithm (with weight 1) will give me the first, but that might actually be worse if building B1 and B2 is faster (combined) than building B. As a first cut, I want to see *all* paths and I'll calculate the cost manually and choose the best. Longer term, I'd like to work out how to define a proper "weight" such that the shortest-path algorithm would work directly, but that's non-trivial (see below, if you care). I guess I could do an all-shortest-paths algorithm setting all weights to zero, but that seems a bit over-complex.

As to why weighting edges is hard, each edge has 3 relevant attributes. Number of items in, number of items out and time. So A->B may take 10 seconds and 3 A's to produce 5 B's. If B->C then needs 10 B's, I need to run A->B *twice* before it can run. I'm pretty sure I can translate that into some form of normalised weight, but it's not trivial (and I'd rather have a working naive algorithm to test my weight calculations against).

I don't know if you're from a linear programming/optimization background but the way you describe your goal really sounds like that would be a better approach. In biology, more specifically metabolic modeling, this is called flux-balance analysis. Basically you set up a bunch of equations of the form 2 A + 3 B -> 2 C that need to be mass-balanced and then maximize a goal function given certain inputs into the system.

Again, I don't know if that's something you already tried and now you're looking for a different, topology-based approach otherwise I definitely recommend looking into this.

Best,
Moritz

Carlton Gibson

unread,

Jan 14, 2015, 9:12:28 AM1/14/15

to networkx...@googlegroups.com

Hi Paul,

On 14 Jan 2015, at 12:57, Paul Moore <p.f....@gmail.com> wrote:

I think the networkx documentation is really great but at some point you may just have to get a book on graph theory from the library. There's also a review by Albert and Barabasi that introduces a bunch of concepts but I'm not sure how accessible it is. You can find it here: http://www.barabasilab.com/pubs/CCNR-ALB_Publications/200201-30_RevModernPhys-StatisticalMech/200201-30_RevModernPhys-StatisticalMech.pdf

I'll take a look at that reference. I don't think there's anything *wrong* with the networkx documentation, it's just that as it's a complex and pretty deep area, there's a lot of terminology and concepts that's assumed. For the specialists the library is designed for, that's fine, but (IMO) graphs are one of the most underappreciated data structures around - they come up all over the place, but people routinely write their own naive algorithms. It's deceptively simple to set up something basic, then really hard to take it further.

You may already be passed it but there’s an O’Reilly book called “Social Network Analysis for Startups” that uses NetworkX in the context of some Russian social network data. It shows NetworkX in action and provides an introduction to the graph theory. I found it useful, maybe you would too.

Kind Regards,

Carlton

Reply all

Reply to author

Forward

0 new messages