Hi,I'm trying to use networkx for a graph analysis problem, but I'm not a specialist in graph algorithms, and I can't find the functions I'm looking for because the terminology is unfamiliar to me.What I have is a graph of recipes - input -> output, with edges marked by input and output quantity, and time taken. The graph is directed, and *almost* acyclic (there's one loop, but I'd be willing to just remove that for algorithms that need it if I had to). There are some multiple edges (which differ in their attributes), although relatively few. The graph is not large (160 nodes, 380 edges).I want to pick out the subset of the graph involved in (say) circuit manufacture - i.e., anything that is used to make something that is used in the manufacture of circuits, plus anything that needs circuits (either directly or indirectly) in its construction. I don't see what I should be using to solve that problem. I looked at "Clustering", "Components" and "Connectivity", but they didn't seem to have anything useful. I guess I could just do my own recursive lookup of all edges beginning or ending in circuits, and expand outward from there. But isn't that available as a packaged up algorithm?
Moving on from there, I'd like to look at "bottlenecks" - to produce circuits, I ultimately need iron and copper, but copper is converted into an intermediate, cables. If I want to produce one circuit per unit of time, I need N units of copper and M units of iron. For iron, that's easy to calculate from the edge properties. For copper, though, it's a 2-step process (and potentially there may be more than one possible route, although it's rare) so I need to know the steps so that I can calculate each part in turn. So I want to be able to ask "what are the paths from copper to circuits"? My first thought was that "all_simple_paths" was what I was looking for, but that seemed to run for a very long time. So I suspect it's not actually what I want. The shortest path algorithms may be the best approach, if I work out how to express a "weight" function in terms of the edge properties. But it's not immediately obvious how I'd do that. There's has_path, but as far as I can see nothing that tells you what that path *is*!
Can anyone help? If there's any pointers to good documents for a newcomer explaining how to take a "I want to do X" type of problem statement, and express it in terms of the standard graph algorithms, that'd be great. It feels like a real shame that networkx seems like it would be really useful in the types of problem I work with, but every time I try to use it, I get stuck because of my lack of knowledge of the field :-(
--Thanks,Paul
You received this message because you are subscribed to the Google Groups "networkx-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to networkx-discu...@googlegroups.com.
To post to this group, send email to networkx...@googlegroups.com.
Visit this group at http://groups.google.com/group/networkx-discuss.
For more options, visit https://groups.google.com/d/optout.
On Tue, Jan 13, 2015 at 5:51 PM, Paul Moore <p.f....@gmail.com> wrote:I want to pick out the subset of the graph involved in (say) circuit manufacture - i.e., anything that is used to make something that is used in the manufacture of circuits, plus anything that needs circuits (either directly or indirectly) in its construction. I don't see what I should be using to solve that problem. I looked at "Clustering", "Components" and "Connectivity", but they didn't seem to have anything useful. I guess I could just do my own recursive lookup of all edges beginning or ending in circuits, and expand outward from there. But isn't that available as a packaged up algorithm?I think what you're looking for are all predecessors and successors of circuits. In order to get all of them you need to do a breadth-first or depth-first search from the circuit node(s). Check out the documentation on this page: https://networkx.github.io/documentation/latest/reference/algorithms.traversal.html
Moving on from there, I'd like to look at "bottlenecks" - to produce circuits, I ultimately need iron and copper, but copper is converted into an intermediate, cables. If I want to produce one circuit per unit of time, I need N units of copper and M units of iron. For iron, that's easy to calculate from the edge properties. For copper, though, it's a 2-step process (and potentially there may be more than one possible route, although it's rare) so I need to know the steps so that I can calculate each part in turn. So I want to be able to ask "what are the paths from copper to circuits"? My first thought was that "all_simple_paths" was what I was looking for, but that seemed to run for a very long time. So I suspect it's not actually what I want. The shortest path algorithms may be the best approach, if I work out how to express a "weight" function in terms of the edge properties. But it's not immediately obvious how I'd do that. There's has_path, but as far as I can see nothing that tells you what that path *is*!You can find the shortest path from node a to node b instead of all of between all pairs of nodes (which is what you did I guess), and get a list of all nodes inbetween: https://networkx.github.io/documentation/latest/reference/algorithms.shortest_paths.htmlMost algorithms assume an edge weight of 1 if you don't specify an attribute. So that should work for you?
You could also look at centrality measures to identify further bottlenecks. https://networkx.github.io/documentation/latest/reference/algorithms.centrality.html
Can anyone help? If there's any pointers to good documents for a newcomer explaining how to take a "I want to do X" type of problem statement, and express it in terms of the standard graph algorithms, that'd be great. It feels like a real shame that networkx seems like it would be really useful in the types of problem I work with, but every time I try to use it, I get stuck because of my lack of knowledge of the field :-(I think the networkx documentation is really great but at some point you may just have to get a book on graph theory from the library. There's also a review by Albert and Barabasi that introduces a bunch of concepts but I'm not sure how accessible it is. You can find it here: http://www.barabasilab.com/pubs/CCNR-ALB_Publications/200201-30_RevModernPhys-StatisticalMech/200201-30_RevModernPhys-StatisticalMech.pdf
That's where I sort of started from. But in the first instance, if there are two paths A->B->C and A->B1->B2->C, I want to see *both*. The shortest path algorithm (with weight 1) will give me the first, but that might actually be worse if building B1 and B2 is faster (combined) than building B. As a first cut, I want to see *all* paths and I'll calculate the cost manually and choose the best. Longer term, I'd like to work out how to define a proper "weight" such that the shortest-path algorithm would work directly, but that's non-trivial (see below, if you care). I guess I could do an all-shortest-paths algorithm setting all weights to zero, but that seems a bit over-complex.As to why weighting edges is hard, each edge has 3 relevant attributes. Number of items in, number of items out and time. So A->B may take 10 seconds and 3 A's to produce 5 B's. If B->C then needs 10 B's, I need to run A->B *twice* before it can run. I'm pretty sure I can translate that into some form of normalised weight, but it's not trivial (and I'd rather have a working naive algorithm to test my weight calculations against).
On 14 Jan 2015, at 12:57, Paul Moore <p.f....@gmail.com> wrote:I think the networkx documentation is really great but at some point you may just have to get a book on graph theory from the library. There's also a review by Albert and Barabasi that introduces a bunch of concepts but I'm not sure how accessible it is. You can find it here: http://www.barabasilab.com/pubs/CCNR-ALB_Publications/200201-30_RevModernPhys-StatisticalMech/200201-30_RevModernPhys-StatisticalMech.pdfI'll take a look at that reference. I don't think there's anything *wrong* with the networkx documentation, it's just that as it's a complex and pretty deep area, there's a lot of terminology and concepts that's assumed. For the specialists the library is designed for, that's fine, but (IMO) graphs are one of the most underappreciated data structures around - they come up all over the place, but people routinely write their own naive algorithms. It's deceptively simple to set up something basic, then really hard to take it further.