I've recently run into a case where I want to specify how edge weights are calculated when I run the traversal algorithm...
# Find minimum distance
networkx.bidirectional_dijkstra(G, 'Sacramento', 'San Francisco', weighting=lambda u, v, attr_dict: attr_dict["distance"])
Thanks for the code. I like the idea but would like to hear other
opinions on whether we should add this.
Clearly it adds some extra interesting flexibility in specifying the
'weights' for shortest path calculations. But arguments against adding
such a feature might be:
- Performance decrease (not sure how much)
- Already can achieve the same result by looping over all edges and
calculating/adding a new 'weight' attribute (could use whatever name you like)
- Clutters interface to algorithm(s)
- For consistency other algorithms might also need to be changed
(e.g. minimum spanning tree)
Aric
Dan,
In the example you gave,
>> --
>> G = networkx.Graph()
>> G.add_edge('Sacramento', 'San Francisco', {"distance":200})
>>
>> v_speed = 65
>> def travel_time(u, v, attr_dict):
>> """
>> Calculates the edge weight based on the "distance" attribute and v_speed, where v_speed is included via the closure.
>> u: The source node for this edge.
>> v: The destination node for this edge
>> attr_dict: A dictionary containing the edge attributes.
>> """
>> return attr_dict["distance"] / v_speed
>>
>> # Find minimum travel time and the associated path
>> print networkx.bidirectional_dijkstra(G, 'Sacramento', 'San Francisco', weighting=travel_time)
>>
>> # Find minimum distance
>> networkx.bidirectional_dijkstra(G, 'Sacramento', 'San Francisco', weighting=lambda u, v, attr_dict: attr_dict["distance"] )
>> --
you can get the same result by simply dividing the path lengths by v_speed at the end. The paths themselves won't change if you just multiply every edge weight by a constant.
That being said, I can think of examples where the weighting function is more complicated. The suggested approach does make sense but, as Aric said, the same result can be obtained by just assigning a new attribute, say travelTime, to each edge and then running the shortest path algorithm:
networkx.bidirectional_dijkstra(G, u, v, weight = "travelTime")
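For instance, reusing the distance / v_speed example from above, the workaround would look something like this (an untested sketch):

import networkx as nx

v_speed = 65
G = nx.Graph()
G.add_edge('Sacramento', 'San Francisco', distance=200)

# Precompute the derived weight once and store it as an edge attribute.
for u, v, data in G.edges(data=True):
    data['travelTime'] = data['distance'] / float(v_speed)

# The existing 'weight' keyword then does the rest.
length, path = nx.bidirectional_dijkstra(G, 'Sacramento', 'San Francisco', weight='travelTime')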
The weight optional argument has been added to the shortest path algorithms to do just that. Moreover, when adding edge attributes, nothing prevents you from using a function, for instance:
import networkx as nx

def f(u, v):
    # any function of the endpoints (or of values captured in a closure)
    return u * v

G = nx.DiGraph()
G.add_edges_from([(u, v, {'myAttr': f(u, v)}) for u in range(10) for v in range(10)])
nx.bidirectional_dijkstra(G, 3, 9, weight='myAttr')
I feel that having an extra weighting argument is not necessary.
One possibility would be to try and generalize the role of the 'weight' argument so that it can be a function as well. But again, I think this is redundant with what is already in place.
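Just to be concrete about what such a generalization would mean, a thin wrapper along these lines would do it (purely a sketch; the helper name and the temporary attribute are made up):

import networkx as nx

def dijkstra_with_callable_weight(G, source, target, weight='weight'):
    # Hypothetical wrapper: 'weight' may be an attribute name (string) or a
    # callable f(u, v, attr_dict) returning a number.
    if callable(weight):
        # Precompute into a temporary attribute, then delegate to the
        # existing string-based interface.
        for u, v, data in G.edges(data=True):
            data['_tmp_weight'] = weight(u, v, data)
        try:
            return nx.bidirectional_dijkstra(G, source, target, weight='_tmp_weight')
        finally:
            for u, v, data in G.edges(data=True):
                del data['_tmp_weight']
    return nx.bidirectional_dijkstra(G, source, target, weight=weight)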
Loïc
It might also be a performance slowdown if one isn't careful. Suppose
the weight function you have runs in O(E) time (such as normalizing
all the weights by the max weight). Dijkstra's runs in something like
O(E + N log N) time. If you pass Dijkstra's the O(E) function, suddenly
it runs in O(E^2 + N log N) time. However, if the values are
precalculated and added as attributes, it will run in O(E + E + N log N) =
O(E + N log N) time. Of course all this could be avoided with careful
programming, but I find myself rarely being a careful programmer.
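To make the pitfall concrete (the weighting signature below is the one proposed earlier in the thread, not anything in the current API, and the tiny graph is made up):

import networkx as nx

G = nx.Graph()
G.add_edge('Sacramento', 'Davis', distance=15.0)
G.add_edge('Davis', 'San Francisco', distance=72.0)

# Careless: this rescans every edge on each call, so if the traversal calls it
# once per edge lookup, the cost is O(E) per lookup, O(E^2) overall.
def normalized_weight_slow(u, v, attr_dict):
    max_d = max(d['distance'] for _, _, d in G.edges(data=True))
    return attr_dict['distance'] / max_d

# Careful: hoist the O(E) scan out of the per-edge function via a closure.
max_d = max(d['distance'] for _, _, d in G.edges(data=True))
def normalized_weight_fast(u, v, attr_dict):
    return attr_dict['distance'] / max_d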
I'm in agreement with Dan here, though I could be persuaded otherwise.
If you add the additional weight for every edge, and your weight
function is O(E), then you have O(E^2). If you subsequently run
Dijkstra, your overall time is O(E^2 + N log N) rather than O(E + N log N).
Given his most recent example of iterating through a long list of
vehicles, each of which interacts with the graph differently, it is
definitely possible that the graph algorithm might not visit the
same number of edges with each vehicle. In that case, you can
definitely get a speed-up.
Chris
I guess I still fail to see the speed-up. You needn't calculate your
new 'weight decorations' for all the edges. Since each edge has its
own dictionary, you can simply calculate it for the needed edges and
store it in the edge dictionary. Simply check before the edge is used,
i.e. G.edge[i][j].has_key('distance'), and if the key is missing, call the function.
Either way you are only calling the weight function when needed. The
way it is now, one calls the function and stores the return value; the
way you suggest, whichever algorithm is at work would calculate the
new weight, use it, and then it would likely be lost. Unless you
suggest the function both calculate a new weight value and store it in
the dictionary, in which case there is even less distinction.
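Concretely, the lazy check described above might look something like this (the helper name is made up; it uses the G.edge accessor and has_key, as elsewhere in this thread, and the travelTime name from earlier):

def cached_travel_time(G, u, v, v_speed=65):
    # Compute and attach the derived weight the first time this edge is used;
    # afterwards it is read straight out of the edge dictionary.
    data = G.edge[u][v]
    if not data.has_key('travelTime'):
        data['travelTime'] = data['distance'] / float(v_speed)
    return data['travelTime']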
Moreover, I disagree that there should be a 'very healthy
separation' between a graph and other data. Most graphs that are used
to model physical systems have characteristics that intuitively should
be attached to nodes and edges. This is what makes the current system
as elegant as it is. The graph stores exactly as much data as the user
wants it to. If edges have specific weight attributes, regardless of
how they are calculated, they should be attached to the edges in my
opinion.
I'm not opposed to these kinds of API changes; it just seems to me
there is little benefit, and it is maybe even a little unpythonic. This
would essentially make two ways to do the same thing. From the Zen of
Python:
"There should be one-- and preferably only one --obvious way to do
it."
Agreed. In this case, it would not make sense to have the vehicles as
part of the graph. In some sense, you are composing the vehicles with
the graph object and then you are running a graph algorithm on the
"composed" object. The graph structure is distinct and it makes sense
that you would not want to modify it.
Generally, this seems to be in the same spirit as the
networkx.algorithms.attrmatrix.attr_matrix()
function. There, the edge_attr keyword can be a string or a callable.
If it is a callable, it receives the "from" and "to" nodes as inputs
from the algorithm and can calculate any "weight". With closures, one
should be able to make these functions do anything.
I've needed behavior like this before (hence attr_matrix) and it seems
you do currently as well. Perhaps as NetworkX usage continues to grow,
we'll see requests for this more frequently. It's definitely important to
balance functionality with performance, and if the performance cost cannot be
addressed (assuming it is even sizable), then maybe we end up with two
functions (at the cost of code duplication).
Chris
Ahh... yes, I should have thought more about the actual graph algorithm
being used. So perhaps this on-the-fly weight computation is not as
useful for Dijkstra or minimum spanning tree. Maybe it would be more
useful if there were some algorithms that could quit early.
> Also Chris, you're right I screwed up in my initial analysis, but I
> don't think you have it quite right either. If the weight function
> takes O(E) time, then calculating first and then running Dijkstra's
> would be O(E^2) for the calculation of the weights, then O(E + N log N),
> for a total of O(E^2 + N log N). If Dijkstra's has to make a call to the
> weight function each time it needs to access an edge weight, then it
> must be O(E * O(E) + N log N) = O(E^2 + N log N).
>
I think we are in agreement. :) Quoting:
> If you add the additional weight for every edge, and your weight
> function is O(E), then you have O(E^2). If you subsequently run
> Dijkstra, your overall time is O(E^2 + N log N) rather than O(E +
> N log N).
Chris
I figured out a little why I am confused, mostly because the answer to
the question:
> How do you know which edges you need until you run the algorithm?
for Dijkstra's algorithm is all of them. Dijkstra's has to check all
edges to ensure that it has the shortest path. For minimum spanning
tree (networkx uses Kruskal's algorithm), it's also all of them: we have to
have the edges sorted by weight. So we would have to calculate the
weights ahead of time either way, whether you pass a function or do it
yourself, resulting in O(E^2 + E log E) = O(E^2). The only one that I can
tell that might not check all of them is A*, but that takes a
heuristic function, so you could define your weight function there.
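For reference, A* already accepts such a callable; a throwaway example (the graph and distances are made up, just to show where the heuristic goes):

import networkx as nx

G = nx.Graph()
G.add_edge('Sacramento', 'Davis', distance=15.0)
G.add_edge('Davis', 'San Francisco', distance=72.0)
G.add_edge('Sacramento', 'San Francisco', distance=88.0)

def remaining_estimate(u, target):
    # Estimated cost from u to the target; returning 0 makes A* behave like Dijkstra.
    return 0

path = nx.astar_path(G, 'Sacramento', 'San Francisco', heuristic=remaining_estimate, weight='distance')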
Thus in your example, for each car you are still calculating the
weight for each edge if you are using shortest_path or Dijkstra's.
I'll agree this is not space-efficient if you store all the vehicles'
speeds on every edge, but you could replace them just as easily,
which would give you the same result as what you suggest.