Looking for some pointers

Sherlock

unread,

Apr 2, 2024, 9:29:38 PMApr 2

to networkx-discuss

I'm very new to this and looking for some help to point me in the right direction. I'm just beginning to learn the vernacular, so please excuse any errors.

I believe what I'm trying to solve is similar to the "traveling salesman problem". I have an unweighted, directed graph, where nodes will connect to between 1 and 6 other nodes. In total, there will be several thousand nodes. Since the connections are directional, just because you can travel from one node to another, doesn't mean you can travel back the same way. There will always be a path, however, for every node to eventually reach every other node.

With that in mind, I'm trying to locate an algorithm that can build a path/circuit to travel between a given set of nodes in the graph (more important), or even all nodes for that matter (less important, but desired). Backtracking is okay - it will definitely require traversing some of the same nodes more than once to find a path to other nodes.

Are there any algorithms already written in NetworkX (I'm using the python code) that can perform this? I have already written the code to build the graph (adding edges) that NetworkX can read. I'm just trying to find the right algorithm to feed it into at this point. Ideally, I'd give it the graph, a source node, and then a list of nodes for the circuit to build, and then it would find the optimal path to take through the graph to visit the given nodes.

It looks like nx.approximation.traveling_salesman_problem is close to what I'm looking for, except that it seems to only allow 1 visit per node (Hamiltonian?), which won't work with my graphs.

Thanks for any help or pointers. I don't mind reading up, just feeling a bit lost with there being so many different algorithms and not knowing what each do.

-Sherlock

Dan Schult

unread,

Apr 2, 2024, 9:42:48 PMApr 2

to networkx...@googlegroups.com

Yes, the traveling_salesman_problem function is what you want.

The original traveling salesman problem as stated in mathematics has every

node connected to every node. So you never have to revisit a node. This function applies that idea to a network where not all nodes are connection — so you will visit the same node more than once.

It solves the problem by first creating a different network with only the nodes you pick. And makes the distances between each node the length of the shortest path between those two nodes in the original network. This network does have an edge between each pair of nodes. The TSP is solved on that complete graph. And then the resulting path is translated back to the original network by replacing the edges by the corresponding shortest paths from the original network.

It does not guarantee the best result. But it will give something close to best, with a path through the nodes you request (at least once for each node) and only using edges in the original network.

Try it with a small network first to get the hang of how it works.

Sherlock

unread,

Apr 3, 2024, 10:35:58 AMApr 3

to networkx-discuss

Thanks - that makes a lot of sense. However, when I run the TSP function on any of my graphs, it just locks up 1 CPU at 100% and memory keeps growing until it consumes all of it. I then went and hand created a *very* small graph and it still did the same thing. When that didn't work, I changed the graph to one where it wouldn't have to cross the same node more than once, and it works. This leads me to believe the function is trying to find a Hamiltonian cycle only (if I'm understanding that term correctly - only crossing each node once). I looked for a parameter to pass the function to disable this, but didn't find one. Any ideas?

Dan Schult

unread,

Apr 3, 2024, 1:24:59 PMApr 3

to networkx...@googlegroups.com, Matt Schwennesen

Here's a small example that doesn't take long and goes through nodes twice.

In[5]: G=nx.path_graph(5)

In[6]: nx.approximation.traveling_salesman_problem(G)

Out[6]: [0, 1, 2, 3, 4, 3, 2, 1, 0]

So, it is correctly shifting to a temporary graph with all nodes connected in a weighted way, finding the hamiltonian cycle in that graph and then translating back to the original graph (which introduces the aspect of visiting nodes more than once).

It's hard to know from what you've described where the sticking point is.

What do you mean when you say "very small graph"?. Probably many more than 5 nodes. :)

Perhaps @Matt Schwennesen has some ideas where the sticking points might be.

My limited understanding is that starting with a good first guess is really helpful.

And -- in the end, this is a really hard problem. Some versions have been shown to be NP-complete.

How big are you talking about?

Matt Schwennesen

unread,

Apr 3, 2024, 2:34:12 PMApr 3

to networkx-discuss

Hello!

As Dan pointed out, the traveling salesman problem is really hard and even the algorithms to provide a good approximation generally have running times ranging from poor to outright bad. I'm not surprised that computing a TSP tour on a graph with several thousand nodes would take an extremely long time. I think we can further optimize the input to the TSP method beyond what it does normally.

The traveling_salesman_problem function is building another graph, a complete graph, with the weight of the shortest path in the original graph as the edge weight in the new graph. Importantly, it will not remove any vertices, so if your input graph has 5000 nodes the complete graph used by the algorithm will also have 5000 nodes. We really want to reduce this. I'd suggest creating a similar directed complete graph with only the nodes of interest. Weight the new graph so that each edge's weight is the length of the shortest path between those two nodes. Now hopefully we have the situation where the original graph has 5000 nodes and we've created a graph with a greatly reduced number of nodes. (If you record what each shortest path is, we even use the same backtracking method to recover the circuit after we've found it.)

Also consider exploring the method parameter to traveling_salesman_problem. The default for directed graph is the Asadpour method since it offers the best worst case bound, but it's also the slowest and most resource intensive. Personally, I wouldn't run this algorithm on a graph with more than about 10 nodes and expect reasonable performance. The TSP algorithms which work for directed graphs are:

The greedy method will be the fastest, but least precise. Exploring simulated annealing or threshold accepting should be faster and use less memory, but may not be as accurate as Asadpour. I don't have much personal experience with the middle two functions, so you might have to play around with the parameters to get something that behaves nicely.

Hope this helps!

Matt

Sherlock

unread,

Apr 3, 2024, 3:36:39 PMApr 3

to networkx-discuss

Thanks Dan and Matt. I was able to prove it works. I'm not entirely sure why it wasn't yesterday, but I have a hunch. I was running repeated tests after first trying it on the entire graph (30,000 nodes with 78,600 edges). This is the one that ate up all 64GB of physical memory and then was likely dipping into swap. So after running that I scaled back to a much smaller graph with 6 nodes and 14 edges, which still appeared to be spiraling out of control. However, that may have been due to the memory never releasing from processing the previous large graph. Today, testing the same thing (w/ smaller graph in a fresh process and memory freed up) it runs just fine. I just thought I was onto something yesterday because when I fed it a small graph (where it only had to visit each sector once), it was working.

I will try the suggestion of building a graph with just the nodes of interest. I expect that will have some challenges, because even though I'm only interested in usually a few hundred up to a thousand nodes, the graph will still require many more nodes, as the nodes of interest are not directly connected to each other. So this new graph could definitely be made smaller, but I'm not sure just how much smaller. What is a reasonable expectation of how many nodes and edges can be supported via this algorithm w/ 64GB of RAM? I am not too concerned with run time - if it takes an hour+ that's completely fine. I am able to construct a TSP path by brute force, but it takes a few hours. I was hoping to cut that time down some by using an algorithm instead.

Last question, given this is an unweighted graph, do you think a TSP algorithm written with that in mind would speed things up considerably? I am not feeding it weights, so I assume it is using the default of 1. I'm just wondering if that would be slowing things down, running a calculation that it could skip.

Tung T. Nguyen

unread,

Apr 5, 2024, 7:29:54 AMApr 5

to networkx...@googlegroups.com

I am not sure whether this helps but in the past, I used the or-tools package developed by Google to deal with the TSP and related problems. Maybe, you can give it a try.

https://developers.google.com/optimization/routing/tsp

Best wishes, Tung

--
You received this message because you are subscribed to the Google Groups "networkx-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to networkx-discu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/networkx-discuss/9671a0e6-4cae-4a40-a71d-393a31a1b9f5n%40googlegroups.com.

--

Tung T. Nguyen, Ph.D

Email: tun...@uchicago.edu

Website: https://tungprime.github.io/

Matt Schwennesen

unread,

Apr 5, 2024, 11:18:15 AMApr 5

to networkx-discuss

The performance questions are difficult to answer, I've never tried to push for maximum performance from the TSP algorithms and I was working most directly with this sub-package when all I had was a mid-level laptop. I am glad to hear that fully restarting the process freed up the memory.

As for the other two questions, I think they are related. You should be able to create a weighted graph with only nodes representing the ones that you're interested by finding all pairs of shortest paths between the nodes of interest and then using that for the call to traveling_salesman_problem. Because of this, we definitely want to solve the problem with a TSP method. I think the idea is most clear with a diagram, but I don't have a great way to create one, so I'll try my best with just text :)

Imagine that A and B are any two nodes you're interested in which have this shortest path:

A -> n1 -> n2 -> ... -> B

In the new graph create vertices A and B, add an edge between them and if the length of the path above was 100 give the edge a weight of 100. Also save the path between A and B in the original graph. Repeat this process for all pairs of nodes of interest. You should now have a weighted, complete, directed graph that we can pass to the TSP method that's much smaller than the original graph. Then, if the TSP route includes an edge A -> B we replace that with A -> n1 -> n2 -> ... -> B that we saved from earlier. Since the graph itself is already weighted and complete, it might make more sense to use one of the TSP algorithms directly without going through the traveling_salesman_problem so that the pre- and post- processing the traveling_salesman_problem doesn't complicate the results.

It is also, of course, worth considering Tung's suggestion of using a different tool that might be more performant. NetworkX is a pure python package, so it is slower than libraries that use C or C++ extensions.

~ Matt

Reply all

Reply to author

Forward