Multi-Threading Parallel Algorithm for Simple Cycle Detection in DiGraph


Teodoro Criscione

unread,
Jan 21, 2021, 5:33:55 AM1/21/21
to networkx-discuss
Good morning everyone,

I need to detect all the simple cycles in a DiGraph (10109 nodes, 54010 edges). As you can imagine, the nx.simple_cycles function, which is based on Johnson's algorithm, runs out of memory very quickly, so it is not suited to my case.

I know some researchers have developed faster and more efficient algorithms for such problems, but as far as I know none of them is open source or available for Python. Do you know anything about them? Below I report the papers I found:


For sure there is nothing like this implemented in NetworkX, and I think it would be useful to develop an open-source tool for such analysis. Is anyone interested in helping me?

Thank you,
best regards.
Teodoro




Kelly Boothby

unread,
Jan 22, 2021, 3:29:57 PM1/22/21
to networkx-discuss
Hi Teodoro,

I got curious and generated a random digraph with approximately the parameters of yours, and ran nx.simple_cycles with the following:

import random, networkx as nx
g = nx.DiGraph()
h = nx.gnm_random_graph(10000, 50000)
g.add_edges_from((u, v) if random.randint(0, 1) else (v, u) for u, v in h.edges())
c = 0
for _ in nx.cycles.simple_cycles(g):
    c += 1
    if c % 1000 == 0:
        print(c)

The memory usage is relatively constant and small... but there are millions of cycles, and it looks like they're being generated at thousands per second.  I imagine that if you're saving the cycles in memory (e.g. calling list(nx.simple_cycles(g))), then it would quickly blow up.

Regards,

Kelly

Teodoro Criscione

unread,
Jan 22, 2021, 3:38:21 PM1/22/21
to networkx-discuss
Hello Kelly,
thank you for your reply!!

Yes, indeed I need to save the cycles in memory - using list(nx.simple_cycles(g)), as you said.
My final purpose is actually to analyze cyclic motifs in a MultiDiGraph, where 'time' and 'weight' are assigned to each edge.
So my idea was to detect all the cycles projected onto the DiGraph. After I have collected all the cycles (up to length 5 at the moment), I will analyze other features of the edges and nodes involved in them.
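For the projection step, a minimal sketch might look like the following (the toy graph, the attribute names, and the weight-summing rule are all illustrative assumptions, since the right aggregation depends on the analysis):

```
import networkx as nx

# Sketch: collapse a MultiDiGraph with 'time' and 'weight' edge
# attributes onto a plain DiGraph for cycle detection.  Parallel edges
# are merged; summing their weights is an assumed aggregation rule.
mg = nx.MultiDiGraph()
mg.add_edge("a", "b", time=1, weight=2.0)
mg.add_edge("a", "b", time=5, weight=3.0)  # parallel edge, collapsed below
mg.add_edge("b", "a", time=2, weight=1.0)

dg = nx.DiGraph()
for u, v, data in mg.edges(data=True):
    if dg.has_edge(u, v):
        dg[u][v]["weight"] += data["weight"]
    else:
        dg.add_edge(u, v, weight=data["weight"])
```

Cycle detection then runs on dg, and the MultiDiGraph keeps the per-edge time information for the later feature analysis.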

Thank you,
best regards.
Teodoro


Dan Schult

unread,
Jan 22, 2021, 5:43:47 PM1/22/21
to networkx...@googlegroups.com
It might be better to process/aggregate the cycles without storing them. 
When you need to analyze other features you can generate them again.


--
You received this message because you are subscribed to the Google Groups "networkx-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to networkx-discu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/networkx-discuss/fc364b82-68eb-414f-9962-c66237913326n%40googlegroups.com.

Teodoro Criscione

unread,
Jan 22, 2021, 6:58:49 PM1/22/21
to networkx-discuss
Thank you for your reply, Dan!

Do you mean, for example, "delaying" all my analytical functions using Dask?
Do you think that could help? I have never used it before, so I am learning by doing now...
any advice is welcome!

My problem is that I wanted to derive some metrics at the node level - e.g. in how many cycles a node participates during a given time interval, and how much - and at the cycle level - e.g. Gini index, inflows, outflows, etc. Do you think I can do that without storing them?

Thank you,
best regards.
Teodoro

Kelly Boothby

unread,
Jan 22, 2021, 7:28:47 PM1/22/21
to networkx...@googlegroups.com
I was thinking that it may be viable to filter the output down to cycles of length 5 and below (there shouldn't be that many in such a sparse graph); however, that will waste tons of time generating and rejecting long cycles.  It would be easy to add a parameter which bounds the length (or better, two parameters, for upper and lower bounds).  Would that help, and if so, would you like to submit that feature request?
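For what it's worth, a length-bounded search for short cycles can also be sketched by hand: a hypothetical helper, not part of NetworkX, which never generates long cycles at all because the DFS is cut off at the depth bound. It assumes node labels are mutually comparable (e.g. all integers), since it avoids duplicates by rooting each cycle at its smallest node. (Newer NetworkX releases have since added a length_bound parameter to simple_cycles itself.)

```
import networkx as nx

def bounded_simple_cycles(g, max_len):
    """Yield each simple cycle of g with length <= max_len exactly once.

    Hypothetical helper: each cycle is rooted at its smallest node, so
    only nodes larger than the root are explored, which both prunes the
    search and prevents reporting the same cycle twice.  Assumes node
    labels are mutually comparable (e.g. all integers).
    """
    for root in g.nodes():
        path = [root]
        on_path = {root}

        def dfs(u):
            for v in g.successors(u):
                if v == root:
                    # Edge back to the root closes a cycle of length len(path).
                    yield list(path)
                elif v > root and v not in on_path and len(path) < max_len:
                    path.append(v)
                    on_path.add(v)
                    yield from dfs(v)
                    path.pop()
                    on_path.remove(v)

        yield from dfs(root)
```

Because the recursion depth is capped at max_len, the cost depends on the number of short paths rather than on the (possibly astronomical) total number of cycles.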

Regards,

Kelly


Dan Schult

unread,
Jan 22, 2021, 7:31:35 PM1/22/21
to networkx...@googlegroups.com
I'm not sure about all of those metrics, but counting cycles, for example, would be straightforward -- store 10000 counter values (one for each node) and, for each cycle, increment the counter of every node in that cycle.  Any measure that involves summing over the nodes or edges could be done with an iterator over the cycles.

To steal Kelly's code:
```
import random, networkx as nx
g = nx.DiGraph()
h = nx.gnm_random_graph(10000, 50000)
g.add_edges_from((u, v) if random.randint(0, 1) else (v, u) for u, v in h.edges())
cycles = {n: 0 for n in g}
for c in nx.cycles.simple_cycles(g):
    for n in c:
        cycles[n] += 1
```

I'm also not sure how easy it would be for you to split this over Dask,
but it's the kind of thing I think Dask was developed for.  Even if you had all the cycles, to compute the measures you would just iterate over the cycles anyway, right?


Teodoro Criscione

unread,
Jan 23, 2021, 4:50:45 AM1/23/21
to networkx-discuss
Thank you, Kelly!! This is a great idea indeed, thank you so much!!
If there were a way to store "cycles of length 2", "cycles of length 3", "cycles of length 4", "cycles of length 5", etc. separately, that would be perfect (hopefully!).
I imagine adding a parameter to the NetworkX function, like nx.simple_cycles(G, k=3) for simple cycles of length 3.
I will study the source code more closely and see how far I can go...

Teodoro Criscione

unread,
Jan 23, 2021, 5:07:59 AM1/23/21
to networkx-discuss
Thank you so much, Dan!! 
Indeed your answer is complementary to Kelly's in my case...
I wrote my own code for cycles of length 3, 4, and 5, but apparently I ran out of memory anyway - even when storing cycles of different lengths separately with different algorithms.

My initial idea was to store cycles of different lengths in separate CSV files.
For example, "cycles5.csv" would report:
[attachment: Screenshot 2021-01-23 at 10.57.18.png — sample rows of cycles5.csv]
Considering the first row:
  • 1 is the ID number of the cycle, which identifies the specific set of nodes involved in the motif;
  • 1 is the ID number of the time interval considered for that cycle;
  • 'start' is the start of that time interval ('end', respectively);
  • 'sumWeight' is the sum of the weights of all the edges in that time interval;
  • 'nEdges' is the number of edges (always equal to 5 in this case, of course);
  • the other columns apply the same logic to edges occurring in the same time interval inside the cycle (INS), to outgoing flows (O_OUT), and to incoming flows (O_IN).
Do you think this would still be possible, Dan?
I am not so familiar with generators and Dask functions; I am following some tutorials online, but it is not so intuitive.

Thank you so much again!!
Best regards,
Teodoro

Dan Schult

unread,
Jan 23, 2021, 11:49:32 AM1/23/21
to networkx...@googlegroups.com
Those csv files will be too much to store if you want a line for each cycle;
there will likely be millions if not billions of such cycles.  The question is
what you will do with those csv files. Presumably you are going to average
the values across a column, or sum them, etc.  If that is your plan, then
shift from thinking of the csv file as the storage of your cycles to thinking of the
NetworkX graph as the storage of your cycle information. Take the averages
and sums, etc. as you iterate over the cycles, instead of creating the csv with
one row per cycle. So one pass through might compute the number of cycles
for each unit interval of time. Another pass might compute the total
(WeightO_IN * length of time) as well as the (weightINS * length of time).
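That one-pass idea can be sketched on a toy graph (the graph and the two metrics are illustrative assumptions; the per-interval weight sums from the thread would slot into the same loop):

```
import networkx as nx
from collections import Counter

# One pass over the cycle generator: aggregate per-node and per-length
# counts without ever materialising the list of cycles.
g = nx.DiGraph([(0, 1), (1, 2), (2, 0), (1, 0), (2, 3), (3, 2)])

node_counts = Counter()    # how many cycles each node participates in
length_counts = Counter()  # how many cycles of each length exist
for cycle in nx.simple_cycles(g):
    length_counts[len(cycle)] += 1
    node_counts.update(cycle)
```

Memory stays proportional to the number of nodes and distinct lengths, no matter how many cycles the generator produces.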

Teodoro Criscione

unread,
Jan 24, 2021, 2:10:35 AM1/24/21
to networkx...@googlegroups.com
Thank you, Dan!

My initial idea was to count the number of instances of each cycle of length k (the time intervals when the flow in the cycle starts and ends, so I need to aggregate and order the edges inside the cycle) and the duration of each cycle, and then categorize the cycles of length k into subfamilies - e.g. cycles with an output bigger than the input in the same time interval. In this way I can study the performance of the cycles of length k, but also the contribution of each node to such cycles.

Do you think it would be possible to modify the code so that it detects only cycles of a certain length, as Kelly suggested - e.g. nx.simple_cycles(G, k=k) - and then store them in separate json or txt files - e.g. cycles03-01.txt, cycles03-02.txt, cycles03-03.txt, etc.?

Thank you so much for your help!!
It helped me a lot to clarify the issue so far...

————————
Teodoro Criscione