218 views

Skip to first unread message

Nov 4, 2009, 1:32:58 AM11/4/09

to

fd a �crit :

> All.

>

> I have a list which represents some natural event. These events are

> listed pair-wise, which corresponds to event happening within certain

> time interval from each other, as below

>

> event={{1,2},{1,3},{3,4},{5,6},{7,8},{8,10}}

>

> I wish to find a thread of events, i.e. if event A is related to B and

> B to C, I wish to group {A,B,C} together. For the example above I

> would have

>

> {{1,2,3,4},{5,6},{7,8,10}}

>

> This would correspond to do a Graph Plot and identifying the parts

> which are disconnected It should be simple but I'm really finding it

> troublesome.

>

>

The 1st point is to build a convenient distance :

Dist01[list1_, list2_] := If[Length@Union[list1, list2] < 4, 0, 1];

then compute the distance matrix:

Adj = Outer[Dist01, event, event, 1];

and you get the graph:

GraphPlot[1 - Adj, VertexLabeling -> True]

> All.

>

> I have a list which represents some natural event. These events are

> listed pair-wise, which corresponds to event happening within certain

> time interval from each other, as below

>

> event={{1,2},{1,3},{3,4},{5,6},{7,8},{8,10}}

>

> I wish to find a thread of events, i.e. if event A is related to B and

> B to C, I wish to group {A,B,C} together. For the example above I

> would have

>

> {{1,2,3,4},{5,6},{7,8,10}}

>

> This would correspond to do a Graph Plot and identifying the parts

> which are disconnected It should be simple but I'm really finding it

> troublesome.

>

>

The 1st point is to build a convenient distance :

Dist01[list1_, list2_] := If[Length@Union[list1, list2] < 4, 0, 1];

then compute the distance matrix:

Adj = Outer[Dist01, event, event, 1];

and you get the graph:

GraphPlot[1 - Adj, VertexLabeling -> True]

You can also use another strategy :

Cls = FindClusters[event, DistanceFunction -> Dist01];

Map[Composition[Union, Flatten], Cls]

--

*********************************

Claude Mant�, IR CNRS

UMR CNRS 6117 LMGEM

http://www.com.univ-mrs.fr/LMGEM/

Centre d'Oc�anologie de Marseille

Campus de Luminy, Case 901

13288 MARSEILLE Cedex 09

tel : (+33) 491 829 127

fax : (+33) 491 829 119

HdR: http://hal.archives-ouvertes.fr/

*********************************

Nov 4, 2009, 1:34:04 AM11/4/09

to

Here is a very short, very fast but not very simple solution:

components[lst_List] := Module[{f},

Do[Set @@ f /@ pair, {pair, lst}]; GatherBy[Union @@ lst, f]]

Fred Simons

Eindhoven University of Technology

Nov 4, 2009, 1:37:12 AM11/4/09

to

Maybe this is what you want:

intersecting[{a_, b_}, {c_, d_}] := a <= c <= b || c <= b <= d

event = {{1, 2}, {1, 3}, {3, 4}, {5, 6}, {7, 8}, {8, 10}};

Union @@@ Split[event, intersecting]

{{1, 2, 3, 4}, {5, 6}, {7, 8, 10}}

Bobby

On Tue, 03 Nov 2009 01:51:28 -0600, fd <fdi...@gmail.com> wrote:

> All.

>

> I have a list which represents some natural event. These events are

> listed pair-wise, which corresponds to event happening within certain

> time interval from each other, as below

>

> event={{1,2},{1,3},{3,4},{5,6},{7,8},{8,10}}

>

> I wish to find a thread of events, i.e. if event A is related to B and

> B to C, I wish to group {A,B,C} together. For the example above I

> would have

>

> {{1,2,3,4},{5,6},{7,8,10}}

>

> This would correspond to do a Graph Plot and identifying the parts

> which are disconnected It should be simple but I'm really finding it

> troublesome.

>

Nov 4, 2009, 2:18:08 AM11/4/09

to

fd a �crit :

> All.

>

> I have a list which represents some natural event. These events are

> listed pair-wise, which corresponds to event happening within certain

> time interval from each other, as below

>

> event={{1,2},{1,3},{3,4},{5,6},{7,8},{8,10}}

>

> I wish to find a thread of events, i.e. if event A is related to B and

> B to C, I wish to group {A,B,C} together. For the example above I

> would have

>

> {{1,2,3,4},{5,6},{7,8,10}}

>

> This would correspond to do a Graph Plot and identifying the parts

> which are disconnected It should be simple but I'm really finding it

> troublesome.

>

>

> All.

>

> I have a list which represents some natural event. These events are

> listed pair-wise, which corresponds to event happening within certain

> time interval from each other, as below

>

> event={{1,2},{1,3},{3,4},{5,6},{7,8},{8,10}}

>

> I wish to find a thread of events, i.e. if event A is related to B and

> B to C, I wish to group {A,B,C} together. For the example above I

> would have

>

> {{1,2,3,4},{5,6},{7,8,10}}

>

> This would correspond to do a Graph Plot and identifying the parts

> which are disconnected It should be simple but I'm really finding it

> troublesome.

>

>

Nov 5, 2009, 4:22:19 AM11/5/09

to

On 2009.11.04. 7:34, Fred Simons wrote:

> Here is a very short, very fast but not very simple solution:

>

> components[lst_List] := Module[{f},

> Do[Set @@ f /@ pair, {pair, lst}]; GatherBy[Union @@ lst, f]]

>

> Here is a very short, very fast but not very simple solution:

>

> components[lst_List] := Module[{f},

> Do[Set @@ f /@ pair, {pair, lst}]; GatherBy[Union @@ lst, f]]

>

I enjoyed Fred Simons's solution tremendously.

I tried to speed it up a bit.

I compared the speed of components[] with the speed of WeakComponents

(from the GraphUtilities package) for random graphs (e.g.

RandomInteger[50000, {30000, 2}]). It seems that components[] is faster

than WeakComponents for as long as the graph doesn't have very large

connected components. However, as soon as large connected components

appear, components[] slows down a lot.

I looked into the source of WeakComponents to find out how it works, but

it turns out it uses undocumented functions, such as

SparseArray`StronglyConnectedComponents

The reason for the slowdown of components[] when large connected

components are present is that the f[] function needs to be evaluated in

several steps. E.g. for the graph {{1,2},{2,3},{3,4}}, the definition

of f will include f[1]=f[2], f[2]=f[3], f[3]=f[4], so the evaluation of

f[1] will take 3 steps.

I tried to remedy this by changing f so that it re-defines itself each

time the left-hand-side of a particular definition can be evaluated

further. With the above example, evaluating f[1] would cause the

definition of f[1] to change from f[1]=f[2] to f[1]=f[4] (as f[2]

evaluates to f[4]). Here's the solution:

setSpecial[lhs_, rhs_] /; rhs =!= lhs :=

(lhs := With[{val = #1}, lhs := #0[val]; val] &[rhs])

components2[lst_List] :=

Module[{f},

Do[setSpecial @@ f /@ pair, {pair, lst}];

GatherBy[Union @@ lst, f]

]

This modified components2[] seems to be faster than WeakComponents[]

even for single-component random graphs, however, it is limited by

$RecursionLimit (which can't be increased indefinitely without risking a

crash)

Szabolcs

P.S. Here's the code I used to compare the speed of components[] and

WeakComponents[]. For 'a' greater than about 0.5 components[] gets slow.

a = 0.7;

tw = Table[

g = RandomInteger[n, {Ceiling[a n], 2}];

{n, First@Timing[WeakComponents[Rule @@@ g]]},

{n, 2^Range[11, 16]}

]

tc = Table[

g = RandomInteger[n, {Ceiling[a n], 2}];

{n, First@Timing[components[g]]},

{n, 2^Range[11, 16]}

]

tc2 = Table[

g = RandomInteger[n, {Ceiling[a n], 2}];

{n, First@Timing[components2[g]]},

{n, 2^Range[11, 16]}

]

ListLogLogPlot[{tw, tc, tc2}, Joined -> True,

PlotMarkers -> Automatic]

Nov 5, 2009, 4:27:18 AM11/5/09

to

Brilliant as usual, Fred.

I did think intervals only needed to overlap to "correspond", whereas your

solution requires them to share an end-point.

For instance, in this example the first interval is a subset of the second

interval and overlaps with the third, yet "components' associates it with

neither.

event = {{1, 2} + 1/2, {1, 3}, {3, 4}, {5, 6}, {7, 8}, {8, 10}};

components@event

{{1, 3, 4}, {3/2, 5/2}, {5, 6}, {7, 8, 10}}

I can't really guess the OP's intent.

Bobby

On Wed, 04 Nov 2009 00:33:25 -0600, Fred Simons <f.h.s...@tue.nl> wrote:

> Here is a very short, very fast but not very simple solution:

>

> components[lst_List] := Module[{f},

> Do[Set @@ f /@ pair, {pair, lst}]; GatherBy[Union @@ lst, f]]

>

> Fred Simons

> Eindhoven University of Technology

>> All.

>>

>> I have a list which represents some natural event. These events are

>> listed pair-wise, which corresponds to event happening within certain

>> time interval from each other, as below

>>

>> event={{1,2},{1,3},{3,4},{5,6},{7,8},{8,10}}

>>

>> I wish to find a thread of events, i.e. if event A is related to B and

>> B to C, I wish to group {A,B,C} together. For the example above I

>> would have

>>

>> {{1,2,3,4},{5,6},{7,8,10}}

>>

>> This would correspond to do a Graph Plot and identifying the parts

>> which are disconnected It should be simple but I'm really finding it

>> troublesome.

>>

>>

>>

>

Nov 6, 2009, 5:15:09 AM11/6/09

to

I feel I have to remark that with respect to me there is nothing

brilliant in the solution I posted. The idea behind it was the result of

a discussion in this group, many, many years ago, on a similar problem.

But anyway, it is a very beautiful result!

brilliant in the solution I posted. The idea behind it was the result of

a discussion in this group, many, many years ago, on a similar problem.

But anyway, it is a very beautiful result!

The real brilliant thing is the improvement that was given by Szabolcs

Horvat in .

Fred

Nov 7, 2009, 6:47:32 AM11/7/09

to

Hi all,

I came pretty late to the party, but let me add my voice to those delighted

by Fred's solution and Szabolcs's improvement. These are two of the most

brilliant and mind-blowing hacks I ever saw.

I would just like to add to this, that let us not underestimate the power of

Compile. Here is an implementation of weight-balancing path-compression

union-find algorithm taken straight from the Sedgewick's book, with some

minor changes as needed for adoption to Mathematica:

Clear[getTree];

getTree =

Compile[{{pairs, _Integer, 2}},

Module[{t = 0, i = 0, j = 0, xl = 0, yl = 0, k = 0,

len = Max[pairs], dad = Table[0, {Max[pairs]}],

present = Table[-1, {Max[pairs]}]},

For[k = 1, k <= Length[pairs], k++,

xl = i = pairs[[k, 1]];

yl = j = pairs[[k, 2]];

present[[i]] = present[[j]] = 1;

If[xl == yl, Continue[]];

While[dad[[i]] > 0, i = dad[[i]]];

While[dad[[j]] > 0, j = dad[[j]]];

While[dad[[xl]] > 0, t = xl; xl = dad[[xl]]; dad[[t]] = i];

While[dad[[yl]] > 0, t = yl; yl = dad[[yl]]; dad[[t]] = j];

If[i != j,

If[dad[[j]] <= dad[[i]],

If[dad[[i]] > 0, dad[[j]] = dad[[j]] + dad[[i]] - 1];

dad[[i]] = j;,

(* else *)

If[dad[[j]] > 0, (dad[[i]] = dad[[i]] + dad[[j]] - 1)];

dad[[j]] = i];

];

];

For[k = 1, k <= len, k++,

If[dad[[k]] == 0, dad[[k]] = k]];

{dad, present}]];

Clear[getComponents];

getComponents[pairs_] :=

Module[{dad, present},

{dad, present} = getTree[pairs];

Select[

GatherBy[

Transpose[{FixedPoint[dad[[#]] &, dad], Range[Length[dad]]}],

First][[All, All, 2]],

Length[#] > 1 || present[[First@#]] == 1 &]];

It would need less code and run faster yet if we didn't allow same-vertex

pairs like {1,1}. It is also limited by memory consumption - it allocates

the list of dimension equal to the maximum vertex label, so it will be a

waste if vertex set is a sparse array of large numbers. Also, vertex labels

must be strictly positive. Both of the last two limitations can be dealt

with by some sort of re-labeling - I did not include that.

Here is Szabolcs's comparison where I added timing for my version and did

some cosmetic changes (not that zero vertex numbers are not generated,

unlike the original comparison of Szabolcs):

In[1]:=

Clear[res1,res2,res0,i0,i1,i2];

i1=0;i2=0;i0=0;

In[2]:=

data = Table[RandomInteger[{1,n},{Ceiling[a n],2}],{n,2^Range[11,16]}];

In[3]:=

tc=(First@Timing[res0[i0++]=components[#]]&)/@data

Out[3]= {0.032,0.078,0.297,1.156,4.547,19.562}

In[4]:=

tc2=(First@Timing[res1[i1++]=components2[#]]&)/@data

Out[4]= {0.047,0.109,0.235,0.5,1.046,2.36}

In[5]:=

tc3=(First@Timing[res2[i2++]=getComponents[#]]&)/@data

Out[5]= {0.,0.031,0.031,0.078,0.188,0.391}

In[6]:= ListLogLogPlot[{tc,tc2,tc3},Joined->True,PlotMarkers->Automatic]

In[7]:=

res0[#]===res1[#]===res2[#]&/@Range[0,5]

Out[7]= {True,True,True,True,True,True}

Here <components> corresponds to Fred's solution, and <components2> to

Szabolcs's.

I have a feeling that the present compiled version and Szabolcs's one are

doing similar type of things (at least in spirit), by attempting to flatten

the emerging element trees at run-time. The speed-up achieved by compiled

version I attribute to the large time constant associated with hash

setting/lookup (assignments and DownValue definition lookup as compared to

compiled version of array indexing), plus very high efficiency of Part when

many elements are extracted at once - the fact that I exploit in the

FixedPoint construct.

Regards,

Leonid

Nov 7, 2009, 6:51:21 AM11/7/09

to

A follow-up to my previous post - in the comparison code that I posted one

must set the parameter <a> to some value (0.7 or some other)

must set the parameter <a> to some value (0.7 or some other)

In[1]:=

a=0.7;

before running anything else, for it to work.

Nov 8, 2009, 6:50:19 AM11/8/09

to

Carl,

another follow-up:

I profiled my code and found that the main bottleneck is the built-in

GatherBy function (who would think?). It happens to be sub-optimal for some

problems and can be beaten in some cases (below is an example). With the

following modifications:

Clear[getTree, listSplit, gatherBy, getComponentsNew];

getTree =

Compile[{{pairs, _Integer, 2}},

Module[{t = 0, i = 0, j = 0, xl = 0, yl = 0, k = 0,

len = Max[pairs], dad = Table[0, {Max[pairs]}]},

For[k = 1, k <= Length[pairs], k++,

xl = i = pairs[[k, 1]];

yl = j = pairs[[k, 2]];

If[xl == yl, Continue[]];

While[dad[[i]] > 0, i = dad[[i]]];

While[dad[[j]] > 0, j = dad[[j]]];

While[dad[[xl]] > 0, t = xl; xl = dad[[xl]]; dad[[t]] = i];

While[dad[[yl]] > 0, t = yl; yl = dad[[yl]]; dad[[t]] = j];

If[i != j,

If[dad[[j]] <= dad[[i]],

dad[[j]] += dad[[i]] - 1;

dad[[i]] = j;,

(*else*)

(dad[[i]] += dad[[j]] - 1);

dad[[j]] = i];];];

For[k = 1, k <= len, k++, If[dad[[k]] <= 0, dad[[k]] = k]];

dad]];

listSplit[x_List, lengths_List] :=

MapThread[Take[x, {##}] &, {Most[#], Rest[#] - 1}] &@

Accumulate[Prepend[lengths, 1]];

gatherBy[lst_List, flst_List] :=

listSplit[lst[[Ordering[flst]]], (Sort@Tally[flst])[[All, 2]]];

getComponentsNew[pairs_] :=

With[{dad = getTree[pairs]},

gatherBy[Range[Length[dad]], FixedPoint[dad[[#]] &, dad]]];

my code is now on par with yours in terms of performance:

aggs[n_, pairs_] :=

Module[{sp, t}, sp = SparseArray[Thread[pairs -> 1], {n, n}];

t = Sign[sp + Transpose[sp]];

SparseArray`StronglyConnectedComponents[t]]

In[1]:= Clear[resL, resC];

a = 0.7;

trials = Table[

RandomInteger[{1, k = 2^n}, {Ceiling[a k], 2}],

{n, 11, 19}];

In[2]:=

testL =

Table[With[{g = trials[[n]]}, {Length@g, First@Timing[resL[n] =

getComponentsNew@g]}], {n, 1, Length@trials}];

In[3]:=

testC = Table[With[{g = trials[[n]]},

{Length@g, First@Timing[resC[n] = aggs[Max[g], g]]}], {n, 1,

Length@trials}];

In[4]:= testL

Out[4]= {{1434, 0.01}, {2868, 0.03}, {5735, 0.05}, {11469, 0.14}, {22938,

0.241}, {45876, 0.42}, {91751, 0.992}, {183501, 2.093}, {367002, 4.256}}

In[5]:= testC

Out[5]= {{1434, 0.01}, {2868, 0.04}, {5735, 0.07}, {11469, 0.13}, {22938,

0.301}, {45876, 0.571}, {91751, 1.281}, {183501, 2.855}, {367002, 5.337}}

In[6]:=

Sort[Sort /@ resL[#]] === Sort[Sort /@ resC[#]] & /@

Range[Length[trials]]

Out[6]= {True, True, True, True, True, True, True, True, True}

I wish the compiler could somehow handle non-tensor structures like lists of

lists, or that Reap-Sow were somehow compilable and fast when compiled -

despite all my efforts, the most of the execution time in my code is spent

on the dumb task of collecting elements with the same tag in a list

(gatherBy), because I can not implement a compiled version.

For very large lists I expect your code to start winning because some parts

of mine are based on sorting which is N*log N rather than linear.

Regards,

Leonid

2009/11/7 Carl Woll <ca...@wolfram.com>

> Szabolcs Horv=E1t wrote:

>

>> On 2009.11.04. 7:34, Fred Simons wrote:

>>

>>

>>> Here is a very short, very fast but not very simple solution:

>>>

>>> components[lst_List] := Module[{f},

>>> Do[Set @@ f /@ pair, {pair, lst}]; GatherBy[Union @@ lst, f]]

>>>

>>>

>>>

>>

>> I enjoyed Fred Simons's solution tremendously.

>>

>> I tried to speed it up a bit.

>>

>> I compared the speed of components[] with the speed of WeakComponents

>> (from the GraphUtilities package) for random graphs (e.g.

>> RandomInteger[50000, {30000, 2}]). It seems that components[] is faster

>> than WeakComponents for as long as the graph doesn't have very large

>> connected components. However, as soon as large connected components

>> appear, components[] slows down a lot.

>>

>> I looked into the source of WeakComponents to find out how it works, but

>> it turns out it uses undocumented functions, such as

>> SparseArray`StronglyConnectedComponents

>>

>> The reason for the slowdown of components[] when large connected

>> components are present is that the f[] function needs to be evaluated in

>> several steps. E.g. for the graph {{1,2},{2,3},{3,4}}, the definition o=

f f

>> will include f[1]=f[2], f[2]=f[3], f[3]=f[4], so the evaluation of=

f[1] will

>> take 3 steps.

>>

>> I tried to remedy this by changing f so that it re-defines itself each

>> time the left-hand-side of a particular definition can be evaluated furt=

her.

>> With the above example, evaluating f[1] would cause the definition of f=

[1]

>> to change from f[1]=f[2] to f[1]=f[4] (as f[2] evaluates to f[4]). =

Here's

>> the solution:

>>

>> setSpecial[lhs_, rhs_] /; rhs =!= lhs :=

>> (lhs := With[{val = #1}, lhs := #0[val]; val] &[rhs])

>>

>> components2[lst_List] :=

>> Module[{f},

>> Do[setSpecial @@ f /@ pair, {pair, lst}];

>> GatherBy[Union @@ lst, f]

>> ]

>>

>> This modified components2[] seems to be faster than WeakComponents[] eve=

n

>> for single-component random graphs, however, it is limited by

>> $RecursionLimit (which can't be increased indefinitely without risking a

>> crash)

>>

>> Szabolcs

>>

>> P.S. Here's the code I used to compare the speed of components[] and

>> WeakComponents[]. For 'a' greater than about 0.5 components[] gets slow=

.

>>

>> a = 0.7;

>>

>> tw = Table[

>> g = RandomInteger[n, {Ceiling[a n], 2}];

>> {n, First@Timing[WeakComponents[Rule @@@ g]]},

>> {n, 2^Range[11, 16]}

>> ]

>>

>> tc = Table[

>> g = RandomInteger[n, {Ceiling[a n], 2}];

>> {n, First@Timing[components[g]]},

>> {n, 2^Range[11, 16]}

>> ]

>>

>> tc2 = Table[

>> g = RandomInteger[n, {Ceiling[a n], 2}];

>> {n, First@Timing[components2[g]]},

>> {n, 2^Range[11, 16]}

>> ]

>>

>> ListLogLogPlot[{tw, tc, tc2}, Joined -> True,

>> PlotMarkers -> Automatic]

>>

>>

>>

> Another late to the party post. I think this topic was discussed back in

> 2005, and I think the quickest solution then was found in my post:*

>

> http://tinyurl.com/ylon3hr*

>

> Anyway, the solution was:

>

> aggs[n_, pairs_] := Module[{sp, t},

> sp = SparseArray[Thread[pairs -> 1], {n, n}];

> t = Sign[sp + Transpose[sp]];

> SparseArray`StronglyConnectedComponents[t]]

>

> Here, the pairs argument needs to be a list of pairs of positive integers=

,

> and n is the maximum of these integers. A quick comparison with component=

s2

> follows:

>

> In[51]:= g = RandomInteger[{1, 10^4}, {7000, 2}];

>

> r1 = components2[g]; // Timing

> r2 = aggs[10^4, g]; // Timing

>

> Sort[Sort /@ DeleteCases[r1, {_}]] === Sort[Sort /@ DeleteCases[r2=

, {_}]]

>

>

> Out[52]= {0.499, Null}

>

> Out[53]= {0.016, Null}

>

> Out[54]= True

>

> So, about 30 times faster.

>

> Carl Woll

> Wolfram Research

>

Nov 8, 2009, 6:51:29 AM11/8/09

to

On Nov 5, 3:22 am, Szabolcs Horv=E1t <szhor...@gmail.com> wrote:

> On 2009.11.04. 7:34, Fred Simons wrote:

>

> On 2009.11.04. 7:34, Fred Simons wrote:

>

> > Here is a very short, very fast but not very simple solution:

>

> > components[lst_List] := Module[{f},

> > Do[Set @@ f /@ pair, {pair, lst}]; GatherBy[Union @@ lst, f]=>

> > components[lst_List] := Module[{f},

]

>

> I enjoyed Fred Simons's solution tremendously.

>

> I tried to speed it up a bit.

>

> I compared the speed of components[] with the speed of WeakComponents

> (from the GraphUtilities package) for random graphs (e.g.

r

> than WeakComponents for as long as the graph doesn't have very large

> connected components. However, as soon as large connected components

> appear, components[] slows down a lot.

>

> I looked into the source of WeakComponents to find out how it works, but

> it turns out it uses undocumented functions, such as

> SparseArray`StronglyConnectedComponents

>

> The reason for the slowdown of components[] when large connected

> components are present is that the f[] function needs to be evaluated in

> several steps. E.g. for the graph {{1,2},{2,3},{3,4}}, the definition

n of

> f[1] will take 3 steps.

>

> I tried to remedy this by changing f so that it re-defines itself each

> time the left-hand-side of a particular definition can be evaluated

> definition of f[1] to change from f[1]=f[2] to f[1]=f[4] (as f[2]

> evaluates to f[4]). Here's the solution:

>

> setSpecial[lhs_, rhs_] /; rhs =!= lhs :=

> (lhs := With[{val = #1}, lhs := #0[val]; val] &[rhs])

>

> Module[{f},

> Do[setSpecial @@ f /@ pair, {pair, lst}];

> GatherBy[Union @@ lst, f]

> ]

>

> This modified components2[] seems to be faster than WeakComponents[]

> even for single-component random graphs, however, it is limited by> $RecursionLimit (which can't be increased indefinitely without risking a

> crash)

>

> Szabolcs

>

> P.S. Here's the code I used to compare the speed of components[] and

w.

>

> a = 0.7;

>

> tw = Table[

> g = RandomInteger[n, {Ceiling[a n], 2}];

> {n, First@Timing[WeakComponents[Rule @@@ g]]},

> {n, 2^Range[11, 16]}

> ]

>

> tc = Table[

> g = RandomInteger[n, {Ceiling[a n], 2}];

> {n, First@Timing[components[g]]},

> {n, 2^Range[11, 16]}

> ]

>

> tc2 = Table[

> g = RandomInteger[n, {Ceiling[a n], 2}];

> {n, First@Timing[components2[g]]},

> {n, 2^Range[11, 16]}

> ]

>

> ListLogLogPlot[{tw, tc, tc2}, Joined -> True,

> PlotMarkers -> Automatic]

The WeakComponents solution can be improved because it spends most of

the time generating the adjacency matrix. This is significantly

faster:

DeleteCases[#, {_}] &@

WeakComponents[SparseArray[data -> 1, {Max@data, Max@data}]]

Maxim Rytin

m...@inbox.ru

Nov 8, 2009, 7:06:27 AM11/8/09

to

Szabolcs Horv�t wrote:

> On 2009.11.04. 7:34, Fred Simons wrote:

>

> On 2009.11.04. 7:34, Fred Simons wrote:

>

>> Here is a very short, very fast but not very simple solution:

>>

>> components[lst_List] := Module[{f},

>> Do[Set @@ f /@ pair, {pair, lst}]; GatherBy[Union @@ lst, f]]

>>

>>

>

>>

>> components[lst_List] := Module[{f},

>> Do[Set @@ f /@ pair, {pair, lst}]; GatherBy[Union @@ lst, f]]

>>

>>

>

> I enjoyed Fred Simons's solution tremendously.

>

> I tried to speed it up a bit.

>

> I compared the speed of components[] with the speed of WeakComponents

> (from the GraphUtilities package) for random graphs (e.g.

> RandomInteger[50000, {30000, 2}]). It seems that components[] is faster >

> I tried to speed it up a bit.

>

> I compared the speed of components[] with the speed of WeakComponents

> (from the GraphUtilities package) for random graphs (e.g.

> than WeakComponents for as long as the graph doesn't have very large

> connected components. However, as soon as large connected components

> appear, components[] slows down a lot.

>

> I looked into the source of WeakComponents to find out how it works, but

> it turns out it uses undocumented functions, such as

> SparseArray`StronglyConnectedComponents

>

> The reason for the slowdown of components[] when large connected

> components are present is that the f[] function needs to be evaluated in

> several steps. E.g. for the graph {{1,2},{2,3},{3,4}}, the definition

> of f will include f[1]=f[2], f[2]=f[3], f[3]=f[4], so the evaluation of > connected components. However, as soon as large connected components

> appear, components[] slows down a lot.

>

> I looked into the source of WeakComponents to find out how it works, but

> it turns out it uses undocumented functions, such as

> SparseArray`StronglyConnectedComponents

>

> The reason for the slowdown of components[] when large connected

> components are present is that the f[] function needs to be evaluated in

> several steps. E.g. for the graph {{1,2},{2,3},{3,4}}, the definition

> f[1] will take 3 steps.

>

> I tried to remedy this by changing f so that it re-defines itself each

> time the left-hand-side of a particular definition can be evaluated

> further. With the above example, evaluating f[1] would cause the

> definition of f[1] to change from f[1]=f[2] to f[1]=f[4] (as f[2]

> evaluates to f[4]). Here's the solution:

>

> setSpecial[lhs_, rhs_] /; rhs =!= lhs :=

> (lhs := With[{val = #1}, lhs := #0[val]; val] &[rhs])

>

> components2[lst_List] :=>

> I tried to remedy this by changing f so that it re-defines itself each

> time the left-hand-side of a particular definition can be evaluated

> further. With the above example, evaluating f[1] would cause the

> definition of f[1] to change from f[1]=f[2] to f[1]=f[4] (as f[2]

> evaluates to f[4]). Here's the solution:

>

> setSpecial[lhs_, rhs_] /; rhs =!= lhs :=

> (lhs := With[{val = #1}, lhs := #0[val]; val] &[rhs])

>

> Module[{f},

> Do[setSpecial @@ f /@ pair, {pair, lst}];

> GatherBy[Union @@ lst, f]

> ]

>

> This modified components2[] seems to be faster than WeakComponents[]

> even for single-component random graphs, however, it is limited by

> $RecursionLimit (which can't be increased indefinitely without risking a

> crash)

>

> Szabolcs

>

> P.S. Here's the code I used to compare the speed of components[] and

> WeakComponents[]. For 'a' greater than about 0.5 components[] gets slow.> even for single-component random graphs, however, it is limited by

> $RecursionLimit (which can't be increased indefinitely without risking a

> crash)

>

> Szabolcs

>

> P.S. Here's the code I used to compare the speed of components[] and

>

> a = 0.7;

>

> tw = Table[

> g = RandomInteger[n, {Ceiling[a n], 2}];

> {n, First@Timing[WeakComponents[Rule @@@ g]]},

> {n, 2^Range[11, 16]}

> ]

>

> tc = Table[

> g = RandomInteger[n, {Ceiling[a n], 2}];

> {n, First@Timing[components[g]]},

> {n, 2^Range[11, 16]}

> ]

>

> tc2 = Table[

> g = RandomInteger[n, {Ceiling[a n], 2}];

> {n, First@Timing[components2[g]]},

> {n, 2^Range[11, 16]}

> ]

>

> ListLogLogPlot[{tw, tc, tc2}, Joined -> True,

> PlotMarkers -> Automatic]

>

>

Another late to the party post. I think this topic was discussed back in

2005, and I think the quickest solution then was found in my post:*

2005, and I think the quickest solution then was found in my post:*

Anyway, the solution was:

aggs[n_, pairs_] := Module[{sp, t},

sp = SparseArray[Thread[pairs -> 1], {n, n}];

t = Sign[sp + Transpose[sp]];

SparseArray`StronglyConnectedComponents[t]]

Here, the pairs argument needs to be a list of pairs of positive

integers, and n is the maximum of these integers. A quick comparison

with components2 follows:

In[51]:= g = RandomInteger[{1, 10^4}, {7000, 2}];

r1 = components2[g]; // Timing

r2 = aggs[10^4, g]; // Timing

Sort[Sort /@ DeleteCases[r1, {_}]] === Sort[Sort /@ DeleteCases[r2, {_}]]

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu