You can also use another strategy:
Cls = FindClusters[event, DistanceFunction -> Dist01];
Map[Composition[Union, Flatten], Cls]
--
*********************************
Claude Manté, IR CNRS
UMR CNRS 6117 LMGEM
http://www.com.univ-mrs.fr/LMGEM/
Centre d'Océanologie de Marseille
Campus de Luminy, Case 901
13288 MARSEILLE Cedex 09
tel : (+33) 491 829 127
fax : (+33) 491 829 119
HdR: http://hal.archives-ouvertes.fr/
*********************************
intersecting[{a_, b_}, {c_, d_}] := a <= c <= b || c <= b <= d
event = {{1, 2}, {1, 3}, {3, 4}, {5, 6}, {7, 8}, {8, 10}};
Union @@@ Split[event, intersecting]
{{1, 2, 3, 4}, {5, 6}, {7, 8, 10}}
Bobby
On Tue, 03 Nov 2009 01:51:28 -0600, fd <fdi...@gmail.com> wrote:
> All.
>
> I have a list which represents some natural event. These events are
> listed pair-wise, which corresponds to event happening within certain
> time interval from each other, as below
>
> event={{1,2},{1,3},{3,4},{5,6},{7,8},{8,10}}
>
> I wish to find a thread of events, i.e. if event A is related to B and
> B to C, I wish to group {A,B,C} together. For the example above I
> would have
>
> {{1,2,3,4},{5,6},{7,8,10}}
>
> This would correspond to doing a GraphPlot and identifying the parts
> which are disconnected. It should be simple but I'm really finding it
> troublesome.
>
I enjoyed Fred Simons's solution tremendously.
I tried to speed it up a bit.
I compared the speed of components[] with the speed of WeakComponents
(from the GraphUtilities package) for random graphs (e.g.
RandomInteger[50000, {30000, 2}]). It seems that components[] is faster
than WeakComponents for as long as the graph doesn't have very large
connected components. However, as soon as large connected components
appear, components[] slows down a lot.
I looked into the source of WeakComponents to find out how it works, but
it turns out it uses undocumented functions, such as
SparseArray`StronglyConnectedComponents
The reason for the slowdown of components[] when large connected
components are present is that the f[] function needs to be evaluated in
several steps. E.g. for the graph {{1,2},{2,3},{3,4}}, the definition
of f will include f[1]=f[2], f[2]=f[3], f[3]=f[4], so the evaluation of
f[1] will take 3 steps.
I tried to remedy this by changing f so that it re-defines itself each
time the left-hand-side of a particular definition can be evaluated
further. With the above example, evaluating f[1] would cause the
definition of f[1] to change from f[1]=f[2] to f[1]=f[4] (as f[2]
evaluates to f[4]). Here's the solution:
setSpecial[lhs_, rhs_] /; rhs =!= lhs :=
(lhs := With[{val = #1}, lhs := #0[val]; val] &[rhs])
components2[lst_List] :=
Module[{f},
Do[setSpecial @@ f /@ pair, {pair, lst}];
GatherBy[Union @@ lst, f]
]
This modified components2[] seems to be faster than WeakComponents[]
even for single-component random graphs; however, it is limited by
$RecursionLimit (which can't be increased indefinitely without risking a
crash).
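Szabolcs's self-rewriting f is essentially the path-compression trick from
union-find. As an illustrative sketch only (in Python, not Szabolcs's code),
the same idea looks like this:

```python
# Illustrative union-find with path compression (a Python analogue of the
# self-rewriting f; function and variable names here are my own).

def components(pairs):
    """Group graph vertices into connected components."""
    parent = {}

    def find(x):
        root = x
        while parent.get(root, root) != root:
            root = parent[root]
        # Path compression: repoint every node on the path at the root,
        # just as setSpecial rewrites f[1]=f[2] into f[1]=f[4].
        while parent.get(x, x) != root:
            parent[x], x = root, parent[x]
        return root

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for node in {n for p in pairs for n in p}:
        groups.setdefault(find(node), []).append(node)
    return [sorted(g) for g in groups.values()]
```

Because every lookup flattens the chain it traversed, later lookups are
nearly constant time, which is exactly what the setSpecial redefinition buys.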
Szabolcs
P.S. Here's the code I used to compare the speed of components[] and
WeakComponents[]. For 'a' greater than about 0.5 components[] gets slow.
a = 0.7;
tw = Table[
g = RandomInteger[n, {Ceiling[a n], 2}];
{n, First@Timing[WeakComponents[Rule @@@ g]]},
{n, 2^Range[11, 16]}
]
tc = Table[
g = RandomInteger[n, {Ceiling[a n], 2}];
{n, First@Timing[components[g]]},
{n, 2^Range[11, 16]}
]
tc2 = Table[
g = RandomInteger[n, {Ceiling[a n], 2}];
{n, First@Timing[components2[g]]},
{n, 2^Range[11, 16]}
]
ListLogLogPlot[{tw, tc, tc2}, Joined -> True,
PlotMarkers -> Automatic]
I'd have thought intervals only needed to overlap to "correspond", whereas your
solution requires them to share an end-point.
For instance, in this example the first interval is a subset of the second
interval and overlaps with the third, yet components associates it with
neither.
event = {{1, 2} + 1/2, {1, 3}, {3, 4}, {5, 6}, {7, 8}, {8, 10}};
components@event
{{1, 3, 4}, {3/2, 5/2}, {5, 6}, {7, 8, 10}}
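For what it's worth, grouping by genuine overlap (rather than shared
end-points) is a simple sort-and-sweep; a hypothetical Python sketch of that
reading of the problem (not code from this thread):

```python
# Group intervals that overlap transitively: sort by start, then extend the
# current group while the next interval starts before the group's max end.

def group_overlapping(intervals):
    """Group intervals whose ranges overlap (transitively)."""
    groups, cur, end = [], [], None
    for lo, hi in sorted(intervals):
        if cur and lo <= end:          # overlaps the running group
            cur.append((lo, hi))
            end = max(end, hi)
        else:
            if cur:
                groups.append(cur)
            cur, end = [(lo, hi)], hi
    if cur:
        groups.append(cur)
    return groups
```

With the shifted example above, this puts the first three intervals into one
group, matching the overlap interpretation.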
I can't really guess the OP's intent.
Bobby
On Wed, 04 Nov 2009 00:33:25 -0600, Fred Simons <f.h.s...@tue.nl> wrote:
> Here is a very short, very fast but not very simple solution:
>
> components[lst_List] := Module[{f},
> Do[Set @@ f /@ pair, {pair, lst}]; GatherBy[Union @@ lst, f]]
>
> Fred Simons
> Eindhoven University of Technology
The really brilliant thing is the improvement that was given by Szabolcs
Horvát.
Fred
I came pretty late to the party, but let me add my voice to those delighted
by Fred's solution and Szabolcs's improvement. These are two of the most
brilliant and mind-blowing hacks I have ever seen.
I would just like to add that we should not underestimate the power of
Compile. Here is an implementation of the weight-balanced, path-compressing
union-find algorithm, taken straight from Sedgewick's book, with some minor
changes needed to adapt it to Mathematica:
Clear[getTree];
getTree =
Compile[{{pairs, _Integer, 2}},
Module[{t = 0, i = 0, j = 0, xl = 0, yl = 0, k = 0,
len = Max[pairs], dad = Table[0, {Max[pairs]}],
present = Table[-1, {Max[pairs]}]},
For[k = 1, k <= Length[pairs], k++,
xl = i = pairs[[k, 1]];
yl = j = pairs[[k, 2]];
present[[i]] = present[[j]] = 1;
If[xl == yl, Continue[]];
While[dad[[i]] > 0, i = dad[[i]]];
While[dad[[j]] > 0, j = dad[[j]]];
While[dad[[xl]] > 0, t = xl; xl = dad[[xl]]; dad[[t]] = i];
While[dad[[yl]] > 0, t = yl; yl = dad[[yl]]; dad[[t]] = j];
If[i != j,
If[dad[[j]] <= dad[[i]],
If[dad[[i]] > 0, dad[[j]] = dad[[j]] + dad[[i]] - 1];
dad[[i]] = j;,
(* else *)
If[dad[[j]] > 0, (dad[[i]] = dad[[i]] + dad[[j]] - 1)];
dad[[j]] = i];
];
];
For[k = 1, k <= len, k++,
If[dad[[k]] == 0, dad[[k]] = k]];
{dad, present}]];
Clear[getComponents];
getComponents[pairs_] :=
Module[{dad, present},
{dad, present} = getTree[pairs];
Select[
GatherBy[
Transpose[{FixedPoint[dad[[#]] &, dad], Range[Length[dad]]}],
First][[All, All, 2]],
Length[#] > 1 || present[[First@#]] == 1 &]];
It would need less code and run faster still if we disallowed same-vertex
pairs like {1,1}. It is also limited by memory consumption: it allocates a
list whose length equals the maximum vertex label, which is wasteful if the
vertex set is a sparse set of large numbers. Also, vertex labels must be
strictly positive. Both of the last two limitations can be dealt with by some
sort of relabeling; I did not include that.
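The relabeling Leonid alludes to could look like this (a hypothetical Python
sketch, not part of his code): map the vertex labels to contiguous 1-based
indices, run the index-based routine on those, then map the results back.

```python
# Map arbitrary (possibly sparse, possibly non-positive) vertex labels to
# dense 1-based indices, so an array-based union-find can be applied.

def relabel(pairs):
    labels = sorted({v for p in pairs for v in p})
    index = {v: i + 1 for i, v in enumerate(labels)}   # dense, 1-based
    dense = [(index[a], index[b]) for a, b in pairs]
    return dense, labels   # labels[i - 1] recovers the original vertex
```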
Here is Szabolcs's comparison, where I added timing for my version and made
some cosmetic changes (note that zero vertex numbers are not generated,
unlike in Szabolcs's original comparison):
In[1]:=
Clear[res1,res2,res0,i0,i1,i2];
i1=0;i2=0;i0=0;
In[2]:=
data = Table[RandomInteger[{1,n},{Ceiling[a n],2}],{n,2^Range[11,16]}];
In[3]:=
tc=(First@Timing[res0[i0++]=components[#]]&)/@data
Out[3]= {0.032,0.078,0.297,1.156,4.547,19.562}
In[4]:=
tc2=(First@Timing[res1[i1++]=components2[#]]&)/@data
Out[4]= {0.047,0.109,0.235,0.5,1.046,2.36}
In[5]:=
tc3=(First@Timing[res2[i2++]=getComponents[#]]&)/@data
Out[5]= {0.,0.031,0.031,0.078,0.188,0.391}
In[6]:= ListLogLogPlot[{tc,tc2,tc3},Joined->True,PlotMarkers->Automatic]
In[7]:=
res0[#]===res1[#]===res2[#]&/@Range[0,5]
Out[7]= {True,True,True,True,True,True}
Here <components> corresponds to Fred's solution, and <components2> to
Szabolcs's.
I have a feeling that the present compiled version and Szabolcs's one are
doing similar kinds of things (at least in spirit), both attempting to flatten
the emerging element trees at run time. The speed-up achieved by the compiled
version I attribute to the large constant factor of hash setting and lookup
(assignments and DownValue lookups, as compared to compiled array indexing),
plus the very high efficiency of Part when many elements are extracted at
once, a fact I exploit in the FixedPoint construct.
Regards,
Leonid
You need to evaluate

In[1]:=
a=0.7;

before running anything else, for it to work.
Another follow-up:
I profiled my code and found that the main bottleneck is the built-in
GatherBy function (who would have thought?). It happens to be sub-optimal for
some problems and can be beaten in some cases (below is an example). With the
following modifications:
Clear[getTree, listSplit, gatherBy, getComponentsNew];
getTree =
Compile[{{pairs, _Integer, 2}},
Module[{t = 0, i = 0, j = 0, xl = 0, yl = 0, k = 0,
len = Max[pairs], dad = Table[0, {Max[pairs]}]},
For[k = 1, k <= Length[pairs], k++,
xl = i = pairs[[k, 1]];
yl = j = pairs[[k, 2]];
If[xl == yl, Continue[]];
While[dad[[i]] > 0, i = dad[[i]]];
While[dad[[j]] > 0, j = dad[[j]]];
While[dad[[xl]] > 0, t = xl; xl = dad[[xl]]; dad[[t]] = i];
While[dad[[yl]] > 0, t = yl; yl = dad[[yl]]; dad[[t]] = j];
If[i != j,
If[dad[[j]] <= dad[[i]],
dad[[j]] += dad[[i]] - 1;
dad[[i]] = j;,
(*else*)
(dad[[i]] += dad[[j]] - 1);
dad[[j]] = i];];];
For[k = 1, k <= len, k++, If[dad[[k]] <= 0, dad[[k]] = k]];
dad]];
listSplit[x_List, lengths_List] :=
MapThread[Take[x, {##}] &, {Most[#], Rest[#] - 1}] &@
Accumulate[Prepend[lengths, 1]];
gatherBy[lst_List, flst_List] :=
listSplit[lst[[Ordering[flst]]], (Sort@Tally[flst])[[All, 2]]];
getComponentsNew[pairs_] :=
With[{dad = getTree[pairs]},
gatherBy[Range[Length[dad]], FixedPoint[dad[[#]] &, dad]]];
my code is now on par with yours in terms of performance:
aggs[n_, pairs_] :=
Module[{sp, t}, sp = SparseArray[Thread[pairs -> 1], {n, n}];
t = Sign[sp + Transpose[sp]];
SparseArray`StronglyConnectedComponents[t]]
In[1]:= Clear[resL, resC];
a = 0.7;
trials = Table[
RandomInteger[{1, k = 2^n}, {Ceiling[a k], 2}],
{n, 11, 19}];
In[2]:=
testL =
Table[With[{g = trials[[n]]}, {Length@g, First@Timing[resL[n] =
getComponentsNew@g]}], {n, 1, Length@trials}];
In[3]:=
testC = Table[With[{g = trials[[n]]},
{Length@g, First@Timing[resC[n] = aggs[Max[g], g]]}], {n, 1,
Length@trials}];
In[4]:= testL
Out[4]= {{1434, 0.01}, {2868, 0.03}, {5735, 0.05}, {11469, 0.14}, {22938,
0.241}, {45876, 0.42}, {91751, 0.992}, {183501, 2.093}, {367002, 4.256}}
In[5]:= testC
Out[5]= {{1434, 0.01}, {2868, 0.04}, {5735, 0.07}, {11469, 0.13}, {22938,
0.301}, {45876, 0.571}, {91751, 1.281}, {183501, 2.855}, {367002, 5.337}}
In[6]:=
Sort[Sort /@ resL[#]] === Sort[Sort /@ resC[#]] & /@
Range[Length[trials]]
Out[6]= {True, True, True, True, True, True, True, True, True}
I wish the compiler could somehow handle non-tensor structures like lists of
lists, or that Reap/Sow were compilable and fast when compiled; despite all my
efforts, most of the execution time in my code is spent on the dumb task of
collecting elements with the same tag into a list (gatherBy), because I
cannot implement a compiled version.
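The listSplit/gatherBy trick — sort the items by tag, then cut the sorted
list at the boundaries given by the tag counts — can be sketched in Python
(illustrative only; the names here are mine):

```python
# Gather items by tag via sorting: the stable sort puts equal tags together,
# and the cumulative tag counts give the cut points between groups.
from collections import Counter
from itertools import accumulate

def gather_by(items, tags):
    order = sorted(range(len(items)), key=lambda i: tags[i])   # stable sort
    counts = [c for _, c in sorted(Counter(tags).items())]     # per-tag sizes
    cuts = [0, *accumulate(counts)]                            # group bounds
    srt = [items[i] for i in order]
    return [srt[a:b] for a, b in zip(cuts, cuts[1:])]
```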
For very large lists I expect your code to start winning, because some parts
of mine are based on sorting, which is O(N log N) rather than linear.
Regards,
Leonid
2009/11/7 Carl Woll <ca...@wolfram.com>
> Another late to the party post. I think this topic was discussed back in
> 2005, and I think the quickest solution then was found in my post:
>
> http://tinyurl.com/ylon3hr
>
> Anyway, the solution was:
>
> aggs[n_, pairs_] := Module[{sp, t},
> sp = SparseArray[Thread[pairs -> 1], {n, n}];
> t = Sign[sp + Transpose[sp]];
> SparseArray`StronglyConnectedComponents[t]]
>
> Here, the pairs argument needs to be a list of pairs of positive integers,
> and n is the maximum of these integers. A quick comparison with components2
> follows:
>
> In[51]:= g = RandomInteger[{1, 10^4}, {7000, 2}];
>
> r1 = components2[g]; // Timing
> r2 = aggs[10^4, g]; // Timing
>
> Sort[Sort /@ DeleteCases[r1, {_}]] === Sort[Sort /@ DeleteCases[r2, {_}]]
>
>
> Out[52]= {0.499, Null}
>
> Out[53]= {0.016, Null}
>
> Out[54]= True
>
> So, about 30 times faster.
>
> Carl Woll
> Wolfram Research
>
The WeakComponents solution can be improved because it spends most of
the time generating the adjacency matrix. This is significantly
faster:
DeleteCases[#, {_}] &@
WeakComponents[SparseArray[data -> 1, {Max@data, Max@data}]]
Maxim Rytin
m...@inbox.ru
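A dependency-free way to see what WeakComponents computes here: symmetrize
the edges into an adjacency list and collect components by breadth-first
search. This Python sketch is only an illustration, not the package's
implementation:

```python
# Weak components of a directed edge list: treat every edge as undirected
# (the analogue of Sign[sp + Transpose[sp]]) and BFS from each unseen vertex.
from collections import defaultdict, deque

def weak_components(pairs):
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)          # symmetrize the adjacency
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        queue, comp = deque([start]), []
        seen.add(start)
        while queue:
            v = queue.popleft()
            comp.append(v)
            for w in adj[v] - seen:
                seen.add(w)
                queue.append(w)
        comps.append(sorted(comp))
    return comps
```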