Subgroups in broken minimal spanning tree

Eli Bressert

unread,

Jul 27, 2010, 7:13:16 PM7/27/10

to networkx-discuss

Hi,

I'm trying to determine the best way about finding subgroups in a
broken minimal spanning tree. For example, take a look at the
following PNG file.

http://networkx-discuss.googlegroups.com/web/mst_and_broken_mst.png?gda=n4GL9EgAAABqQvvEw0AzKLQQpQGnIfbWCYSu6MlEdg_vFL9AqW_0laCE_Zhn-hcfg0nYfnSrnDglzhb83kORdLwM2moY-MeuGjVgdwNi-BwrUzBGT2hOzg&gsc=J5p-2SMAAAAni4LF7orUp1-ArCx8ir0F5d7BohKsWzM3CiDwwzdLiq0IoyLhPG2x5smOr2otMGI

The plot on the left an MST of a set of clusters. The right side
depicts a 'broken' MST where any edge longer than X length is thrown
away. The blue lines are the thrown away MST edges and the black lines
are the MST edges kept for finding subgroups.

Is there a function in networkx that does this or will simplify the
computational time to finding the subgroups? At the moment my current
python program takes a new array of nodes and edges from the broken
MST, then computes the subgroups via sets intersection/unions. The
current scripts that I have written work, but are painfully slow. Any
help, suggestions and like would be greatly appreciated. Thanks for
your help in advance.

Cheers,

Eli

Aric Hagberg

unread,

Jul 28, 2010, 11:21:42 AM7/28/10

to networkx...@googlegroups.com

On Tue, Jul 27, 2010 at 5:13 PM, Eli Bressert <ebre...@gmail.com> wrote:
> Hi,
>
> I'm trying to determine the best way about finding subgroups in a
> broken minimal spanning tree. For example, take a look at the
> following PNG file.
>

> The plot on the left an MST of a set of clusters. The right side
> depicts a 'broken' MST where any edge longer than X length is thrown
> away. The blue lines are the thrown away MST edges and the black lines
> are the MST edges kept for finding subgroups.
>
> Is there a function in networkx that does this or will simplify the
> computational time to finding the subgroups? At the moment my current
> python program takes a new array of nodes and edges from the broken
> MST, then computes the subgroups via sets intersection/unions. The
> current scripts that I have written work, but are painfully slow. Any
> help, suggestions and like would be greatly appreciated. Thanks for
> your help in advance.

I don't think I quite understand the algorithm. Can you describe it in more
detail or post code? How do you determine the edge lengths? After you remove
edges how do you define "subgroup"? Is it just the connected components?
(e.g. see http://networkx.lanl.gov/reference/algorithms.component.html)

Aric

Eli Bressert

unread,

Jul 28, 2010, 4:20:41 PM7/28/10

to networkx-discuss

The code is part of a large Python library that I wrote and it would
be hard to string it out. So, I'd be happy to explain it more.

The aims of the algorithm are

1) Take a set points, let's say 500 of them for example. The points
are distinct from one another by their position in a 2D space, x and
y.

2) We then construct a MST of the set. The edges that connect the
points have a length since the points are in 2D space. In this case we
can also call the edges the weights, which help determine the MST.
Let's assume that the edges have lengths from 1 to 100.

3) The MST connects each and every point by a set of edges. Now, let's
throw away any edge that exceeds a length of 10. The result would show
something like the plot on the right hand side (follow link). The
black lines are edges that remain after moving edges larger than, for
example, 10. The blue lines are the edges that have been removed.

http://networkx-discuss.googlegroups.com/web/mst_and_broken_mst.png?gda=n4GL9EgAAABqQvvEw0AzKLQQpQGnIfbWCYSu6MlEdg_vFL9AqW_0laCE_Zhn-hcfg0nYfnSrnDglzhb83kORdLwM2moY-MeuGjVgdwNi-BwrUzBGT2hOzg&gsc=J5p-2SMAAAAni4LF7orUp1-ArCx8ir0F5d7BohKsWzM3CiDwwzdLiq0IoyLhPG2x5smOr2otMGI

Unlike before when the all the edges were still there, not all of the
points are connected now. They are broken up into sets of edge-
connected points.

4) Taking the data, which is comprised of points and their edges, I
can deduce which points form a distinct set from others.
Unfortunately, the algorithm I have written is slow in Python. This
part of the code is still being tested and it's not part of my library
yet. It's posted below

The code may be quite cumbersome, but it's what I have come up with
for the moment. I will look at the link you provided to see if it's
useful. Thank you for your help.

Cheers,

Eli

### Code Begin ###
def _super_intersection(edges):
# This function is used inside the cluster function below.
# It works by cycling through
group = set(edges[0])
index = np.array([0])
k = 0
while k < 100:
k += 1
i = 0
for edge in edges[1:]:
i += 1
edge = set(edge)
if group & edge:
group = group | edge
index = np.append(index, i)

index = np.unique(np.array(index))
return group, index

def cluster(self, gmin = 5):
# A 2xN array of node IDs

# gmin is the minimum number of points needed to be
# considered a cluster (the set of points that are connected).
# An edge is identified by two points. So the edges variable
# is an array, N x 2, where each element is a unique point
# identified by an ID number (integers in this case).

edges = self.edges
group_nodes = {}
for no, edge in enumerate(edges):
try:
group, indice = _super_intersection(edges)
id_no = no
edges = np.delete(edges,indice,0)
if len(group) >= gmin:
group_nodes[id_no] = list(group)
except:
self.group_nodes = group_nodes

### Code End ###

On Jul 28, 4:21 pm, Aric Hagberg <ahagb...@gmail.com> wrote:

Dan Schult

unread,

Jul 28, 2010, 5:09:28 PM7/28/10

to networkx...@googlegroups.com

Would this work?
(suppose G already exists)

H=G.copy()
H.remove_edges_from( (u,v) for u,v,d in G.edges_iter(data=True) if d['weight']>10 )
subgroup_list=networkx.connected_components(H)

This would return a list of lists of nodes that are connected after removing edges with 'weight' > 10.
Dan

> --
> You received this message because you are subscribed to the Google Groups "networkx-discuss" group.
> To post to this group, send email to networkx...@googlegroups.com.
> To unsubscribe from this group, send email to networkx-discu...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/networkx-discuss?hl=en.
>

Eli Bressert

unread,

Jul 28, 2010, 6:26:11 PM7/28/10

to networkx-discuss

Hi Dan and Aric,

Yes :-) It worked great. Wow - thanks for your suggestion there.

Aric, when I started looking at the link you provided I realized that
I did too much on my own code. Then Dan's example went straight to the
problem.

Thank you both for your time.

Cheers,

Eli

On Jul 28, 10:09 pm, Dan Schult <dsch...@colgate.edu> wrote:
> Would this work?
> (suppose G already exists)
>
> H=G.copy()
> H.remove_edges_from( (u,v) for u,v,d in G.edges_iter(data=True) if d['weight']>10 )
> subgroup_list=networkx.connected_components(H)
>
> This would return a list of lists of nodes that are connected after removing edges with 'weight' > 10.
> Dan
>
> On Jul 28, 2010, at 4:20 PM, Eli Bressert wrote:
>
>
>
> > The code is part of a large Python library that I wrote and it would
> > be hard to string it out. So, I'd be happy to explain it more.
>
> > The aims of the algorithm are
>
> > 1) Take a set points, let's say 500 of them for example. The points
> > are distinct from one another by their position in a 2D space, x and
> > y.
>
> > 2) We then construct a MST of the set. The edges that connect the
> > points have a length since the points are in 2D space. In this case we
> > can also call the edges the weights, which help determine the MST.
> > Let's assume that the edges have lengths from 1 to 100.
>
> > 3) The MST connects each and every point by a set of edges. Now, let's
> > throw away any edge that exceeds a length of 10. The result would show
> > something like the plot on the right hand side (follow link). The
> > black lines are edges that remain after moving edges larger than, for
> > example, 10. The blue lines are the edges that have been removed.
>

> >http://networkx-discuss.googlegroups.com/web/mst_and_broken_mst.png?g...

Reply all

Reply to author

Forward