SOM: automatic cluster boundary detection?

joaocarlos

Dec 22, 2009, 12:59:31 PM

Hi,

I've been searching a lot for methods that automatically detect
cluster boundaries (if any) with a SOM.

The vast majority of references rely on the analyst's expertise to
decide visually how many clusters there are and where their
boundaries lie.

One exception so far is this reference:

Determining Cluster Boundaries using Particle Swarm Optimization
Anurag Sharma and Christian W. Omlin

Does anyone have other references or comments on this?

Greg Heath

Dec 23, 2009, 3:18:30 AM

Determining the optimal number of clusters is not an easy task.
However, once they are formed it is easy to determine their
boundaries.

Warren Sarle has discussed the problem in the FAQ or elsewhere in
the archives.

I think Jain and Dubes have also worked on the problem.

Hope this helps.

Greg

Ian Parker

Dec 26, 2009, 3:55:41 PM

http://www.google.co.uk/search?source=ig&hl=en&rlz=1G1GGLQ_ENUK247&q=K+Means&btnG=Google+Search&meta=lr%3D&aq=f&oq=

gives you a lot of research on the K-Means algorithm. You take
centers in a space of however many dimensions you choose, and you find
the closest cluster center to each point. This is iteratively
convergent. How do you choose the starting points? There are various
methods; look at the references. The centers are then found
iteratively. I myself have written a K-Means algorithm for LSA
clustering, which I could let you have if you are interested.
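
A minimal sketch of the iteration described above, assuming NumPy. The
function name `kmeans`, the random choice of starting centers, and the
stopping rule are illustrative assumptions, not the LSA code mentioned
in this post.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-Means sketch; returns (centers, labels)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points picked at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the closest center for every point.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its points;
        # an empty cluster keeps its previous center.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving
        centers = new_centers
    # labels come from the last assignment step that was run
    return centers, labels
```

Different starting centers can lead to different solutions, so in
practice the loop is often run several times from different seeds.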

- Ian Parker

joaocarlos

Dec 27, 2009, 6:07:49 PM

Hi Greg and Ian,

Thanks for the answers.

Since I'm just learning these new techniques, allow me some lack of
rigorous knowledge here, OK?

What have I found so far?

Some papers present SOM clustering (this is my focus) as a two-step
process: train the map to discover the BMUs, and then apply
different "direct" clustering techniques to these BMUs, right?

Here it is assumed that the SOM is "good" at capturing the
quantization and projection properties of a data set, and I suppose
there are metrics to express how good it is besides quantization and
topographic error. I've found some papers discussing them.
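
For concreteness, a sketch of the two errors named above, assuming
NumPy and a trained map stored as a `(rows, cols, d)` array of
prototype vectors. The function name `som_quality` is hypothetical,
and adjacency is assumed to mean 4-neighbourhood on a rectangular
grid; other definitions (8-neighbourhood, hexagonal grids) give
different topographic error values.

```python
import numpy as np

def som_quality(data, codebook):
    """Return (quantization_error, topographic_error).

    data:     (n, d) array of samples
    codebook: (rows, cols, d) array of SOM prototype vectors
    """
    rows, cols, d = codebook.shape
    flat = codebook.reshape(rows * cols, d)
    # Distance from every sample to every prototype.
    dists = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    bmu, second = order[:, 0], order[:, 1]  # best and second-best units
    # Quantization error: mean distance from a sample to its BMU.
    qe = dists[np.arange(len(data)), bmu].mean()
    # Topographic error: share of samples whose best and second-best
    # units are not neighbours on the grid (assumed 4-neighbourhood).
    r1, c1 = np.divmod(bmu, cols)
    r2, c2 = np.divmod(second, cols)
    adjacent = (np.abs(r1 - r2) + np.abs(c1 - c2)) == 1
    te = 1.0 - adjacent.mean()
    return qe, te
```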

One thing that intrigues me is that direct clustering techniques
produce clusters even when none are present, right?

Then the very basic first question: what is the most commonly used
metric to express whether any clusters are present in a dataset at
all?

If there is such a metric, then I think it could be used to decide
whether it is worth starting the SOM training process at all. Once
the map is trained, several of the direct clustering methods could be
used to try to find some "optimal" solution, or even be combined.
Another research area I've recently come across is "consensus
clustering": instead of choosing one method, how to combine them.

What I'm trying to achieve is a one-pass process going from
training to clustering, guided or qualified by some metrics.

On the other hand, papers like the one I referenced seem to try to
build the clusters from the BMUs themselves, much like some algorithms
for selecting adjacent areas in images: you point to one pixel and
traverse its neighborhood, connecting other pixels while the color
difference is below some level.
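
The image analogy can be sketched as a flood fill over the SOM grid,
assuming NumPy and a codebook stored as a `(rows, cols, d)` array.
This is only an illustration of the analogy, not the particle swarm
method of Sharma and Omlin; the function name `grow_regions` and the
fixed `threshold` on the distance between neighbouring prototypes are
assumptions.

```python
from collections import deque

import numpy as np

def grow_regions(codebook, threshold):
    """Label SOM units by region growing; returns a (rows, cols) int array."""
    rows, cols, _ = codebook.shape
    labels = -np.ones((rows, cols), dtype=int)  # -1 means "not visited yet"
    current = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0, c0] != -1:
                continue
            # Start a new region from this unvisited unit.
            labels[r0, c0] = current
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and labels[nr, nc] == -1:
                        # Connect the neighbour while the prototype
                        # difference stays below the chosen level.
                        gap = np.linalg.norm(codebook[r, c] - codebook[nr, nc])
                        if gap < threshold:
                            labels[nr, nc] = current
                            queue.append((nr, nc))
            current += 1
    return labels
```

The result depends entirely on `threshold`, so this moves the problem
of choosing the number of clusters into choosing that level.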

Thanks in advance for your time ....

JCarlos

joaocarlos

Dec 27, 2009, 6:12:15 PM

On Dec 23, 1:18 am, Greg Heath <he...@alumni.brown.edu> wrote:

By the way :-)

I've just found this excellent paper by Sarle that you pointed out:

http://vohweb.chem.ucla.edu/voh/classes%5Cwinter08%5C160BID48%5CNumClusters.pdf

JCarlos

Ian Parker

Dec 28, 2009, 7:12:44 AM

On Dec 27, 11:07 pm, joaocarlos <jcanist...@gmail.com> wrote:

The question of when a cluster is not a cluster is an important one.
It is difficult to generalize. I will tell you what I do. I compute
the squared error within each cluster and the squared error caused by
the clusters, that is, by the spread of the cluster centres.
Basically, if the squared error of the cluster centres exceeds the
squared error of the points, we have a valid cluster; if not, we do
not.
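
One possible reading of this rule, sketched with NumPy: compare the
within-cluster sum of squares with the between-cluster sum of squares.
The function name `within_between_ss` and the weighting of each centre
by its cluster size are assumptions; the post does not give the exact
formula.

```python
import numpy as np

def within_between_ss(points, labels):
    """Return (within_ss, between_ss) for a given cluster assignment."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    grand_mean = points.mean(axis=0)
    within = 0.0
    between = 0.0
    for j in np.unique(labels):
        members = points[labels == j]
        center = members.mean(axis=0)
        # Squared error of the points around their own cluster centre.
        within += ((members - center) ** 2).sum()
        # Squared error of the centre around the overall mean,
        # weighted by cluster size (assumption).
        between += len(members) * ((center - grand_mean) ** 2).sum()
    return within, between
```

Under this reading the clustering would be accepted when
`between > within`; the two terms always add up to the total sum of
squares around the overall mean.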

Sarle's paper discusses this and also indicates that this is very much
oversimplified. In fact there is also a filter on the number of points
in a cluster. This corresponds roughly to Sarle's "Likelihood".

- Ian Parker