But I don't understand how I can modify the file
kmeans.m to do this, i.e. to choose the Bhattacharyya distance
as the distance parameter: kmeans(matrix, cluster, 'dist', 'bhatt');
Thanks
There's a reason for that: K-means clustering requires not only a
distance metric, but also a way to compute the centroid of a cluster.
That is, the criterion that is minimized in K-means is the sum of
point-to-centroid distances, summed over all clusters. Thus, it is
natural to want the centroid to be the point that minimizes the
point-to-centroid distances within a cluster. The arithmetic mean does
that for (squared) Euclidean distance, but there are only a few distances
for which the centroid is easily computable. Even for something as simple
as (unsquared) Euclidean distance, it is _not_ easily computable.
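
To make the centroid point concrete, here is a minimal sketch in base MATLAB. The data are made up for illustration, and the iteration used for the unsquared case is Weiszfeld's algorithm for the geometric median, which is my choice of method and not something kmeans.m is claimed to do.

```matlab
% Hypothetical 2-D cluster with one outlier (illustrative data only).
X = [0 0; 1 0; 0 1; 1 1; 10 10];

% Distances from every row of X to a candidate centroid c (1-by-2).
dists = @(c) sqrt(sum(bsxfun(@minus, X, c).^2, 2));

% Squared Euclidean distance: the minimizer is the arithmetic mean.
cMean = mean(X, 1);

% Unsquared Euclidean distance: the minimizer is the geometric median.
% There is no closed form, so iterate (Weiszfeld's algorithm).
cMed = cMean;                          % starting guess
for iter = 1:500
    d = max(dists(cMed), eps);         % guard against division by zero
    w = 1 ./ d;                        % inverse-distance weights
    cNew = (w' * X) / sum(w);          % weighted mean of the points
    if norm(cNew - cMed) < 1e-10, break; end
    cMed = cNew;
end

% Compare both criteria at both candidate centroids.
sumSq  = @(c) sum(dists(c).^2);
sumAbs = @(c) sum(dists(c));
fprintf('mean:   sumSq = %.4f, sumAbs = %.4f\n', sumSq(cMean), sumAbs(cMean));
fprintf('median: sumSq = %.4f, sumAbs = %.4f\n', sumSq(cMed),  sumAbs(cMed));
```

The mean should win on the squared criterion and the iterated point on the unsquared one, which is why a new distance for K-means also needs its own centroid update.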
> But I don't understand how I can modify the file
> kmeans.m to do this, i.e. to choose the Bhattacharyya distance
> as the distance parameter: kmeans(matrix, cluster, 'dist', 'bhatt');
Wikipedia has this to say:
"The Bhattacharyya coefficient is a divergence-type measure; it can be
seen as the scalar product of the two vectors (one for p and one for q)
having as components the square root of the probability of the points x
\in X. It thereby lends itself to a geometric interpretation: the
Bhattacharyya coefficient is the cosine of the angle enclosed between
these two vectors."
That would seem to imply that you want to use the cosine distance on the
square root of your data, I think.
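
A minimal sketch of that suggestion, assuming each row of your data is a histogram (nonnegative entries summing to 1). The variable names P, R and k and the random data are made up for illustration; P and k play the roles of "matrix" and "cluster" in the original call. It relies on kmeans from the Statistics Toolbox with its 'cosine' distance option.

```matlab
% Hypothetical data: 60 histograms with 8 bins each (illustrative only).
rng(0);
P = rand(60, 8);
P = bsxfun(@rdivide, P, sum(P, 2));    % normalize each row to sum to 1

% Square roots of the rows. Each row of R has unit Euclidean norm,
% so the cosine between two rows equals the Bhattacharyya coefficient.
R = sqrt(P);

% Check the identity on one pair of rows.
bc     = sum(sqrt(P(1,:) .* P(2,:)));               % Bhattacharyya coefficient
cosSim = (R(1,:) * R(2,:)') / (norm(R(1,:)) * norm(R(2,:)));

% K-means with cosine distance on the square-rooted data.
k = 3;                                  % assumed number of clusters
idx = kmeans(R, k, 'Distance', 'cosine', 'Replicates', 5);
```

Note that the cosine distance here is 1 minus the Bhattacharyya coefficient, not the Bhattacharyya distance -log(BC) itself. The two are monotonically related, so they rank pairs the same way, but the clusters are not guaranteed to be identical to what a true Bhattacharyya-distance K-means would give.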
Or, use hierarchical clustering.
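
Hierarchical clustering only needs pairwise distances, not centroids, so the Bhattacharyya distance can be plugged in directly. The following is a sketch under the same assumption as before (rows are histograms that sum to 1); the handle name bhatt, the data, the 'average' linkage and the choice of 3 clusters are all made up for illustration. It uses pdist, linkage and cluster from the Statistics Toolbox.

```matlab
% Hypothetical data: 60 histograms with 8 bins each (illustrative only).
rng(0);
P = rand(60, 8);
P = bsxfun(@rdivide, P, sum(P, 2));    % normalize each row to sum to 1

% Custom distance in the form pdist expects: xi is 1-by-n, XJ is m-by-n,
% and the result is m-by-1. The min(...,1) clamps round-off so that the
% distance is never slightly negative.
bhatt = @(xi, XJ) -log(min(sqrt(XJ) * sqrt(xi)', 1));

D = pdist(P, bhatt);                   % pairwise Bhattacharyya distances
Z = linkage(D, 'average');             % average-linkage tree
idx = cluster(Z, 'maxclust', 3);       % cut the tree into at most 3 clusters
```

If two histograms have no overlapping bins, the coefficient is 0 and the distance is Inf, which you may need to handle before calling linkage.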
Hope this helps.
- Peter Perkins
The MathWorks, Inc.