I want to try k-means with the chebyshev distance. I know it is
possible to use euclidean and city block distances but what about
chebyshev? Is there a way to use this distance with the standard
k-means matlab function?
Thanks.
Sandro.
K-means (the algorithm) requires two things:
1) a distance measure, and
2) a way to compute the centroid of a cluster so as to minimize the sum of
point-to-centroid distances.
There are lots of distance measures, but few have a simple computation for the
centroid. For example, the default distance used by KMEANS (the function) is
most definitely NOT "Euclidean", but rather "squared Euclidean", because for the
latter, the centroid is the arithmetic mean, while for the former, it is a
difficult computation requiring an iterative solution (the calculation has a
hyphenated name attached to it that escapes me).
That's why no Chebyshev distance in KMEANS.
That being said, there is a "coordinate-free" version of K-means (the algorithm)
that does not require a centroid calculation. However, it is not the standard
thing that most people want, and requires a distance matrix rather than raw
data, limiting its use for large datasets (which is what most people use K-means
for). Is that of interest to you?
It's certainly also possible to modify KMEANS to use Chebyshev distance and some
unsuitable centroid calculation, but you're on your own there.
- Peter Perkins
The MathWorks, Inc.
If p is the parameter of the Minkowski distance, then Euclidean
distance corresponds to p=2, Manhattan to p=1 and Chebyshev to
P=infinity.
I base myself on a paper which compares clustering with Euclidean,
Manhattan and Chenyshev distances for clustering using SOM (that why
I thought of comparing Chebyyshev with sqEuclidean). In a book by
Webb, Chebyshev is defined as:
d(x,y) = sum(abs(x_i - y_i))
Starting from this, I don't understand why it could not simply
replace the standard sqEuclidean distance.
Thanks in advance for your help.
Regards.
What's the formula to compute the centroid? For example, the centroid for city
block distance is NOT the componentwise arithmetic mean, it's the componentwise
median. If you can compute an appropriate centroid easily for the Chebyshev
distance, then you're on your way. I haven't thought about this at all, and
there may an obvious answer for the centroid.
K-means partitions your data. The criterion for the partition is that the sum
of the point-to-centroid distances is minimized over all partitions. It makes
little or no sense to find a partition that minimizes that criterion if the
centroid itself is not the point that minimizes the point-to-centroid distances
within a cluster.
I need this "coordinate-free" version of kmeans! Where can i
get it?