My problem is this:
In the first phase I run the k-means code, which gives me, for example, 3 clusters. When I then feed a new data set into that first system, how can I assign the new data to the previous (first-phase) clusters? Any code or ideas?
Thanks,
I don't know if I understand the question. My best guess is that after
finding the initial cluster centroids, you want to check the distance of
each new point to the cluster centroids, and label that point according
to which of the centroids it is closest to. If so, then that should be
relatively straightforward, and is perhaps even vectorizable (depending
upon the distance measure you choose.)
Yes, you're right!
But how can I calculate the distances for the new data set?
For example, the first-phase cluster centers (for clusters 1, 2 and 3, which are known) should stay constant, and the distances should be recalculated for the data coming into the new system, right?
And how can that be done?
Well that's going to depend upon the distance measure you want to use.
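Let M be a matrix with one point per row and one feature per column.
Let C1 be a row vector of centroid coordinates, with the same number of
columns as M.
Here's a simple implementation for squared Euclidean distance (the
square root is not needed just to find the closest centroid).
s1 = zeros(size(M,1),1);                  % one distance per row (point) of M
for K = 1:size(M,1)
    s1(K) = sum( (M(K,:) - C1) .^ 2 );    % squared distance from point K to C1
end
Repeat for the other centroids (giving s2, s3, ...), and then compare
s1(K), s2(K), s3(K) for each point K to determine the closest centroid.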
This code makes no attempt to be the most efficient code possible for
the situation: get your basic code working first and only worry about
optimizing it if the simple clear code turns out to be unacceptably slow.
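As a rough sketch of the "repeat for the other centroids, then compare" step, the loop can run over every centroid and collect the results in one matrix. The data here is made up for illustration: C (one centroid per row) and M (one new point per row) are hypothetical stand-ins, not values from this thread.

```matlab
% Hypothetical stand-in data: C is k-by-d (one centroid per row),
% M is N-by-d (one new point per row).
C = [0 0; 10 0; 0 10];
M = [1 1; 9 -1; 2 8; 6 1];

N = size(M,1);
k = size(C,1);
S = zeros(N,k);                 % S(n,j): squared distance, point n to centroid j
for j = 1:k
    for n = 1:N
        S(n,j) = sum( (M(n,:) - C(j,:)) .^ 2 );
    end
end

% The second output of MIN along dimension 2 is the column index of the
% smallest entry in each row, i.e. the number of the nearest centroid.
[dmin, label] = min(S, [], 2);
```

With this stand-in data, label comes out as [1; 2; 3; 2]. The centroids are never updated, so the first-phase clusters stay fixed while the new points are labeled.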
M = dt(:,[1 6])';              % row data, M = [2x85]
C1 = result.cluster.v(1,:);    % constant, from the first-phase k-means; X and Y as a row, as you said: C1 = [1x2]
s1 = zeros(size(M,1),1);
for K = 1:size(M,1)
    s1(K) = sum( (M(K,:) - C1) .^ 2 );
end
It doesn't work.
Maybe there is a problem in "s1 = zeros(size(M,1),1);"?
Your M does not conform to the specification I indicated, that it be an
array of *rows* of features. Your M is *columns* of features.
Just remove the ' after the dt(:,[1 6]) to get an M that is appropriate
for the code.
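A minimal sketch of the corrected orientation, with a size check that would catch this kind of mismatch early. Here dt and C1 are hypothetical stand-ins (random data and an arbitrary centroid), since the real dt and result.cluster.v are not shown in the thread.

```matlab
% Hypothetical stand-ins for the poster's variables.
dt = rand(85, 6);
C1 = [0.5 0.5];

M = dt(:, [1 6]);               % no transpose: M is 85-by-2, one point per row

% M and C1 must have the same number of columns for M(K,:) - C1 to work.
assert(size(M,2) == size(C1,2), 'M and C1 must have the same number of columns');

s1 = zeros(size(M,1), 1);
for K = 1:size(M,1)
    s1(K) = sum( (M(K,:) - C1) .^ 2 );   % squared distance from point K to C1
end
```

With the transposed 2-by-85 M, M(K,:) would be 1-by-85 and could not be subtracted from the 1-by-2 C1, which is why the original attempt fails inside the loop rather than at the zeros call.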
If you have KMEANS, then you must have PDIST2. KMEANS returns the
cluster centroids, and PDIST2 computes distances between two sets of
points. Use the second output of MIN on the distance matrix, and you're
done.
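A short sketch of that suggestion, assuming the Statistics Toolbox function PDIST2 is available. C and Xnew are made-up stand-ins; in practice C would presumably be the centroid matrix returned as the second output of KMEANS in the first phase.

```matlab
% Hypothetical stand-in data: C is k-by-d (centroids from the first phase),
% Xnew is N-by-d (new points to label).
C    = [0 0; 10 0; 0 10];
Xnew = [1 1; 9 -1; 2 8; 6 1];

D = pdist2(Xnew, C);            % N-by-k matrix of Euclidean distances
[dmin, idx] = min(D, [], 2);    % idx(n): nearest centroid for point n
```

For this stand-in data idx is [1; 2; 3; 2]. PDIST2 defaults to Euclidean distance; other distance measures can be selected with its third argument.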