Centroid of SV clustering


yi

Jan 5, 2012, 11:39:58 PM
to Semantic Vectors
Hi. I am new to this group.
I have a question about SV clustering. Does the package generate
the centroid for each cluster once k-means clustering has finished? If so,
how can I get it?
If not, what should I do instead?

Thanks,
Y

Dominic

Jan 6, 2012, 2:06:49 PM
to Semantic Vectors
Hi Yi,

That's a good question.

Yes, the package generates the centroids as part of clustering, within
the kMeansCluster method of ClusterResults (see
http://semanticvectors.googlecode.com/svn/javadoc/latest-stable/pitt/search/semanticvectors/ClusterResults.html).
But they're internal to the method, and are not passed back out at
all, so there would need to be some changes for you to get them.

The simplest thing I can think of would be to allocate the array of
centroids outside the method and pass it in as a reference, and then
the caller will find it populated. This would be pretty normal in C/C++,
but I guess a better encapsulation in Java would be to make a small
internal class / struct called "ClusterOutput" or something like that,
which would contain both the assignment of items to clusters (as is
done at present), and the corresponding centroids.
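
A minimal sketch of that "ClusterOutput" idea, for illustration only. This is not the code in ClusterResults: the class KMeansSketch, the method names kMeans and nearest, the use of plain float arrays, Euclidean distance, and seeding with the first k vectors are all assumptions made to keep the example self-contained.

```java
/** Hypothetical result holder: cluster assignments plus the centroids. */
final class ClusterOutput {
  final int[] assignments;   // assignments[i] = cluster index of item i
  final float[][] centroids; // centroids[c] = centroid vector of cluster c

  ClusterOutput(int[] assignments, float[][] centroids) {
    this.assignments = assignments;
    this.centroids = centroids;
  }
}

final class KMeansSketch {
  /**
   * Plain k-means on dense vectors (an assumption, not the package's method).
   * Requires k <= vectors.length; seeds the centroids with the first k vectors.
   */
  static ClusterOutput kMeans(float[][] vectors, int k, int iterations) {
    int dim = vectors[0].length;
    float[][] centroids = new float[k][];
    for (int c = 0; c < k; c++) {
      centroids[c] = vectors[c].clone();
    }
    int[] assignments = new int[vectors.length];

    for (int iter = 0; iter < iterations; iter++) {
      // Assignment step: each vector goes to its nearest centroid.
      for (int i = 0; i < vectors.length; i++) {
        assignments[i] = nearest(vectors[i], centroids);
      }
      // Update step: each centroid becomes the mean of its members.
      float[][] sums = new float[k][dim];
      int[] counts = new int[k];
      for (int i = 0; i < vectors.length; i++) {
        int c = assignments[i];
        counts[c]++;
        for (int d = 0; d < dim; d++) {
          sums[c][d] += vectors[i][d];
        }
      }
      for (int c = 0; c < k; c++) {
        if (counts[c] == 0) {
          continue; // keep the old centroid for an empty cluster
        }
        for (int d = 0; d < dim; d++) {
          centroids[c][d] = sums[c][d] / counts[c];
        }
      }
    }
    // Final pass so the returned assignments match the returned centroids.
    for (int i = 0; i < vectors.length; i++) {
      assignments[i] = nearest(vectors[i], centroids);
    }
    return new ClusterOutput(assignments, centroids);
  }

  /** Index of the centroid with the smallest squared Euclidean distance to v. */
  static int nearest(float[] v, float[][] centroids) {
    int best = 0;
    double bestDist = Double.MAX_VALUE;
    for (int c = 0; c < centroids.length; c++) {
      double dist = 0;
      for (int d = 0; d < v.length; d++) {
        double diff = v[d] - centroids[c][d];
        dist += diff * diff;
      }
      if (dist < bestDist) {
        bestDist = dist;
        best = c;
      }
    }
    return best;
  }
}
```

Returning one object keeps the assignments and the centroids together, so a caller cannot accidentally pair centroids from one run with assignments from another.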

The main question I have is what exactly do you want to do with the
centroids? If I understand your goals a bit better I could judge
whether the outline above will meet your needs.

Best wishes,
Dominic

Yi Sun

Jan 6, 2012, 10:44:34 PM
to semanti...@googlegroups.com
Thanks for your explanation. What I am really trying to see is whether the
centroid (or the doc closest to the centroid) is really representative
in the original doc domain, so that we can judge whether it makes sense
for two centroids to be kept separate or not. However,
this may require keeping the order of docs the same in both the original
domain and the semantic vector domain (which is the input for k-means
clustering).
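
One way to picture this check, as a rough sketch rather than anything the package provides: keep the document identifiers and the document vectors in two arrays that share the same index, then look up the document closest to each centroid. The class CentroidRepresentatives, its method names, and the use of cosine similarity on plain float arrays are assumptions for illustration.

```java
final class CentroidRepresentatives {
  /** Cosine similarity between two vectors of equal length. */
  static double cosine(float[] a, float[] b) {
    double dot = 0;
    double normA = 0;
    double normB = 0;
    for (int d = 0; d < a.length; d++) {
      dot += a[d] * b[d];
      normA += a[d] * a[d];
      normB += b[d] * b[d];
    }
    if (normA == 0 || normB == 0) {
      return 0; // treat a zero vector as unrelated to everything
    }
    return dot / Math.sqrt(normA * normB);
  }

  /**
   * For each centroid, returns the id of the most similar document.
   * docIds[i] must describe the same document as docVectors[i]; that shared
   * index is what ties the vector domain back to the original documents.
   */
  static String[] closestDocs(String[] docIds, float[][] docVectors, float[][] centroids) {
    String[] representatives = new String[centroids.length];
    for (int c = 0; c < centroids.length; c++) {
      int best = 0;
      double bestSim = Double.NEGATIVE_INFINITY;
      for (int i = 0; i < docVectors.length; i++) {
        double sim = cosine(docVectors[i], centroids[c]);
        if (sim > bestSim) {
          bestSim = sim;
          best = i;
        }
      }
      representatives[c] = docIds[best];
    }
    return representatives;
  }
}
```

Reading the returned documents side by side gives a quick sense of whether two centroids really cover different material.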

> --
> You received this message because you are subscribed to the Google Groups "Semantic Vectors" group.
> To post to this group, send email to semanti...@googlegroups.com.
> To unsubscribe from this group, send email to semanticvecto...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/semanticvectors?hl=en.
>

Dominic Widdows

Jan 9, 2012, 1:11:34 PM
to semanti...@googlegroups.com
That makes sense, I think.

So you would like to use the centroids as query vectors to search for
documents related to the centroids generated by term clustering - is
that correct?

The simplest way might be to output the centroids to their own vector
store file, and then use this as the -queryvectorfile argument when
searching for related documents, as described in
http://code.google.com/p/semanticvectors/wiki/DocumentSearch. I think
I could help get something working for that pretty quickly.

Searching against the original elemental document vectors would
require a bit more work, to output these vectors to a vector store of
their own. This can also be done; it will just take a little longer.
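
Conceptually, using a centroid as a query vector amounts to ranking the document vectors by their similarity to it. The sketch below shows only that idea in memory; it does not read or write Semantic Vectors vector store files, and the names CentroidSearchSketch, Hit, and search are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CentroidSearchSketch {
  /** One search result: a document id and its similarity to the query. */
  static final class Hit {
    final String docId;
    final double score;

    Hit(String docId, double score) {
      this.docId = docId;
      this.score = score;
    }
  }

  /** Cosine similarity between two vectors of equal length. */
  static double cosine(float[] a, float[] b) {
    double dot = 0;
    double normA = 0;
    double normB = 0;
    for (int d = 0; d < a.length; d++) {
      dot += a[d] * b[d];
      normA += a[d] * a[d];
      normB += b[d] * b[d];
    }
    if (normA == 0 || normB == 0) {
      return 0; // treat a zero vector as unrelated to everything
    }
    return dot / Math.sqrt(normA * normB);
  }

  /** Returns up to topN documents, most similar to the query vector first. */
  static List<Hit> search(float[] query, String[] docIds, float[][] docVectors, int topN) {
    List<Hit> hits = new ArrayList<>();
    for (int i = 0; i < docVectors.length; i++) {
      hits.add(new Hit(docIds[i], cosine(query, docVectors[i])));
    }
    hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
    return new ArrayList<>(hits.subList(0, Math.min(topN, hits.size())));
  }
}
```

Calling search once per centroid gives one ranked document list per cluster.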

When do you need this by? I presume "the sooner the better" is always
the answer, but if I can get the first part done and released by
Friday this week, is that early enough to be useful to you?

(Apologies for the delayed reply, I'm still in the throes of moving!)

Best wishes,
Dominic

Dominic

Jan 16, 2012, 12:55:34 PM
to Semantic Vectors
Quick follow up for the list: code for outputting the centroids to a
vector file called cluster_centroids.bin is checked in. I'd like to do
better testing before releasing, but in the meantime feel free to try
this out and see if the centroids are useful with your favorite
models.

Best wishes,
Dominic


Yi Sun

Jan 16, 2012, 1:38:30 PM
to semanti...@googlegroups.com
Yes. I'll try it when I get time. One question: do I need to deploy
the new code to replace what is deployed on my current machine? Is there
a wiki page showing how?

Thanks,
Yi

Dominic

Jan 16, 2012, 2:01:10 PM
to Semantic Vectors
Here goes - I've just added some brief instructions:

http://code.google.com/p/semanticvectors/wiki/InstallationInstructions?ts=1326740333&updated=InstallationInstructions#Compiling_from_Source_-_Most_Recent_Development_Installation

If this is all new to you let me know - I can make a new release this
evening, it won't be polished but it shouldn't be buggy either.

Best wishes,
Dominic