Ben Goertzel
Jul 7, 2017, 12:46:42 AM
to Andres Suarez, Curtis M. Faith, Linas Vepstas, 练睿婷, opencog
Andres,
(and also Linas and anyone else interested...)
I have refreshed my memory on clustering for unsupervised
part-of-speech (POS) learning. This is the approach I fiddled with
long ago:
http://www.cs.rhul.ac.uk/home/alexc/papers/eacl2003.pdf
https://github.com/ninjin/clark_pos_induction
I note that Spitkovsky (at Google) uses a similar method in his more
recent work on unsupervised part-of-speech learning:
https://web.stanford.edu/~jurafsky/goldtags.pdf
These guys cluster sparse vectors derived from co-occurrence counts
of various sorts. They don't use dimensionality reduction, though
Spitkovsky also does some dependency parsing...
This is at bottom just EM clustering, but it's used in a way that's
nicely customized for part of speech induction...
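To make the idea concrete, here is a rough sketch of EM clustering over sparse co-occurrence vectors. It is not the algorithm from either paper: it assumes the features are simply left/right neighbour counts and that each cluster is a multinomial over those features. The names context_vectors and em_cluster, the smoothing constant alpha, and the round-robin initialisation are all made up for illustration.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences):
    """Sparse neighbour counts: {word: Counter{(side, neighbour): count}}."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + list(sent) + ["</s>"]
        for i in range(1, len(padded) - 1):
            vecs[padded[i]][("L", padded[i - 1])] += 1
            vecs[padded[i]][("R", padded[i + 1])] += 1
    return dict(vecs)

def em_cluster(vecs, k, iters=50, alpha=0.1):
    """Soft-cluster words with EM on a mixture of multinomials (needs k >= 2)."""
    words = sorted(vecs)
    feats = sorted({f for v in vecs.values() for f in v})
    # Hypothetical deterministic start: lean each word toward one cluster, round-robin.
    resp = {w: [0.9 if j == i % k else 0.1 / (k - 1) for j in range(k)]
            for i, w in enumerate(words)}
    for _ in range(iters):
        # M-step: smoothed cluster priors and per-cluster feature distributions.
        prior = [(sum(resp[w][j] for w in words) + alpha) / (len(words) + k * alpha)
                 for j in range(k)]
        theta = []
        for j in range(k):
            counts = {f: alpha for f in feats}
            for w in words:
                for f, c in vecs[w].items():
                    counts[f] += resp[w][j] * c
            total = sum(counts.values())
            theta.append({f: c / total for f, c in counts.items()})
        # E-step: posterior over clusters for each word, computed in log space.
        for w in words:
            logp = [math.log(prior[j])
                    + sum(c * math.log(theta[j][f]) for f, c in vecs[w].items())
                    for j in range(k)]
            top = max(logp)
            p = [math.exp(x - top) for x in logp]
            z = sum(p)
            resp[w] = [x / z for x in p]
    return resp

# Toy corpus, invented for this example.
toy = [["the", "cat", "sleeps"], ["the", "dog", "sleeps"],
       ["a", "cat", "runs"], ["a", "dog", "runs"]]
for word, probs in em_cluster(context_vectors(toy), k=3).items():
    print(word, [round(x, 3) for x in probs])
```

Words with identical context vectors (here "cat" and "dog") necessarily get identical cluster posteriors, which is the property POS induction leans on. Which cluster index they land in depends on the initialisation.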
This paper finds that Fuzzy C-Means outperforms EM on classifying
word2vec output vectors, in a somewhat different context:
http://dm.snu.ac.kr/static/docs/TR/SNUDM-TR-2016-11.pdf
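For comparison with EM, a minimal sketch of standard Fuzzy C-Means follows. It makes no attempt to reproduce that paper's setup: the function names, the fuzzifier m=2, the fixed iteration count, and the toy 2-D points standing in for word2vec vectors are all assumptions for illustration.

```python
import math

def memberships(points, centers, m=2.0, eps=1e-9):
    """u[i][j] = 1 / sum_k (d_ij / d_ik)^(2/(m-1)); each row sums to 1."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        # Clamp distances so a point sitting exactly on a centre does not divide by zero.
        d = [max(math.dist(p, c), eps) for c in centers]
        u.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return u

def fuzzy_c_means(points, centers, m=2.0, iters=100):
    """Alternate membership and centre updates; return (centers, memberships)."""
    dim = len(points[0])
    centers = [list(c) for c in centers]
    for _ in range(iters):
        u = memberships(points, centers, m)
        for j in range(len(centers)):
            # Each centre is the mean of all points weighted by u_ij^m.
            w = [row[j] ** m for row in u]
            total = sum(w)
            centers[j] = [sum(wi * p[t] for wi, p in zip(w, points)) / total
                          for t in range(dim)]
    return centers, memberships(points, centers, m)

# Toy data, invented for this example: two well-separated groups.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cs, u = fuzzy_c_means(pts, centers=[(0, 0), (10, 10)])
for p, row in zip(pts, u):
    print(p, [round(x, 3) for x in row])
```

Unlike the EM sketch, there is no probabilistic model here: memberships come purely from relative distances to the centres, and m controls how soft they are.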
-- Ben
--
Ben Goertzel, PhD
http://goertzel.org
"I am God! I am nothing, I'm play, I am freedom, I am life. I am the
boundary, I am the peak." -- Alexander Scriabin