Clustering for unsupervised POS learning


Ben Goertzel

Jul 7, 2017, 12:46:42 AM
to Andres Suarez, Curtis M. Faith, Linas Vepstas, 练睿婷, opencog
Andres,

(and also Linas and anyone else interested...)

I have refreshed my memory on clustering for unsupervised POS
learning... this was the approach I had fiddled with long ago:

http://www.cs.rhul.ac.uk/home/alexc/papers/eacl2003.pdf

https://github.com/ninjin/clark_pos_induction

I note that Spitkovsky (at Google) uses a similar method in his more
recent work on unsupervised part-of-speech learning:

https://web.stanford.edu/~jurafsky/goldtags.pdf

These guys are clustering sparse vectors derived from co-occurrence
statistics of various sorts; they're not using dimension reduction,
though Spitkovsky does also use some dependency parsing...
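As a rough illustration of "sparse vectors derived via co-occurrence", here is a minimal Python sketch that assumes the simplest possible feature set: the immediate left and right neighbour of each token. The features in the papers above are richer; the function names `context_vectors` and `cosine`, the `<s>`/`</s>` boundary tokens and the toy corpus are all hypothetical.

```python
from collections import Counter, defaultdict

def context_vectors(sentences):
    """Map each word type to a sparse Counter of its left/right neighbours.

    Feature keys look like ('L', 'the') or ('R', 'runs'); '<s>' and '</s>'
    are hypothetical sentence-boundary tokens.
    """
    vectors = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - 1):
            word = padded[i]
            vectors[word][("L", padded[i - 1])] += 1
            vectors[word][("R", padded[i + 1])] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(c * v[k] for k, c in u.items() if k in v)
    norm_u = sum(c * c for c in u.values()) ** 0.5
    norm_v = sum(c * c for c in v.values()) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy corpus, already tokenized.
corpus = [
    "the dog runs".split(),
    "the cat runs".split(),
    "a dog sleeps".split(),
    "a cat sleeps".split(),
]
vecs = context_vectors(corpus)
print(cosine(vecs["dog"], vecs["cat"]))   # → 1.0 (nouns share all contexts)
print(cosine(vecs["dog"], vecs["runs"]))  # → 0.0 (no shared contexts)
```

The vectors stay sparse (one `Counter` per word type) and no dimension reduction is applied, matching the setup described above; a clustering step would then group word types whose context vectors are similar.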

At bottom this is just EM clustering, but it's applied in a way that's
nicely customized for part-of-speech induction...
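One common way to run EM clustering on sparse count vectors is a mixture of multinomials. The sketch below assumes that model, which is not necessarily the exact one used in the Clark or Spitkovsky papers; `em_multinomial_mixture`, the add-alpha smoothing constant `alpha` and the toy context counts are hypothetical.

```python
import math
import random
from collections import Counter

def em_multinomial_mixture(vectors, k, iters=50, alpha=0.1, seed=0):
    """Soft-cluster sparse count vectors with a k-component multinomial mixture.

    vectors: dict word -> Counter(feature -> count).
    Returns dict word -> list of k posterior responsibilities (summing to 1).
    alpha is a hypothetical add-alpha smoothing constant.
    """
    rng = random.Random(seed)
    words = sorted(vectors)
    features = sorted({f for v in vectors.values() for f in v})
    # Random soft initialisation of responsibilities.
    resp = {}
    for w in words:
        r = [rng.random() + 1e-3 for _ in range(k)]
        s = sum(r)
        resp[w] = [x / s for x in r]
    for _ in range(iters):
        # M-step: mixing weights and smoothed per-cluster feature distributions.
        weights = [sum(resp[w][j] for w in words) / len(words) for j in range(k)]
        log_theta = []
        for j in range(k):
            counts = Counter()
            for w in words:
                for f, c in vectors[w].items():
                    counts[f] += resp[w][j] * c
            total = sum(counts.values()) + alpha * len(features)
            log_theta.append({f: math.log((counts[f] + alpha) / total) for f in features})
        # E-step: posterior over clusters for each word, computed in log space.
        for w in words:
            logp = [math.log(weights[j] + 1e-12)
                    + sum(c * log_theta[j][f] for f, c in vectors[w].items())
                    for j in range(k)]
            top = max(logp)
            p = [math.exp(x - top) for x in logp]
            s = sum(p)
            resp[w] = [x / s for x in p]
    return resp

# Hypothetical left/right-neighbour counts for four word types.
vectors = {
    "dog": Counter({("L", "the"): 1, ("L", "a"): 1, ("R", "runs"): 1, ("R", "sleeps"): 1}),
    "cat": Counter({("L", "the"): 1, ("L", "a"): 1, ("R", "runs"): 1, ("R", "sleeps"): 1}),
    "runs": Counter({("L", "dog"): 1, ("L", "cat"): 1, ("R", "</s>"): 2}),
    "sleeps": Counter({("L", "dog"): 1, ("L", "cat"): 1, ("R", "</s>"): 2}),
}
resp = em_multinomial_mixture(vectors, k=2)
hard = {w: max(range(2), key=lambda j: r[j]) for w, r in resp.items()}
print(hard)
```

Each word keeps a soft distribution over clusters; taking the argmax, as in `hard`, turns it into an induced tag. Like any EM procedure, the result depends on initialisation, so in practice one would try several seeds.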

This paper finds that, in a somewhat different context, Fuzzy C-Means
outperforms EM at classifying word2vec output vectors:

http://dm.snu.ac.kr/static/docs/TR/SNUDM-TR-2016-11.pdf
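For comparison, here is a minimal pure-Python sketch of standard Fuzzy C-Means with Euclidean distance. It is not taken from the linked paper; `fuzzy_c_means`, its parameters and the 2-D toy points (standing in for word2vec vectors) are hypothetical.

```python
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Fuzzy C-Means on dense vectors given as lists of floats.

    Returns (centers, memberships); memberships[i][j] is the degree to
    which point i belongs to cluster j, and each row sums to 1.
    m > 1 is the "fuzzifier"; m = 2 is a common default.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-3 for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = [[0.0] * dim for _ in range(c)]
    power = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Centre update: mean of all points weighted by membership ** m.
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            if total > 0:
                centers[j] = [sum(w[i] * points[i][d] for i in range(n)) / total
                              for d in range(dim)]
        # Membership update from Euclidean distances to every centre.
        max_change = 0.0
        for i in range(n):
            dists = [sum((points[i][d] - centers[j][d]) ** 2 for d in range(dim)) ** 0.5
                     for j in range(c)]
            zeros = [j for j in range(c) if dists[j] == 0.0]
            if zeros:
                # Point coincides with a centre: split membership among such centres.
                new_row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                new_row = [1.0 / sum((dists[j] / dists[k]) ** power for k in range(c))
                           for j in range(c)]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if max_change < tol:
            break
    return centers, u

# Hypothetical toy data: two well-separated blobs standing in for word vectors.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, memberships = fuzzy_c_means(points, c=2)
for p, row in zip(points, memberships):
    print(p, [round(x, 3) for x in row])
```

Unlike the EM mixture sketch, FCM has no probabilistic model: memberships come directly from relative distances to the centres, and `m` controls how soft the assignments are (values near 1 approach hard k-means).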

-- Ben






--
Ben Goertzel, PhD
http://goertzel.org

"I am God! I am nothing, I'm play, I am freedom, I am life. I am the
boundary, I am the peak." -- Alexander Scriabin