Regarding "hidden multivariate logistic regression", as you hint at
the end of your document ... it seems you are gradually inching toward
my suggestion of using neural nets here...
However, we haven't gotten to experimenting with that yet, because we
are still getting stuck on weird Guile problems while trying to get the
MST parsing done ... we (Curtis) can get through MST-parsing maybe
800-1500 sentences before it crashes (and it doesn't crash when
examined with GDB, which is frustrating...).
Hi Linas,
It's nice working with you guys on interesting stuff.
PCA is a linear method, not suited to this kind of problem. I strongly suggest moving definitively to ANNs.
About Adagram: there is a Python implementation at https://github.com/lopuhin/python-adagram of the original Julia implementation posted by Ben. Or you may have a look at Sensegram (http://aclweb.org/anthology/W/W16/W16-1620.pdf), with code at https://github.com/tudarmstadt-lt/sensegram . I am not aware of an ANN implementation of Adagram, but there are plenty for skipgram, for example https://keras.io/preprocessing/sequence/#skipgrams
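As a minimal sketch of what that Keras helper gives you (the toy sentence and vocabulary size below are made up, just to show the output format):

    from keras.preprocessing.sequence import skipgrams

    # Words already mapped to integer ids; id 0 is reserved, so start at 1.
    sentence = [1, 2, 3, 4, 5]   # a toy five-word sentence
    vocabulary_size = 6

    couples, labels = skipgrams(sentence, vocabulary_size,
                                window_size=2, negative_samples=1.0)
    for (target, context), label in zip(couples, labels):
        print(target, context, label)   # label 1 = true context, 0 = negative sample

The (target, context, label) triples are what you would feed into any skipgram-style embedding model.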
bye
e
Hi Linas,
I have read the report now...
Looking at the cosine similarity results, it seems clear the corpus
you're using is way too small for the purpose (there's no good reason
"He" and "There" should have such high cosine similarity; cf. the table
on page 6).
Also, cosine similarity is known to be fluky for this sort of
application. One will get much less fluky pairwise similarities using
a modern dimension-reduction technique like word2vec (but using it on
feature vectors produced from the MST parses, rather than just on
word sequences)... However, word2vec does not handle word-sense
disambiguation, which is why I've suggested Adagram (but again,
modified to use feature vectors produced from the MST parses...)
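To make the MST-feature-vector idea concrete, here is a rough sketch; the mst_parses variable and its (head, dependent) link format are hypothetical stand-ins for whatever the MST parser actually emits, and the calls assume the gensim 4.x API:

    from gensim.models import Word2Vec

    # Hypothetical (head, dependent) link lists, one per MST parse.
    mst_parses = [
        [("saw", "he"), ("saw", "dog"), ("dog", "the")],
        [("ran", "dog"), ("dog", "the"), ("ran", "fast")],
    ]

    # One pseudo-sentence per word: the word followed by everything it links to,
    # so the embedding contexts come from parse links, not linear word order.
    contexts = {}
    for parse in mst_parses:
        for head, dep in parse:
            contexts.setdefault(head, [head]).append(dep)
            contexts.setdefault(dep, [dep]).append(head)

    model = Word2Vec(list(contexts.values()), vector_size=50, window=5, min_count=1)
    print(model.wv.similarity("dog", "he"))   # cosine similarity of dense vectors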
Basically, what I am thinking to explore is:
-- Adagram on MST-parse-based feature vectors, to produce
reduced-dimension vectors for word-senses
-- Cluster these reduced-dimension vectors to form word-categories
(not sure what clustering algorithm to use here; could be EM I guess,
or agglomerative as you've suggested, see the sketch after this
list... but the point is that clustering is easier on these
dimension-reduced vectors because the similarity degrees are less
fluky)
-- Tag the corpus using these word-categories and do the MI analysis
and MST parsing again ...
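For the clustering step, a minimal sketch using scikit-learn's agglomerative clustering (the random matrix just stands in for the Adagram output, and 20 clusters is an arbitrary choice):

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    # Placeholder: 100 word-senses in 50 dimensions standing in for Adagram output.
    sense_vectors = np.random.rand(100, 50)

    # Cosine distance with average linkage; n_clusters=20 is an arbitrary choice.
    # (Older scikit-learn calls the parameter affinity=, newer versions metric=.)
    clustering = AgglomerativeClustering(n_clusters=20, affinity="cosine",
                                         linkage="average")
    labels = clustering.fit_predict(sense_vectors)   # a word-category id per sense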
I also think we might get better MST parses if we used asymmetric
relative entropy instead of symmetric mutual information. If you're
not motivated to experiment with this, maybe we will try it ourselves
in HK...
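For concreteness, the two kinds of score side by side; exactly which asymmetric formulation to use is open, so the KL divergence between the two words' context distributions is just one plausible reading, and the numbers are toy values:

    import numpy as np

    def pointwise_mi(p_xy, p_x, p_y):
        # Symmetric score: MI(x, y) = log2(P(x,y) / (P(x) P(y))) = MI(y, x).
        return np.log2(p_xy / (p_x * p_y))

    def relative_entropy(p, q):
        # Asymmetric score: KL(p || q) = sum_i p_i log2(p_i / q_i) != KL(q || p).
        p, q = np.asarray(p, float), np.asarray(q, float)
        mask = p > 0
        return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

    print(pointwise_mi(0.02, 0.1, 0.1))        # 1.0, the same either way round
    p = [0.7, 0.2, 0.1]                        # toy context distribution of word 1
    q = [0.2, 0.5, 0.3]                        # toy context distribution of word 2
    print(relative_entropy(p, q), relative_entropy(q, p))   # differ: asymmetry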
--
Ben Goertzel, PhD
http://goertzel.org
"I am God! I am nothing, I'm play, I am freedom, I am life. I am the
boundary, I am the peak." -- Alexander Scriabin
Hi Everyone… I have a lot of ramping-up to do here.
Following this interesting thread, my initial thinking about optimal clustering of various distributed representations led me to this paper:
Ferrone, Lorenzo, and Fabio Massimo Zanzotto. "Symbolic, Distributed and Distributional Representations for Natural Language Processing in the Era of Deep Learning: a Survey." arXiv preprint arXiv:1702.00764 (2017).
It emphasizes the importance of semantic composability, as we were discussing, Ben. The authors also show that PCA-based representations are not composable in this sense, and that random indexing solves some of these problems when compacting distributional semantic vectors.
Holographic reduced representations look promising.
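As a small illustration of the binding operation behind holographic reduced representations (circular convolution, computed via FFT; the vector dimension and random seed are arbitrary):

    import numpy as np

    def bind(a, b):
        # Circular convolution, the HRR binding operation, via FFT.
        return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

    def unbind(c, a):
        # Approximate inverse: convolve with the involution of a.
        return bind(c, np.concatenate(([a[0]], a[1:][::-1])))

    n = 1024
    rng = np.random.default_rng(0)
    role = rng.normal(0.0, 1.0 / np.sqrt(n), n)
    filler = rng.normal(0.0, 1.0 / np.sqrt(n), n)

    recovered = unbind(bind(role, filler), role)
    # The recovered vector is a noisy copy of the filler: cosine well above chance.
    cos = recovered @ filler / (np.linalg.norm(recovered) * np.linalg.norm(filler))
    print(cos)

The key property is that the bound vector has the same dimension as its parts, so role/filler structures compose without blowing up the representation size.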
BTW, if we can help with some of the grunge work, such as creating that Jupyter notebook (or a suitable equivalent), Karthik may be able to help, with your guidance of course.
Cheers,
Hugo
From: Linas Vepstas [mailto:linasv...@gmail.com]
arXiv:1702.00764
Thanks, Linas. The approach here does look extremely promising.
Bridging the gap between these various camps is the holy grail that few are even searching for, much less attempting to implement.
-Hugo
A “sigmoid-thresholded eigenvector classifier” is just a single-layer autoencoder with sigmoid activation. That's equivalent to performing PCA, as you did. But if you had used a stacked autoencoder (i.e., adding more layers, and probably ReLU activations), you would simply get better clustering.
It is even possible to train latent-variable models with a variant of the EM algorithm, which alternates between the Expectation and Maximization steps, but we usually prefer to train with SGD.
If you are interested, there is code and an IPython notebook available.
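A rough sketch of what such a stacked autoencoder looks like in Keras; the 784-dimensional input and the layer widths are placeholders for whatever your feature vectors actually are:

    from keras.models import Model
    from keras.layers import Input, Dense

    # Placeholder sizes: a 784-dim input stands in for the real feature vectors.
    inp = Input(shape=(784,))
    h = Dense(128, activation="relu")(inp)
    code = Dense(32, activation="relu")(h)       # the reduced representation
    h = Dense(128, activation="relu")(code)
    out = Dense(784, activation="sigmoid")(h)    # reconstruction of the input

    autoencoder = Model(inp, out)
    encoder = Model(inp, code)                   # encoder.predict(X) feeds the clustering
    autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
    # autoencoder.fit(X, X, epochs=50, batch_size=256)   # X is your feature matrix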
But if you need WSD, here is a recent paper using a bidirectional LSTM to learn context, https://arxiv.org/pdf/1606.03568.pdf, or this one from Stanford using skip-gram + LSTM, https://web.stanford.edu/class/cs224n/reports/2762042.pdf. Lastly, you may be interested in this extension of word2vec to disambiguation, called sense2vec: https://arxiv.org/pdf/1511.06388.pdf. So the DL community is at least trying to do something interesting in the NLP field... but it is not enough, as you can readily see.
So, I tend to agree with you that "just about exactly zero of the researchers in one area are aware of the theory and results of the other". And I am really convinced that unsupervised grammar induction is what we need at Cisco for our networking problems, which cannot be solved with "ad hoc" DL networks (they lack scalability). I am looking forward to sharing some of our "impossible networking problems" with you guys, and to seeing how effective your grammar+semantics approach will be (adding, somehow, a non-linear embedding in the phase space, as I already discussed with Ben).