Cosine similarity, PCA, sheaves (algebraic topology)

Linas Vepstas

Jun 19, 2017, 3:30:35 AM
to Ben Goertzel, opencog, link-grammar, Ruiting Lian, Word Grammar, Zarathustra Goertzel, Hugo deGaris, Enzo Fenoglio (efenogli), Hugo Latapie (hlatapie)
Hi Ben,

Here's this week's update on results from the natural language datasets. In short, the datasets seem to be of high quality, based on a sampling of the cosine similarity between words. Looks really nice.
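To be concrete about what's being measured: each word gets a sparse count vector, indexed by the disjuncts observed with it, and the similarity is just the cosine between two such vectors. A minimal sketch in Python, with made-up counts and illustrative connector labels (the real vectors come out of the atomspace):

from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two sparse count vectors keyed by disjunct."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Made-up disjunct counts for two words:
word_a = Counter({"S- O+": 210, "S- MV+": 95, "O-": 12})
word_b = Counter({"S- O+": 150, "S- MV+": 120, "MV+": 30})
print(cosine(word_a, word_b))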

Naive PCA stinks as a classifier; I'm looking for something nicer, perhaps based on first principles, and a bit less ad hoc.

Since you had the guts to use the words "algebraic topology" in a recent email, I call your bluff and raise: this report includes a brief sketch pointing out that every language, natural or otherwise, has an associated cohomology theory. The path from here to there goes by means of sheaves, which is semi-obvious, because every book on algebraic topology (or at least differential topology) explains the steps.

The part that's new, to me, was the sudden realization that the "disjuncts" and "connector sets" of Link Grammar are in fact just the sheaves (germs, stalks) of a graph.  The Link Grammar dictionary, say, for the English language, is a sheaf with a probability distribution on it.
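To make that concrete without any of the sheaf machinery: think of each word as a vertex, and the collection of disjuncts (connector sequences) observed for that word as the stalk sitting over it; two connectors glue into a link when their labels match and their directions face each other. A toy sketch in Python (the names here are illustrative, not the actual pipeline API):

from typing import NamedTuple

class Connector(NamedTuple):
    label: str       # e.g. "S", "O", "MV"
    direction: str   # "+" wants a link to the right, "-" to the left

# The stalk over the word "saw": each entry is one observed disjunct.
stalk_saw = [
    (Connector("S", "-"), Connector("O", "+")),    # "she saw it"
    (Connector("S", "-"), Connector("MV", "+")),   # "she saw clearly"
]

def can_link(left, right):
    """Two connectors glue into a link if the labels match and the directions face each other."""
    return left.label == right.label and (left.direction, right.direction) == ("+", "-")

print(can_link(Connector("O", "+"), Connector("O", "-")))   # True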

BTW, this clarifies why Link Grammar looks so damned modal-logic-ish. I noticed this a long time ago, and always thought it was mysterious and weird and interesting. Well, it turns out that, for some folks, this is old news: apparently, when the language is first-order logic, sheafification gives you Kripke-Joyal semantics; this was spotted in 1965.  So I'm guessing that this is generic: take any language, formal or natural, look at it from the point of view of sheaves, and then observe that the gluing axioms mean that modal logic describes how the sections glue together.  I think that's pretty cool.

So, can you find a grad student to work out the details? The thesis title would be "the Cohomology of the English Language". It would fill in all the details in the above paragraphs.

--linas
cosine.pdf

Ben Goertzel

Jun 19, 2017, 4:31:56 AM
to link-grammar, opencog, Ruiting Lian, Word Grammar, Zarathustra Goertzel, Hugo deGaris, Enzo Fenoglio (efenogli), Hugo Latapie (hlatapie), Andres Suarez
Interesting! I will read it through tomorrow, the rest of my today
seems eaten by other stuff...

I am not surprised that PCA stinks as a classifier...

Regarding "hidden multivariate logistic regression", as you hint at
the end of your document ... it seems you are gradually inching toward
my suggestion of using neural nets here...

My most recent suggestion to Ruiting has been to explore the following
code/algorithm, which looks like a nicer way of finding word2vec
style condensed representations for multiple senses of words,

https://arxiv.org/pdf/1502.07257.pdf

The only code I could find for this is in Julia

https://github.com/sbos/AdaGram.jl

but it looks not that complicated.... We would need to modify that
Julia code to work on the data from the MST parses rather than on word
sequences like it now does...

However, we haven't gotten to experimenting with that yet, because we are
still getting stuck with weird Guile problems in trying to get the MST
parsing done ... we (Curtis) can get through MST-parsing maybe
800-1500 sentences before it crashes (and it doesn't crash when
examined with GDB, which is frustrating...)....

-- Ben



--
Ben Goertzel, PhD
http://goertzel.org

"I am God! I am nothing, I'm play, I am freedom, I am life. I am the
boundary, I am the peak." -- Alexander Scriabin

Linas Vepstas

Jun 19, 2017, 5:15:59 AM
to opencog, Curtis M. Faith, Ruiting Lian, Enzo Fenoglio (efenogli), Hugo Latapie (hlatapie), Andres Suarez, link-grammar
On Mon, Jun 19, 2017 at 3:31 AM, Ben Goertzel <b...@goertzel.org> wrote:

Regarding "hidden multivariate logistic regression", as you hint at
the end of your document ... it seems you are gradually inching toward
my suggestion of using neural nets here...

Maybe. I want to understand the data first, before I start applying random algorithms to it. BTW, the previous report, showing graphs and distributions of various sorts: it's been expanded and cleaned up with lots of new stuff. Nothing terribly exciting.  I can send the current version, if you care.

However, we haven't gotten to experimenting with that yet, because we are
still getting stuck with weird Guile problems in trying to get the MST
parsing done ... we (Curtis) can get through MST-parsing maybe
800-1500 sentences before it crashes (and it doesn't crash when
examined with GDB, which is frustrating...)....

Arghhh. OK, I just now merged one more tweak to the text-ingestion that might allow you to progress.  Some back-story:

Back when Curtis was complaining about the large amount of CPU time spent in garbage collection, that was because the script *manually* triggered a GC after each sentence. I presume that Curtis was not aware of this. Now he is.

The reason for doing this was that, without it, memory usage would blow up: link-grammar was returning strings that were 10 or 20 MBytes long, and the GC was perfectly happy to let these clog up RAM. That works out to a gigabyte every 50 or 100 sentences, so I was forcing GC to run pretty much constantly: maybe a few times a second.

This appears to have exposed an obscure guile bug. Each of those giant strings contains scheme code, which guile interprets/compiles and then runs. It appears that high-frequency GC pulls the rug out from under the compiler/interpreter, leading to a weird hang. I think I know how to turn this into a simple test case, but haven't yet.

Avoiding the high-frequency GC avoids the weird hang.  And that's what the last few github merges do. Basically, after every sentence, it checks whether RAM usage is above 750 MBytes, and forces a GC if it is.  This is enough to keep RAM usage low, while still avoiding the other ills and diseases.
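For the record, the check is nothing fancier than the following, sketched here in Python for illustration only; the real thing lives in the Guile scripts:

import gc
import os

THRESHOLD_BYTES = 750 * 1024 * 1024   # the 750 MByte threshold mentioned above

def current_rss_bytes():
    # Linux-specific: the second field of /proc/self/statm is the resident set size, in pages.
    with open("/proc/self/statm") as statm:
        resident_pages = int(statm.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")

def maybe_gc():
    """Called once per sentence: force a collection only when RAM usage is high."""
    if current_rss_bytes() > THRESHOLD_BYTES:
        gc.collect()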

For me, it's been running for over a week without any problems. It runs at a few sentences per second; I'm not sure, it's not something I measure. So pretty slow, but I kind-of don't care, because after a week, it's 20 or 40 million observations of words, which is plenty for me. Too much, actually: the datasets get too big, and I need to trim them.

This has no effect at all on Curtis's new, unmerged code. It won't fix his crash. It's only for the existing pipeline.  So set it running on some other machine, and while Curtis debugs, you'll at least get some data piling up.  Run it stock, straight out of the box, don't tune it or tweak it, and it should work fine.

--linas

Linas Vepstas

Jun 19, 2017, 5:52:27 AM
to Enzo Fenoglio (efenogli), opencog, Curtis M. Faith, Ruiting Lian, Hugo Latapie (hlatapie), Andres Suarez, link-grammar
Thanks.  The point of this is that we're not using n-grams for anything.  We're using sheaves. So any algo that has "gram" in its name is immediately disqualified. The bet is that doing grammar correctly, using sheaves, will get you much, much better results than using n-grams. And that's the point.

--linas

On Mon, Jun 19, 2017 at 4:33 AM, Enzo Fenoglio (efenogli) <efen...@cisco.com> wrote:

Hi Linas

Nice working with you guys on interesting stuff.

 

PCA is a linear classifier, not suited for this kind of problem. I strongly suggest moving to ANNs.

 

About AdaGram: there is a Python implementation, https://github.com/lopuhin/python-adagram , of the original Julia implementation posted by Ben.  Or you may have a look at SenseGram, http://aclweb.org/anthology/W/W16/W16-1620.pdf , with code at https://github.com/tudarmstadt-lt/sensegram . I am not aware of an ANN for AdaGram, but there are plenty for skip-gram, for example https://keras.io/preprocessing/sequence/#skipgrams
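For example, the Keras helper above just turns an integer-encoded sentence into (target, context) pairs plus negative samples; a minimal usage sketch, with a made-up toy sentence:

from keras.preprocessing.sequence import skipgrams

encoded = [1, 5, 3, 7, 2]   # hypothetical integer word ids; 0 is reserved for padding
pairs, labels = skipgrams(encoded, vocabulary_size=10,
                          window_size=2, negative_samples=1.0)
for (target, context), label in zip(pairs, labels):
    print(target, context, label)   # label 1 = observed pair, 0 = negative sample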

 

bye

e

Ben Goertzel

Jun 19, 2017, 5:59:11 AM
to link-grammar, Enzo Fenoglio (efenogli), opencog, Curtis M. Faith, Ruiting Lian, Hugo Latapie (hlatapie), Andres Suarez
The Python version of AdaGram seems incomplete and untested, so I
think I'd rather deal with the Julia implementation at this point....
Julia is not that complicated, and I don't love Python anyway...

Regarding the deficiencies of n-grams, I agree with Linas. However,
my suggestion is to modify Adagram to use inputs obtained from parse
trees (the MST information theory based parse trees that Linas's code
produces) rather than simply from word-sequences. So then it will be
a non-gram-based Adagram.... The "gram" part of Adagram is not what
interests me; what interests me is the ability of that particular NN
architecture/algorithm to make word2vec style dimensionally-reduced
vectors in a way that automagically carries out word-sense
disambiguation along the way.... I believe it can do this same trick
if fed features from the MST parses, rather than being fed n-gram-ish
features from raw word sequences.... But this will require some code
changes and experimentation...
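Concretely, the change is only in how the training pairs are built:
instead of a +/-k word window, a word's context would be the words it
links to in the MST parse. A rough sketch of that feature extraction in
Python (the parse-edge format here is made up for illustration):

from typing import Iterable, List, Tuple

Edge = Tuple[str, str]   # one undirected MST link: (left_word, right_word)

def pairs_from_parse(edges: Iterable[Edge]) -> List[Tuple[str, str]]:
    """Turn MST links into symmetric (word, context-word) training pairs."""
    pairs = []
    for left, right in edges:
        pairs.append((left, right))
        pairs.append((right, left))
    return pairs

mst_edges = [("he", "saw"), ("saw", "it"), ("saw", "clearly")]
print(pairs_from_parse(mst_edges))
# These pairs would replace the n-gram window pairs fed to AdaGram.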

-- Ben

Ben Goertzel

Jun 19, 2017, 10:01:43 AM
to link-grammar, opencog, Ruiting Lian, Word Grammar, Zarathustra Goertzel, Hugo deGaris, Enzo Fenoglio (efenogli), Hugo Latapie (hlatapie)
Hi Linas,

I have read the report now...

Looking at the cosine similarity results, it seems clear the corpus
you're using is way too small for the purpose (there's no good reason
"He" and "There" should have such high cosine similarity..., cf table
on page 6)

Also, cosine similarity is known to be fluky for this sort of
application. One will get much less fluky pairwise similarities using
a modern dimension reduction technique like word2vec (but using it on
feature vectors produced from the MST parses, rather than just from
word sequences).... However, word2vec does not handle word sense
disambiguation, which is why I've suggested Adagram (but again,
modified to use feature vectors produced from the MST parses...)

Basically what I am thinking to explore is

-- Adagram on MST parse based feature vectors, to produce
reduced-dimension vectors for word-senses

-- Cluster these reduced-dimension vectors to form word-categories
(not sure what clustering algorithm to use here, could be EM I guess,
or agglomerative as you've suggested... but the point is clustering is
easier on these dimension-reduced vectors because the similarity
degrees are less fluky...)

-- Tag the corpus using these word categories and do the MI analysis
and MST parsing again ...

I also think we might get better MST parses if we used asymmetric
relative entropy instead of symmetric mutual information. If you're
not motivated to experiment with this, maybe we will try it ourselves
in HK...
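To spell out the distinction with toy numbers: as I read it, the current
MST weighting is the symmetric pointwise MI of a word pair, while one
natural asymmetric alternative is the relative entropy (KL divergence)
of the conditional distribution p(right | left) against the marginal
p(right). A sketch in Python, with made-up counts:

from collections import Counter
from math import log2

pair_counts = Counter({("the", "dog"): 50, ("the", "cat"): 30,
                       ("a", "dog"): 10, ("a", "cat"): 10})
total = sum(pair_counts.values())
left, right = Counter(), Counter()
for (l, r), n in pair_counts.items():
    left[l] += n
    right[r] += n

def pmi(l, r):
    """Symmetric pointwise mutual information of the ordered pair (l, r)."""
    return log2((pair_counts[(l, r)] / total) / ((left[l] / total) * (right[r] / total)))

def rel_entropy(l):
    """Asymmetric: KL divergence of p(r | l) from the marginal p(r)."""
    return sum((n / left[l]) * log2((n / left[l]) / (right[r] / total))
               for (ll, r), n in pair_counts.items() if ll == l)

print(pmi("the", "dog"), rel_entropy("the"))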

-- Ben

Linas Vepstas

Jun 19, 2017, 12:08:14 PM
to link-grammar, opencog, Ruiting Lian, Word Grammar, Zarathustra Goertzel, Hugo deGaris, Enzo Fenoglio (efenogli), Hugo Latapie (hlatapie)
Hi Ben,

On Mon, Jun 19, 2017 at 9:01 AM, Ben Goertzel <b...@goertzel.org> wrote:
Hi Linas,

I have read the report now...

Looking at the cosine similarity results, it seems clear the corpus
you're using is way too small for the purpose (there's no good reason
"He" and "There" should have such high cosine similarity..., cf table
on page 6)
 
Well, yes, exactly, but I think you missed the point: the reason these are both capitalized is that they both start sentences, and there are simply not that many sentences that start with "He" and "There".  The similarity for "he" and "there" is much lower.

Assuming sentences have 20 words in them, the capitalized-word corpus is 20x smaller than the non-capitalized corpus.   That's why I moved on to the non-capitalized results.

And that's what I mean: the cosine similarity was judged higher  only because there are 20x fewer observations of capitalized words!  We don't need/want a measure that reports high similarity whenever there are fewer observations!

Also, cosine similarity is known to be fluky for this sort of
application.  One will get much less fluky pairwise similarities using
a modern dimension reduction technique like word2vec (but using it on
feature vectors produced from the MST parses, rather than just from
word sequences)....  However, word2vec does not handle word sense
disambiguation, which is why I've suggested Adagram (but again,
modified to use feature vectors produced from the MST parses...)

Basically what I am thinking to explore is

-- Adagram on MST parse based feature vectors, to produce
reduced-dimension vectors for word-senses

-- Cluster these reduced-dimension vectors to form word-categories
(not sure what clustering algorithm to use here, could be EM I guess,
or agglomerative as you've suggested... but the point is clustering is
easier on these dimension-reduced vectors because the similarity
degrees are less fluky...)

OK, so we are miscommunicating, misunderstanding each other.  I think the cosine data, for the NON-CAPITALIZED words, is good enough to do clustering on.

I was trying to use a variant of PCA for CLUSTERING! and NOT for similarity!   I've already got similarity: the PCA was being applied to the cosine similarity!

It would be nice to have a better similarity than cosine, and maybe adagram can provide this. But that is not where the action is.  I am ready to cluster NOW; I've been ready for weeks, for a month, and I am searching for a high-performance, accurate clustering algo that is less ad-hoc than k-means or agglomerative, or whatever.

Thus, the cryptic note about "hidden multivariate logistic regression" is about doing that for clustering!! 

In short, clustering is where we're at; better similarity scores would be nice, but very much of secondary importance.


-- Tag the corpus using these word categories and do the MI analysis
and MST parsing again ...

Well, once you've tagged, it's not an MST parse, it's an LG parse.

I also think we might get better MST parses if we used asymmetric
relative entropy instead of symmetric mutual information.  If you're
not motivated to experiment with this may be we will try it ourselves
in HK...

Yes, I want to try that, but got distracted by other things. It might be nice to get "better MST parses", but right now, we don't have any evidence that they're bad.  They seem to be of rather reasonable quality, to me, and this is attested by the fact that the cosine similarity for the NON-CAPITALIZED words seems pretty good!

So again, this is not where the action is.  What we need is accurate, high-performance, non-ad-hoc clustering.  I guess I'm ready to accept agglomerative clustering, if there's nothing else that's simpler, better.

Once clustering is done, I want to move on to morphology, so that I can do e.g. French, or any of the Romance or Slavic languages.  It's in morphology where things like asymmetric relative entropy should really start kicking butt.  I mean, it would be nice for English, but it seems like a lower priority.   These other fires are a lot more urgent.

--linas



Ben Goertzel

Jun 19, 2017, 12:24:00 PM
to link-grammar, opencog, Ruiting Lian, Word Grammar, Zarathustra Goertzel, Hugo deGaris, Enzo Fenoglio (efenogli), Hugo Latapie (hlatapie)
On Tue, Jun 20, 2017 at 12:07 AM, Linas Vepstas <linasv...@gmail.com> wrote:
> So again, this is not where the action is. What we need is accurate,
> high-performance, non-ad-hoc clustering. I guess I'm ready to accept
> agglomerative clustering, if there's nothing else that's simpler, better.


We don't need just clustering, we need clustering together with sense
disambiguation...

I believe that we will get better clustering (and better
clustering-coupled-with-disambiguation) results out of the vectors
Adagram produces, than out of the sparse vectors you're now trying to
cluster.... But this is an empirical issue, we can try both and
see...

As for the corpus size, I mean, in a bigger corpus "He" and "There"
(with caps) would also not come out as so similar....

But yes, the list of "very similar word pairs" you give is cool and
impressive....

It would be interesting to try EM clustering, or maybe a variant like this,

https://cran.r-project.org/web/packages/HDclassif/index.html

on your feature vectors ....
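For reference, the scikit-learn analogue of that sort of EM clustering
is only a few lines (HDclassif above is the R package; the feature
matrix here is a random placeholder, one row per word):

import numpy as np
from sklearn.mixture import GaussianMixture

word_vectors = np.random.RandomState(0).normal(size=(1000, 50))   # placeholder data

gmm = GaussianMixture(n_components=30, covariance_type="diag", max_iter=200)
gmm.fit(word_vectors)                  # EM: alternate E and M steps until convergence
labels = gmm.predict(word_vectors)     # hard cluster assignment per word
print(np.bincount(labels))             # cluster sizes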

We will try this on features we export ourselves, if we can get the
language learning pipeline working correctly.... (I know we could
just take the feature vectors you have produced and play with them,
but I would really like us to be able to get the language learning
pipeline working adequately in Hong Kong -- obviously, as you know,
this is an important project and we can't have it in "it works on my
machine" status ...)

I would like to try EM and variants on both your raw feature vectors,
and on reduced/disambiguated feature vectors that modified-Adagram
spits out based on your MST parse trees.... It will be interesting
to compare the clusters obtained from these two approaches...

-- Ben

Linas Vepstas

Jun 19, 2017, 1:24:50 PM
to link-grammar, opencog, Ruiting Lian, Word Grammar, Zarathustra Goertzel, Hugo deGaris, Enzo Fenoglio (efenogli), Hugo Latapie (hlatapie)
OK, well, some quick comments:

-- sparsity is a good thing, not a bad thing.  It's one of the big indicators that we're on the right track: instead of seeing that everything is like everything else, we're seeing that only one out of every 2^15 or 2^16 possibilities is actually being observed!  So that's very, very good! The sparser, the better!  Seriously, this alone is a major achievement, I think.

-- The reason I was trumpeting about hooking up EvaluationLinks to R was precisely because this opens up many avenues for data analysis. Right now, the data is trapped in the atomspace, and it's a lot of work, for me, to get it out, to get it to where I can apply interesting algorithms to it.

(Personally, I have no plans to do anything with R. Just that making this hookup is the right thing to do, in principle.)

The urgent problem for me is not that I'm lacking algorithms; the problem is that I don't have any easy, effective, quick way of applying the algos to the data.  There's no Jupyter notebook where you punch the monkey and your data is analyzed. This is where all my time, all the heavy lifting, is going.

-- Don't get hung up on point samples.

 "He was going to..."  "There was going to..."

There was a tool house, plenty of
There isn’t any half
There is more, the sky is
There was almost no breeze.
There he had
There wasn’t a thing said about comin’
There was a light in the kitchen, but Mrs.
There was a rasping brush against the tall, dry swamp
There was a hasty consultation, and this program was
There was a bob of the flat boat
There was time only for
There was a crash,
There was the low hum of propellers, and the whirr of the
There was no rear entrance leading
There was a final quick dash down the gully road,
There came a time when the
There ye said it.
There came to him faintly the sound of a voice
There may be and probably is some exaggeration
There he took a beautiful little mulatto slave as his
There flew the severed hand and dripped the bleeding heart.
There must sometimes be a physical
There remains then a kind of life of
There are principally three things moving us to choice and three


He had not yet seen the valuable
He was slowly
He may be able to do what you want, and he may not. You may
He lit a cigar, looked at his watch, examined Bud in the
He was heating and
He stammered
He looked from one lad to the other.
He answered angrily in the same language.
He was restless and irritable, and every now
He had passed beyond the
He was at least three hundred
He could not even make out the lines of the fences beneath
He had thoughtlessly started in
He was over a field of corn in the shock.
He had surely gone a mile! In the still night air came a
He fancied he heard the soft lap of water just ahead. That
He had slept late, worn
He was a small man, with
He ain’t no gypsy, an’ he ain’t no
He was dead, too, then. The place was yours because
He knew he had enough fuel to carry
He meant to return to the fair, give the advertised exhibition
He returned to his waiting friends


Linas Vepstas

Jun 19, 2017, 3:25:12 PM
to Hugo Latapie (hlatapie), link-grammar, opencog, Ruiting Lian, Word Grammar, Zarathustra Goertzel, Hugo deGaris, Enzo Fenoglio (efenogli)
Again, there's a misunderstanding here. Yes, PCA is not composable, and sheaves are. I'm using sheaves. The reason that I looked at PCA was to use a thresholded, sparse PCA for CLUSTERING, and NOT for similarity, where compositionality does not matter. It's really a completely different concept, quite totally unrelated, which just happens to have the three letters PCA in it. Perhaps I should have called it a "sigmoid-thresholded eigenvector classifier" instead, because that's what I'm trying to talk about.
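Roughly, the idea is something like the following sketch (run on a random placeholder similarity matrix, details purely illustrative): take the word-by-word cosine similarity matrix, pull out its leading eigenvectors, squash each one through a sigmoid, and treat the strongly-activated entries as the members of that eigenvector's class.

import numpy as np

def sigmoid(x, sharpness=10.0):
    return 1.0 / (1.0 + np.exp(-sharpness * x))

rng = np.random.RandomState(0)
S = rng.rand(200, 200)
S = (S + S.T) / 2.0                       # placeholder symmetric similarity matrix
eigvals, eigvecs = np.linalg.eigh(S)      # eigenvalues come back in ascending order

for k in range(1, 11):                    # the 10 leading eigenvectors
    v = np.abs(eigvecs[:, -k])            # k-th leading eigenvector
    membership = sigmoid(v - v.mean())    # soft, thresholded class membership
    members = np.where(membership > 0.5)[0]
    print("class %d: %d words" % (k, len(members)))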

--linas

On Mon, Jun 19, 2017 at 2:11 PM, Hugo Latapie (hlatapie) <hlat...@cisco.com> wrote:

Hi Everyone… I have a lot of ramping-up to do here.

 

Following this interesting thread, initially thinking about optimal clustering of various distributed representations led me to this paper:

Ferrone, Lorenzo, and Fabio Massimo Zanzotto. "Symbolic, Distributed and Distributional Representations for Natural Language Processing in the Era of Deep Learning: a Survey." arXiv preprint arXiv:1702.00764 (2017).

 

Which emphasized the importance of semantic composability, as we were discussing, Ben. They also show that PCA is not composable in this sense. They show that random indexing solves some of these problems when compacting distributional semantic vectors.

 

Holographic reduced representations look promising.

 

BTW, if we can help with some of the grunge work, such as creating that Jupyter notebook (or a suitable equivalent), Karthik may be able to pitch in. Of course, with your guidance.

 

Cheers,

 

Hugo

 


Linas Vepstas

Jun 19, 2017, 4:10:41 PM
to Hugo Latapie (hlatapie), link-grammar, opencog, Ruiting Lian, Word Grammar, Zarathustra Goertzel, Hugo deGaris, Enzo Fenoglio (efenogli)
On Mon, Jun 19, 2017 at 2:11 PM, Hugo Latapie (hlatapie) <hlat...@cisco.com> wrote:
arXiv:1702.00764


I've just barely started reading that, and from the very beginning, it's eminently clear how even the latest, leading research on deep neural nets is profoundly ignorant of grammar and semantics. Which I think is another reason why the direction we're on is so promising: apparently, just about exactly zero of the researchers in one area are aware of the theory and results of the other.

Which I guess is a good thing for me. But it's really, really hard to read that paper and not want to scream at the top of my lungs, "those ding-a-lings, don't they know about result xyz? What's wrong with them? Are they all ignorami?" And yet, it seems to be a giant pyramid of results built on results, demonstrating a lack of knowledge and education about form and structure.  So it's a bit hard to take seriously, and yet everyone who is interested in deep learning seems to be doing just that... it's remarkable.

--linas

Linas Vepstas

Jun 19, 2017, 5:03:16 PM
to Hugo Latapie (hlatapie), link-grammar, opencog, Ruiting Lian, Word Grammar, Zarathustra Goertzel, Hugo deGaris, Enzo Fenoglio (efenogli)


On Mon, Jun 19, 2017 at 3:26 PM, Hugo Latapie (hlatapie) <hlat...@cisco.com> wrote:

Thanks Linas. The approach here does look extremely promising.

 

Bridging the gap between these various camps is the holy grail that few are even searching for, much less attempting to implement.


Thanks, but it's not that I have some brilliant insight; it's just that I have a minor insight, and prefer to actually experiment and see if it works, rather than theorize in some grand fashion.  Theories and insights often seem grand when one is in the shower, but if they won't work, I prefer to find out ASAP, rather than to continue imagining how great they are.

--linas
 

 


Linas Vepstas

Jun 19, 2017, 6:00:14 PM
to Enzo Fenoglio (efenogli), Hugo Latapie (hlatapie), link-grammar, opencog, Ruiting Lian, Word Grammar, Zarathustra Goertzel, Hugo deGaris
Hi Enzo,

On Mon, Jun 19, 2017 at 3:49 PM, Enzo Fenoglio (efenogli) <efen...@cisco.com> wrote:

 

A “sigmoid-thresholded eigenvector classifier” is just a single-layer autoencoder with sigmoid activation. That’s equivalent to performing PCA, as you did. But if you had used a stacked autoencoder (= adding more layers and probably ReLU activations), you would simply get better clustering.


Yes, that's right, and the general sketch of this was sent out the week before last.  So here, I tried to just look at the first layer, and see how that went.

But, in doing so, I convinced myself that this is perhaps not all that good an idea to begin with, and that stacking layers doesn't solve the underlying problem.  I'm struggling to put my finger on it, but basically, it seems that, since I already know that pairs of things are similar, having a whole network of things reinforcing one another is in fact blurring the picture. It's as if having a large number of dissimilar things is just pulling things down.

I'm guessing that strong(er) thresholding might solve this, or that adding layers would solve this, but then it raises the question: why bother with all this extra complexity, if I've already mostly got what I want?  So, yes, I could try this experiment, but it's not clear that it's worth the effort.  I suspect there's some alternate way of reworking this, but I haven't figured it out.  Maybe putting the autoencoder in a different location, for example before the vectors get generated, instead of after.  Pre-filtering them, perhaps.  Not sure.
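For reference, the single-layer construction Enzo describes above is just this, sketched with Keras for illustration; the input dimension and data below are placeholders:

import numpy as np
from keras.models import Model
from keras.layers import Input, Dense

input_dim, k = 4096, 64                                      # placeholder sizes
inputs = Input(shape=(input_dim,))
encoded = Dense(k, activation="sigmoid")(inputs)             # the single hidden layer
decoded = Dense(input_dim, activation="linear")(encoded)     # linear reconstruction

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")

X = np.random.RandomState(0).rand(1000, input_dim)           # placeholder word vectors
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)

encoder = Model(inputs, encoded)                             # the k-dimensional code for each word
codes = encoder.predict(X)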
 
 

It is even possible to train latent variable models with a variant of the EM algorithm, which alternates between Expectation and Maximization, but we usually prefer to train with SGD.

If you're interested, there is code and an IPython notebook available.

 

But if you need WSD, here is a recent paper, https://arxiv.org/pdf/1606.03568.pdf , using a bidirectional LSTM to learn context, or this one from Stanford, https://web.stanford.edu/class/cs224n/reports/2762042.pdf , using skip-gram + LSTM. Lastly, you may be interested in this extension of word2vec to disambiguation, called sense2vec: https://arxiv.org/pdf/1511.06388.pdf . So the DL community is at least trying to do something interesting in the NLP field… but it is not enough, as you can readily see.


I've had really pretty good success with WSD in the past, and demonstrated a really pretty nice coupling/correlation between word senses from WordNet and LG disjuncts. Simply having the LG disjunct gets you the correct WordNet sense about 70% of the time, and this is achievable in milliseconds of CPU time (ordinary CPUs, not GPUs). It's not that the score is so great -- back then, people could get up to 80 or 85% correct, but that took minutes or more of CPU time, not milliseconds. I did this work in 2008.

Based on this, the way forward seemed clear: use basic syntactic structure as one component of meaning and reference resolution, and use other algorithms, such as logical reasoning, to go the rest of the way. Much of the needed reasoning is really pretty straightforward; yet here I am, almost a decade later, and I still don't have a functional reasoner to work with.  :-(  Ben, meanwhile, keeps trying to go in a completely different direction, so I get to wait. One step at a time.

 

So, I tend to agree with you that “just about exactly zero of the researchers in one area are aware of the theory and results of the other”. And I am really convinced that unsupervised grammar induction is what we need at Cisco for our networking problems, which cannot just be solved with “ad hoc” DL networks (= lack of scalability).  I am looking forward to sharing some of our “impossible networking problems” with you guys.


Well, I'm aware that you have "impossible networking problems", but, at the moment, I have no clue what they are, or why grammar+semantics is a reasonable approach.

I've been working on a different problem: trying to induce grammar automatically, for different languages, while I wait for a functioning reasoner. And actually that's fine, as we already have evidence that a reasoner on hand-crafted data will not .. well, it's a long, convoluted story.  So working on grammar induction at this point seems like the right stepping stone.
 

, and see how  your grammar+semantic approach will be effective (adding somehow a non-linear embedding in the phase space as I already discussed with Ben)


Ben has not yet relayed this to me.

-- Linas

 


Ben Goertzel

Jun 19, 2017, 10:56:29 PM
to link-grammar, Enzo Fenoglio (efenogli), Hugo Latapie (hlatapie), opencog, Ruiting Lian, Word Grammar, Zarathustra Goertzel, Hugo deGaris
On Tue, Jun 20, 2017 at 5:59 AM, Linas Vepstas <linasv...@gmail.com> wrote:
>> , and see how your grammar+semantic approach will be effective (adding
>> somehow a non-linear embedding in the phase space as I already discussed
>> with Ben)
>
>
> Ben has not yet relayed this to me.
>
> -- Linas

Yeah, it seemed you were already pretty busy!

The short summary is: For any complex dynamical system, if one embeds
the system's states in a K-dimensional space appropriately, and then
divides the relevant region of that K-dimensional space into discrete
cells... then each trajectory of that system becomes a series of
"words" in a certain language (where each of the discrete cells
corresponds to a word)... I guess you are probably familiar with
this technique, which is "symbolic dynamics"

One can then characterize a dynamical system, in various ways, via the
inferred grammar of this "symbolic-dynamical language" ...
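A toy version of the construction, for anyone on the list who hasn't
seen it: take a trajectory of the logistic map, chop the state space
into a few cells, and read the orbit off as a string of symbols; the
grammar is then induced over strings like that one. Purely illustrative:

import numpy as np

def logistic_trajectory(x0=0.1, r=3.9, steps=200):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return np.array(xs)

def symbolize(trajectory, n_cells=4):
    """Partition [0, 1) into n_cells equal cells; each state becomes one symbol."""
    cells = np.minimum((trajectory * n_cells).astype(int), n_cells - 1)
    return "".join(chr(ord("a") + int(c)) for c in cells)

print(symbolize(logistic_trajectory())[:60])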

I did work on this a couple decades ago using various Markovian
grammar inference tools I hacked myself...

Enzo at Cisco, as it turns out, had been thinking about applying
similar methods to characterize the complex dynamics of some Cisco
networks...

So we have been discussing this as an interesting application of the
OpenCog-based grammar inference tools we're now developing ...

There's plenty more, but that's the high-level summary...

(Part of the "plenty more" is that there may be a use of deep (or
shallow, depending on the case) neural networks to help with the
initial stage where one embeds the complex system's states in a
K-dimensional space. In a different context, word2vec and adagram are
examples of the power of modern NNs for dimensional embedding.)

Linas Vepstas

Jun 20, 2017, 9:37:41 AM
to opencog, link-grammar, Enzo Fenoglio (efenogli), Hugo Latapie (hlatapie), Ruiting Lian, Word Grammar, Zarathustra Goertzel, Hugo deGaris
Yeah, I know symbolic dynamics pretty well. I think I wrote most of the Wikipedia article on "subshifts of finite type" and a rainbow of related topics: the product topology, the cylinder sets, e.g. most of "measure-preserving dynamical system".  There's a vast network of related topics, and they're all interesting.

--linas
