word2vec within openCog language learning?


Ben Goertzel

Mar 26, 2017, 12:44:10 PM
to opencog, Linas Vepstas, Mas Ben
Linas,

I thought a bit about how to use a modified version of the word2vec
idea in our language learning pipeline...

I'm thinking about the Skip-gram model of word2vec, as summarized
informally e.g. here

http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/

Following up the suggestion you made in Addis in our chat with
Masresha, I'm thinking to replace the "adjacent word-pairs" used in
word2vec with "word-pairs that are adjacent in the parse tree" (where
e.g. the parse tree may be the max-weight spanning tree in our
language learning algorithm)....

This would still produce a vector just like word2vec does, via the
hidden layer of the NN ... but the vector would likely be more
meaningful than a typical word2vec vector...
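For concreteness, here is a minimal sketch (not pipeline code) of the proposed change to training-pair extraction, assuming the parse is available as a list of (head, dependent) edges, e.g. from the max-weight spanning tree; the toy sentence and edges below are invented for illustration:

```python
# Hypothetical sketch: skip-gram (center, context) pairs drawn from
# parse-tree adjacency rather than from a linear word window.

def tree_skipgram_pairs(edges):
    """edges: (head, dependent) pairs of a parse tree."""
    pairs = []
    for head, dep in edges:
        pairs.append((head, dep))  # head predicts its dependent
        pairs.append((dep, head))  # dependent predicts its head
    return pairs

# Toy spanning tree over "the cat sat on the mat"
edges = [("sat", "cat"), ("cat", "the"), ("sat", "on"), ("on", "mat")]
pairs = tree_skipgram_pairs(edges)  # feed these to any skip-gram trainer
```

These pairs would then simply replace the window-based pairs in an otherwise unchanged skip-gram trainer.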

What would the purpose of this be, in the context of our language
learning algorithm? The purpose would be that clustering should work
better on the word2vec vectors than on the raw-er data regarding "word
co-occurrence in parse trees." At least, that seems plausible, since
clustering on word2vec vectors generally works better than on
co-occurrence vectors.

This would be something that Masresha or someone else in Addis could
work on, I think...

We can discuss at the office this week...

ben


--
Ben Goertzel, PhD
http://goertzel.org

“Our first mothers and fathers … were endowed with intelligence; they
saw and instantly they could see far … they succeeded in knowing all
that there is in the world. When they looked, instantly they saw all
around them, and they contemplated in turn the arch of heaven and the
round face of the earth. … Great was their wisdom …. They were able to
know all....

But the Creator and the Maker did not hear this with pleasure. … ‘Are
they not by nature simple creatures of our making? Must they also be
gods? … What if they do not reproduce and multiply?’

Then the Heart of Heaven blew mist into their eyes, which clouded
their sight as when a mirror is breathed upon. Their eyes were covered
and they could see only what was close, only that was clear to them.”

— Popol Vuh (holy book of the ancient Mayas)

Cassio Pennachin

Mar 26, 2017, 8:40:22 PM
to opencog, Linas Vepstas, Mas Ben
Hi Ben & Linas,

I assume you're familiar with sense2vec. At the one-paragraph level of detail, this suggestion seems pretty similar to me; any key differences?


--
You received this message because you are subscribed to the Google Groups "opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email to opencog+unsubscribe@googlegroups.com.
To post to this group, send email to ope...@googlegroups.com.
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/CACYTDBc3RLChVKehr4DMxbXVbpyMJkyszsKVju-7cy7GpZzBrQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.



--
Cassio Pennachin

Ben Goertzel

Mar 26, 2017, 10:13:01 PM
to opencog, Linas Vepstas, Mas Ben
sense2vec, as typically used, relies on human-provided labels for the
word senses of training instances ... see

https://arxiv.org/pdf/1511.06388.pdf

However, it can be used with labels that come from some model as well.

The big difference between word2vec, sense2vec etc. and what I am
proposing (based on a conversation with Linas and Masresha and Eyob in
Addis last month) is in how the context is built. We suggest that
the context for the skip-gram model corresponding to a word is drawn
not from textual adjacency in sentences, but rather from adjacency in
parse trees...

However, we suggest to do this NOT from parse trees obtained using a
grammar learned from supervised training or provided by hand-coding;
but rather from parse trees obtained via unsupervised learning.

Now, this unsupervised learning process has, as part of its
algorithm, clustering of words into categories. I propose to do this
clustering using word2vec-style vectors associated with words.

Thus I suggest to embed "skip-gram word2vec style vector building
based on contexts defined by parse trees" into the unsupervised
grammar learning process, as an intermediate stage intended to support
smarter clustering...

Referring to

https://arxiv.org/pdf/1401.3372.pdf

I am suggesting word2vec style embedding as a tool to assist with the
"grouping" referred to in step 4 in the numbered list at the start of
section 5.1; i.e. in Step 2 of the numbered list occurring after the
bullet list in section 5.2.2, which reads

"2. Cluster words into categories based on the similarity of their
associated usage links"
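As a toy illustration of that clustering step, assuming word vectors (e.g. skip-gram hidden-layer rows) are already in hand -- a small hand-rolled k-means stands in here for whatever clustering method the pipeline would actually use:

```python
import numpy as np

# Illustrative only: cluster word vectors into word categories with a
# tiny k-means; any real clustering method could be swapped in.

def kmeans(vecs, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = vecs[rng.choice(len(vecs), size=k, replace=False)]
    for _ in range(iters):
        # assign each vector to its nearest center
        dists = np.linalg.norm(vecs[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its members (skip empty clusters)
        for j in range(k):
            if (labels == j).any():
                centers[j] = vecs[labels == j].mean(axis=0)
    return labels

# Toy "word vectors" with two obvious categories
vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = kmeans(vecs, 2)
```

The resulting cluster labels would play the role of the word categories in the grammar-induction loop.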

Now, the distinction between word2vec and sense2vec may indeed be
relevant here, because a word may meaningfully be placed into two
categories at this stage (i.e. word sense disambiguation). So we may
want to do something like they describe in this paper

https://pdfs.semanticscholar.org/142f/38642629b9d268999ad876af482177d36697.pdf

which is similar to sense2vec but slower to run; it has the advantage
(unlike sense2vec) of not requiring labeled training examples.

-- Ben

Cassio Pennachin

Mar 27, 2017, 9:36:29 AM
to opencog, Linas Vepstas, Mas Ben
Hi,

> However, we suggest to do this NOT from parse trees obtained using a
> grammar learned from supervised training or provided by hand-coding;
> but rather from parse trees obtained via unsupervised learning.

Ah, this is the bit I was missing. 
 
> Referring to
>
> https://arxiv.org/pdf/1401.3372.pdf
>
> I am suggesting word2vec style embedding as a tool to assist with the
> "grouping" referred to in step 4 in the numbered list at the start of
> section 5.1; i.e. in Step 2 of the numbered list occurring after the
> bullet list in section 5.2.2, which reads
>
> "2. Cluster words into categories based on the similarity of their
> associated usage links"

Got it, thanks.

Cassio
--
Cassio Pennachin

Ben Goertzel

Mar 27, 2017, 11:02:46 AM
to opencog, Linas Vepstas, Mas Ben
BTW this paper nicely explains how word2vec is essentially doing a
kind of factorization of a matrix of mutual information values...

https://levyomer.files.wordpress.com/2014/09/neural-word-embeddings-as-implicit-matrix-factorization.pdf
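The Levy & Goldberg observation can be demonstrated at toy scale: build a co-occurrence matrix, take positive PMI, and factorize it with SVD. This is a sketch of the factorization view only (the counts are invented, and real SGNS factorizes a *shifted* PMI matrix):

```python
import numpy as np

# Toy demonstration: word vectors as a low-rank factorization of a
# positive-PMI co-occurrence matrix.

def ppmi(counts):
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore"):
        pmi = np.log(counts * total / (row * col))
    return np.maximum(pmi, 0.0)   # keep only positive PMI

counts = np.array([[0.0, 5.0, 1.0],
                   [5.0, 0.0, 1.0],
                   [1.0, 1.0, 0.0]])
M = ppmi(counts)
U, S, Vt = np.linalg.svd(M)
vectors = U[:, :2] * np.sqrt(S[:2])   # rank-2 "word vectors"
```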

Ben Goertzel

Mar 27, 2017, 11:11:33 AM
to opencog, Linas Vepstas, Mas Ben
On Mon, Mar 27, 2017 at 10:12 AM, Ben Goertzel <b...@goertzel.org> wrote:
> Now, the distinction between word2vec and sense2vec may indeed be
> relevant here, because a word may be meaningfully be placed into two
> categories at this stage (i.e. word sense disambiguation). So we may
> want to do something like they describe in this paper
>
> https://pdfs.semanticscholar.org/142f/38642629b9d268999ad876af482177d36697.pdf
>
> which is similar to sense2vec but slower to run, but has the advantage
> of (unlike sense2vec) not requiring labeled training examples

Code corresponding to the above paper is here:

https://github.com/ninjin/huang_et_al_2012

A simpler, faster (but maybe worse) approach to the same problem is
embodied in this code

https://github.com/nishantrai18/cs671project

The latter simpler code would be "easy" to apply to the current
context, because it's based on postprocessing word2vec vectors, so I
suppose one could apply it to postprocessing word2vec-type vectors
obtained from "disjunct vectors" that are produced from ensembles of
parse trees...

-- Ben

Ben Goertzel

Mar 30, 2017, 10:46:41 AM
to MB, opencog, Linas Vepstas
That makes sense as a starting point, indeed...

BTW, Man Hin's earlier experimentation with word2vec code indicated
that a very large amount of training data is probably needed to get
meaningful results...

ben

On Thu, Mar 30, 2017 at 10:43 PM, MB <masresh...@gmail.com> wrote:
> Hi Ben,
>
> I've been looking at the skip-gram model implementation for the contender
> project. TensorFlow seems to have done something on that and it works quite
> well: https://www.tensorflow.org/tutorials/word2vec . Maybe this could be a
> good starting point.
>
> Masresha

Jesús López

Apr 2, 2017, 8:01:16 AM
to opencog, linasv...@gmail.com, masresh...@gmail.com

Dear Dr. Goertzel and contributors,


You could also enrich the distributional ideas giving support to compositionality in another way. In your arxiv:1703.04368 you link a pregroup grammar parse tree of a sentence to a morphism in a symmetric monoidal category. In work by Coecke, Clark and others, a categorial grammar parse tree is associated with a morphism in the category of linear maps, which is monoidal with the good old linear-algebra tensor product. This morphism is a tensor network that corresponds naturally to the categorial grammar parse tree, where ground types such as nouns correspond to vectors obtained by a distributional method such as word2vec, and compound types of words such as verbs correspond to higher-rank tensors. That's why they call it the DisCoCat (distributional, compositional, categorical) model. While theoretically nice, I think that computationally it is still a work in progress from the point of view of getting hands-on and starting to code, though.


You can browse some slides of talks of Stephen Clark on this here: https://sites.google.com/site/stephenclark609/talks


Warm regards, Jesus Lopez.

Ben Goertzel

Apr 2, 2017, 8:16:09 AM
to opencog, Linas Vepstas, Mas Ben
Yeah, good that you bring that up!

Linas and I have read Coecke's papers on this stuff and discussed them
a few times.... Indeed we are well positioned to explore these sorts
of ideas computationally...

I suppose the value of the morphism you cite in a practical context
would be: If patterns are mined among the syntactic structures of
sentences, they can be mapped into the semantic domain... and vice
versa.... So e.g. if we find X+Y is roughly equal to Z in the domain
of semantic vectors, we can map this back into relations between the
syntactic structures corresponding to X, Y and Z ... and this mapping
may be used to adjust the probabilities of "grammar rules" or suggest
new grammar rules. So in this way semantic patterns could be
morphically mapped back to suggest syntactic patterns...

-- Ben



--
Ben Goertzel, PhD
http://goertzel.org

"I am God! I am nothing, I'm play, I am freedom, I am life. I am the
boundary, I am the peak." -- Alexander Scriabin

Linas Vepstas

Apr 2, 2017, 3:01:25 PM
to Ben Goertzel, opencog, Mas Ben
My quick, informal gut-feel sense of this is that the right answer is to replace the "vector" part of word2vec by the *actual data structure* that *actually occurs in language*. It's kind of hard to explain how to do this, but let me give it a whirl.

Note how vectors are "symmetric", in the sense that the dot-product of vectors A and B is the same as that of B and A. Now go to the Wikipedia article for "pregroup grammar", and note the example about half-way down, talking about left and right inverses. Notice that the left and the right inverses are NOT the same. Language simply does not have the symmetry properties of vectors. There are *some* similarities between vectors and language, and that is why tricks like word2vec partly work. But they break down because they average together things that should not be averaged together.

Another, better, different replacement that kind-of-ish keeps some of the word2vec ideas would be to encode portions of the "subgraph isomorphism problem" into vectors. word2vec uses only the very simplest subgraphs: the pairs -- whereas we could use more complex graphs than simply the pairs.

How can you "encode" a subgraph that is more complex than a word-pair, and can still be shoved into a vector?  Well, you could, for example, state what nodes in the subgraph are connected to what other nodes. How could one do that .. oh hey... that's the LG disjunct!

So the language learning project *already* contains a word2vec-like stage in it... and it's a critical stage of the project, and it's the one that Rohit almost reached, a few years back.
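A toy sketch of the "disjunct as encoded subgraph" idea -- the link labels and miniature parse below are invented for illustration, not actual Link Grammar output:

```python
from collections import Counter

# Sketch: a Link-Grammar-style "disjunct" for a word records which
# labeled connectors it uses to its left and right in a given parse.
# Counting (word, disjunct) pairs turns each word into a vector indexed
# by disjuncts -- a richer context than bare word pairs.

def disjunct(word_pos, links):
    """links: list of (i, j, label) with i < j word positions."""
    left = sorted((i, lab) for i, j, lab in links if j == word_pos)
    right = sorted((j, lab) for i, j, lab in links if i == word_pos)
    return ("&".join(lab + "-" for _, lab in left) +
            " " +
            "&".join(lab + "+" for _, lab in right)).strip()

# Invented parse of "the cat sat": the-D-cat, cat-S-sat
links = [(0, 1, "D"), (1, 2, "S")]
counts = Counter()
for pos, word in enumerate(["the", "cat", "sat"]):
    counts[(word, disjunct(pos, links))] += 1
```

Accumulated over a corpus of parses, the rows of this (word, disjunct) count table are exactly the kind of vectors the paragraph above is pointing at.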

--linas
 

Linas Vepstas

Apr 2, 2017, 3:21:53 PM
to Ben Goertzel, Jesús López, opencog, Mas Ben
Hi Ben,

On Sun, Apr 2, 2017 at 3:16 PM, Ben Goertzel <b...@goertzel.org> wrote:
> So e.g. if we find X+Y is roughly equal to Z in the domain
> of semantic vectors,

But what Jesus is saying (and what we say in our paper, with all that fiddle-faddle about categories) is precisely that while the concept of addition is kind-of-ish OK for meanings, it can be even better if replaced with the correct categorial generalization.

That is, addition -- the plus sign -- is a certain specific morphism, and this morphism, the addition of vectors, has the unfortunate property of being commutative, whereas we know that language is non-commutative. The stuff about pregroup grammars is all about identifying exactly which morphism it is that correctly generalizes the addition morphism.

That addition is kind-of OK is why word2vec kind-of works. But I think we can do better.

Unfortunately, the pressing needs of having to crunch data, and to write the code to crunch that data, prevents me from devoting enough time to this issue for at least a few more weeks or a month. I would very much like to clarify the theoretical situation here, but need to find a chunk of time that isn't taken up by email and various mundane tasks.

--linas

Jesús López

Apr 7, 2017, 2:24:19 PM
to linasv...@gmail.com, Ben Goertzel, opencog, Mas Ben
Hello Ben and Linas,

Sorry for the delay, I was reading the papers. About additivity: in
Coecke et al.'s program you turn a sentence into a *multilinear* map
that goes from the vectors of the words having elementary syntactic
category to a semantic vector space, the sentence-meaning space. So
yes, there is additivity in each of these arguments (which, by the
way, should have a consequence for those beautiful word2vec relations
of France - Paris ~= Spain - Madrid, though I haven't seen a
description).

As I understand it, your goal is to go from plain text to logical
forms in a probabilistic logic, and you have two stages: parsing from
plain text to a pregroup grammar parse structure (I'm not sure that
the parse trees I spoke of before are really trees, hence the change
to 'parse structure'), and then going from that parse structure (via
RelEx and RelEx2Logic, if that's right) to a lambda calculus term
bearing the meaning and having attached, extrinsically, a kind of
probability and another number.

How does Coecke's program (which from now on unfairly includes all
the et al.'s) fit into that picture? I think the key observation is
Coecke's remark that his framework can be interpreted, as a particular
case, as Montague semantics. Though adorned by linguistic
considerations, this semantics is well known to be amenable to
computation, and a toy version is shown in chapter 10 of the NLTK
book, where they show how lambda calculus represents a logic that has
a model theory. That is important because all those lambda terms have
to be actual functions with actual values.

How exactly does Coecke's framework reduce to Montague semantics?
That matters, because if we understand how Montague semantics is a
particular case of Coecke's, we can think in the opposite direction
and see Coecke's semantics as an extension.

As a starting point we have the fact that Coecke's semantics can be
summarized as a monoidal functor that sends a morphism from a compact
closed category in syntax-land (the pregroup grammar parse structure,
resulting from parsing the plain text of a sentence) to a morphism in
a compact closed category in semantics-land, the category of real
vector spaces, that morphism being a (multi)linear map.

The Coecke semantic functor definition, however, hardly needs any
modification if we use as target the compact closed category of
modules over a fixed semiring. If the semiring is that of booleans, we
are talking about the category of relations between sets, with the
Peirce relational product (uncle = brother * father) expressed with
the same matrix product formula of linear algebra, and with cartesian
product as the tensor product that makes it monoidal.
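A tiny illustration of that point, with an invented three-person universe: over the boolean semiring, the ordinary matrix product computes the relational product:

```python
import numpy as np

# Invented toy universe: indices 0=ted, 1=bob, 2=carol.
brother = np.zeros((3, 3), dtype=int)
father = np.zeros((3, 3), dtype=int)
brother[1, 0] = 1   # bob is ted's brother
father[0, 2] = 1    # ted is carol's father

# Boolean matrix product = Peirce relational product:
# uncle(x, z) iff there is some y with brother(x, y) and father(y, z).
uncle = (brother @ father) > 0
```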

The idea is that when the Coecke semantic functor has as codomain the
category of relations, one obtains Montague semantics. More exactly,
when one applies the semantic functor to a pregroup grammar parse
structure of a sentence, one obtains the lambda term that Montague
would have attached to it. Naturally the question is how exactly to
unfold that abstract notion. The folk joke about 'abstract nonsense'
forgets that there is a down button in the elevator.

Well, this would be lengthy here, but the way I started to come to
grips with it was by entering the CCG linguistic formalism into the
equation. A fast and good slide show of how one goes from plain text
to CCG derivations, and from derivations to classic Montague-semantics
lambda terms, can be found in [1].

One important feature of CCG is that it is lexicalized, i.e., all the
linguistic data necessary to do both syntactic and semantic parsing is
attached to the words of the dictionary, in contrast with, say, NLTK
book ch. 10, where the linguistic data is inside the production rules
of an explicit grammar.

Looking closer at the lexicon (dictionary), each word is supplemented
with its syntactic category (N/N...) and also with a lambda term,
compatible with the syntactic category, that is used in semantic
parsing. Those lambda terms are not magical letters. For the lambda
terms to have a true model-theoretic semantics they must correspond to
specific functions.

The good thing is that the work of porting Coecke semantics to CCG
(instead of pregroup grammar) is already done, in [2]. The details are
there, but the thing I want to highlight is that in this case, when
one is doing Coecke semantics with CCG parsing, the structure of the
lexicon is changed. One retains the words and their associated
syntactic categories. But now, instead of the lambda terms (with their
corresponding interpretation as actual relations/functions), one has
vectors and tensors for simple and compound syntactic categories (say
N vs N/N) respectively. When those tensors/vectors are of booleans one
recovers Montague semantics.

In the Coecke general case, sentences mean vectors in a real vector
space, and the benefits start with its inner product, and hence norm
and metric, so you can measure sentence similarity quantitatively
(using suitably normalized vectors...).
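The payoff mentioned here is just the usual inner-product geometry; a minimal illustration with invented sentence-meaning vectors:

```python
import numpy as np

# Cosine = inner product of normalized vectors, the quantitative
# sentence-similarity measure mentioned above.

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

s1 = np.array([1.0, 2.0, 0.0])   # invented sentence-meaning vectors
s2 = np.array([2.0, 4.0, 0.0])   # same direction as s1
s3 = np.array([0.0, 0.0, 1.0])   # orthogonal to s1
```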

CCG is very nice in practical terms. An open SOTA parser
implementation is [3], described in [4], to be compared with [5] ("The
parser finds the optimal parse for 99.9% of held-out sentences").
OpenCCG is older but does both parsing and generation.

One thing that I don't understand well in the above is that the
category of vector spaces over a fixed field (or even the
finite-dimensional ones) is *not* cartesian closed. While in the
presentation of Montague semantics in NLTK book ch. 10 the lambda
calculus appears to be untyped, more faithful presentations seem to
require a (simply) typed or even more complex calculus/logic. In that
case the semantic category perhaps would have to be cartesian closed,
supporting in particular higher-order maps.

That's all on the expository front; now some speculation.

Up to now the only tangible enhancement brought by Coecke semantics
is the motivation of a metric among sentence meanings. What we really
want is a mathematical motivation to probabilize the crisp, hard-facts
character of the interpretation of sentences as Montague lambda terms.
How do we attack the problem?

One idea is to experiment with other kinds of semantic category as
the target of the Coecke semantic functor. To be terse, this can be
explored by means of a monad on a vanilla unstructured base category
such as finite sets. One can make several choices of endofunctor to
specify the corresponding monad. The proposed semantic category is
then its Kleisli category. These categories are monoidal and have a
revealing diagrammatic notation.

1.- Powerset endofunctor. This gives rise to the category of sets and
relations, with cartesian product as the monoidal operation. Coecke
semantics results in Montagovian hard facts as described above.
Coecke and Kissinger's new book [6] details the particulars of the
diagrammatic language.
2.- Vector space monad (over the reals). Since the sets are finite,
the Kleisli category is that of finite-dimensional real vector spaces.
That is properly Coecke's framework for computing sentence similarity.
Circuit diagrams are tensor networks where boxes are tensors and wires
are contractions of specific indices.
3.- A monad in quantum computing is shown in [7], and quantumly
motivated semantics is specifically addressed by Coecke. The whole
book [8] discusses the connection, though I haven't read it. Circuit
diagrams should be quantum circuits representing possibly unitary
processes. Quantum amplitudes give rise, through measurement, to
classical probabilities.
4.- The Giry monad here results from the functor that produces all
formal convex linear combinations of the elements of a given set. The
Kleisli category is very interesting, having as maps probabilistic
mappings that under the hood are just conditional probabilities. These
maps allow a more user-friendly understanding of Markov chains, Markov
decision processes, HMMs, POMDPs, naive Bayes classifiers and Kalman
filters. Circuit diagrams correspond to the factor-diagram notation of
Bayesian networks [9], and the law of total probability generalizes,
in Bayesian networks, to the linear-algebra tensor-network
calculations of the corresponding network (this can be shown in actual
Bayesian network software).

A quote from mathematician Gian Carlo Rota [10]:

"The first lecture by Jack [Schwartz] I listened to was given in the
spring of 1954 in a seminar in functional analysis. A brilliant array
of lecturers had been expounding throughout the spring term on their
pet topics. Jack's lecture dealt with stochastic processes.
Probability was still a mysterious subject cultivated by a few
scattered mathematicians, and the expression "Markov chain" conveyed
more than a hint of mystery. Jack started his lecture with the words,
"A Markov chain is a generalization of a function." His perfect
motivation of the Markov property put the audience at ease. Graduate
students and instructors relaxed and followed his every word to the
end."

The thing I would research would be to use as semantic category that
of the generalized functions of the above quote and bullet 4. So
basically you replace word2vec vectors by probability distributions of
the words meaning something, connect a Bayesian network from the CCG
parse, and apply generalized total probability to obtain probabilized
booleans, i.e. a number 0 <= x <= 1 (instead of just a boolean as with
Montague semantics). That is, the probability that a sentence holds
depends on the distributions of its syntactically elementary
constituents meaning something, and those distributions are combined
by factors of a Bayesian net with conditional-independence relations
that respect and reflect the sentence syntax and have the local Markov
property. The factors are for words of complex syntactic category
(such as N/N...) and their attached tensors are multivariate
conditional probability distributions.
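A deliberately tiny, speculative sketch of that last step, with invented numbers throughout: an elementary constituent carries a distribution over candidate denotations, a factor carries a conditional probability table, and contracting them (the law of total probability) yields the probability that the sentence holds:

```python
import numpy as np

# A noun carries a distribution over two candidate denotations; a
# factor gives P(sentence true | denotation).  Contracting the two
# (total probability) gives P(sentence true) in [0, 1], generalizing
# the bare Montague boolean.

p_noun = np.array([0.7, 0.3])          # P(noun denotes e1), P(e2)
p_true_given = np.array([0.9, 0.2])    # P(true | e1), P(true | e2)

# sum over denotations e of P(true | e) * P(e)
p_sentence = float(p_noun @ p_true_given)
```

A full sentence would contract a whole network of such tables, one factor per word of complex syntactic category.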

Hope this helps somehow. Kind regards,
Jesus.


[1] http://yoavartzi.com/pub/afz-tutorial.acl.2013.pdf
[2] http://www.cl.cam.ac.uk/~sc609/pubs/eacl14types.pdf
[3] http://homepages.inf.ed.ac.uk/s1049478/easyccg.html
[4] http://www.aclweb.org/anthology/D14-1107
[5] https://arxiv.org/abs/1607.01432
[6] ISBN 1108107710
[7] https://bram.westerbaan.name/kleisli.pdf
[8] ISBN 9780199646296
[9] http://helper.ipam.ucla.edu/publications/gss2012/gss2012_10799.pdf
[10] Indiscrete thoughts

Ben Goertzel

Apr 12, 2017, 4:20:55 AM
to opencog, Linas Vepstas, Mas Ben
Jesus...

On Sat, Apr 8, 2017 at 2:24 AM, Jesús López
<jesus.lope...@gmail.com> wrote:
> Coecke semantic functor definition, however, hardly needs any
> modification if we use as target the compact closed category of
> modules over a fixed semiring. If the semiring is that of booleans, we
> are talking about the category of relations between sets, with Pierce
> relational product (uncle = brother * father) expressed with the same
> matrix product formula of linear algebra, and with cartesian product
> as the tensor product that makes it monoidal.
>
> The idea is that when Coecke semantic functor has as codomain the
> category of relations, one obtains Montague semantics. More exactly,
> when one applies the semantic functor to a pregroup grammar parse
> structure of a sentence, one obtains the lambda term that Montague
> would have attached to it.

Ah, I see.... That's actually very nice...

The semiring could also be a non-Boolean algebra of relations on
graphs or hypergraphs, I think... like the ones I talk vaguely about
here:

https://arxiv.org/abs/1703.04382

Ben Goertzel

Apr 12, 2017, 4:47:00 AM
to opencog, Linas Vepstas, Mas Ben, 练睿婷, Ralf Mayet, Amen Belayneh
Speculating a little further on this...

In word2vec one trains a neural network to do the following. Given a
specific word in the middle of a sentence (the input word), one looks
at the words nearby and picks one at random. The network is going to
tell us the probability -- for every word in our vocabulary -- of that
word being the "nearby word" that we chose.

Suppose we try to use word2vec on a vocabulary of 10K words and try to
project the words into vectors of 300 features.

Then the input layer has 10K neurons (one per word), only one of which
is active at a time; the hidden layer has 300 neurons, and the output
layer has 10K neurons... the vector for a word is then given by the
weights to the hidden layer from that word...

(see http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
for simple overview...)

This is cool but not necessarily the best way to do this sort of thing, right?

An alternate approach in the spirit of InfoGAN would be to try to
learn a "generative" network that, given an input word W, outputs the
distribution of words surrounding W .... There would also be an
"adversarial" network that would try to distinguish the distributions
produced by the generative network, from the distribution produced
from the actual word.... The generative network could have some
latent variables that are supposed to be informationally correlated
with the distribution produced...

One would then expect/hope that the latent variables of the generative
model would correspond to relevant linguistic features... so one would
get shorter and more interesting vectors than word2vec gives...
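Purely as a structural sketch of the idea (no training loop, random weights, invented sizes): a generator maps noise plus a short latent code to a distribution over context words, and a Q-network tries to recover the code from the generated context -- the InfoGAN pressure that would force the code to carry the informative features:

```python
import numpy as np

# Structural sketch only: shapes of an InfoGAN-style generator and
# Q-network for word contexts.  The hoped-for "word vector" is the
# short latent code c, not a 300-dim embedding.

rng = np.random.default_rng(0)
V, Z, C, H = 12, 5, 3, 8             # vocab, noise, latent, hidden sizes
G1 = rng.normal(size=(Z + C, H)); G2 = rng.normal(size=(H, V))
Q1 = rng.normal(size=(V, H));     Q2 = rng.normal(size=(H, C))

z = rng.normal(size=Z)               # nuisance noise
c = rng.normal(size=C)               # latent code ("word vector")

h = np.tanh(np.concatenate([z, c]) @ G1)
scores = h @ G2
context = np.exp(scores - scores.max())
context /= context.sum()             # generated context distribution

c_hat = np.tanh(context @ Q1) @ Q2   # Q-network's reconstruction of c
```

Training would push c_hat toward c (the mutual-information term) while a discriminator, not shown, separates generated contexts from real ones.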

Suppose that in such a network, for "words surrounding W", one used
"words linked to W in a dependency parse".... Then the latent
variables of the generative model mentioned above, should be the
relevant syntactico-semantic aspects of the syntactic relationships
that W displays in the dependency parse....

Clustering on these vectors of latent variables should give very nice
clusters which can then be used to define new variables ("parts of
speech") for the next round of dependency parsing in our language
learning algorithm...

-- Ben



Ben Goertzel

unread,
Apr 12, 2017, 6:50:54 AM4/12/17
to opencog, Linas Vepstas, Mas Ben, 练睿婷, Ralf Mayet, Amen Belayneh
Having thought a little more... I'll need to think more about what's
the right network architecture to handle the inputs for applying the
InfoGAN methodology to this case...

Ben Goertzel

Apr 13, 2017, 12:19:25 AM
to opencog, Linas Vepstas, Mas Ben, 练睿婷, Ralf Mayet, Amen Belayneh
OK, let me try to rephrase this more clearly...

What I am thinking is --

In the GAN, the generative network takes in some random noise
variables, and outputs a distribution over (link type, word) pairs
[or in the plain-vanilla version without dependency parses, it would
merely be over words].

The GAN would then be generating "statistical contexts" (corresponding to words)

The adversarial (discriminator) network is trying to tell the real
contexts from the randomly generated fake contexts...

The InfoGAN variation would mean the GAN has some latent noise
variables that indicate key features of real word contexts.....
Presumably these would give a multidimensional parametrization of the
scope of word contexts, and hence the scope of words-in-context (i.e.
word meanings)

So the architecture is nothing like word2vec, but the result is a
vector for each word: the vector being the settings of the latent
variables of the GAN network that generate the context for that
word...

This may still be fuzzy, but hopefully it points more clearly in a
meaningful direction...

This is "just" to find a maximally nice way to fill in the
clustering-ish step in our unsupervised grammar induction algorithm...

ben

Jesús López

Apr 22, 2017, 9:53:37 AM
to ope...@googlegroups.com, Linas Vepstas, Mas Ben, 练睿婷, Ralf Mayet, Amen Belayneh
Hi again, just wanted to drop a pair of thoughts.

What I'm talking about is more of a conceptual exploration,
categorically and linguistically motivated, while Ben's is more neural
and hands-on. What would be nice is connecting the threads.

Previously Ben said:
> The semiring could also be a non-Boolean algebra of relations on
graphs or hypergraphs

That would mean substituting the numbers in the word2vec vectors
(and Coecke tensors!) with whole relations (relations on hypergraphs are
much fatter than mere numbers), which I'm not sure you'd even want. I
don't remember seeing this before. For good or ill, last week
arXiv:1704.05725 appeared, in the categorical quantum mechanics
setting, where they seem to be doing just that sort of thing:
substituting the field of complex numbers with an arbitrary C*-algebra. If
you can think of your algebra of relations as a C*-algebra, that would
push the idea somewhat further, though I don't really know how far it goes
semantically, not to speak of learning parameters. One would also need
the glue to apply that paper's idea to the quantum flavor of
Coecke semantics.

I can't help on the GAN stuff, since I haven't done my homework there.
However, I would also look at what Socher did in 2013. Typical neural
nets are many-layer sandwiches of rectangles of weights (linear maps),
each with a vector of nonlinearities stacked on top, and so on. Socher
introduced/used *tensor* neural nets, where he used a *cube* for a
*bi*linear transformation followed by a nonlinearity. His units
transform pairs of vectors into single vectors, and his NN topology is a
binary tree (instead of the linear stacking of layers in a classical
NN). If you have a fragment of English generated by a CFG, the parse
tree (a true tree) can be binarized [1], and each node would be a Socher
net unit, with the leaves being distributional (word2vec) vectors.
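For concreteness, a single Socher-style composition unit can be sketched in a few lines of numpy: the parent vector is tanh(h^T V h + W h), where h is the stacked pair of child vectors, V is the "cube" (bilinear part), and W is the classical additive part. This is a hedged sketch with random, untrained parameters; the dimensions and the toy binarized parse tree are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5  # word-vector dimensionality

# Parameters of one unit (illustrative random values, not trained):
V = rng.normal(scale=0.1, size=(2 * d, 2 * d, d))  # the bilinear "cube"
W = rng.normal(scale=0.1, size=(d, 2 * d))         # classical linear part

def compose(a, b):
    """Combine two child vectors into one parent vector."""
    h = np.concatenate([a, b])                   # stack the children
    bilinear = np.einsum('i,ijk,j->k', h, V, h)  # h^T V[:, :, k] h, per slice k
    return np.tanh(bilinear + W @ h)

# Leaves are distributional (word2vec-style) vectors; the binarized parse
# tree, here ((the cat) sat), fixes the order of composition.
the, cat, sat = (rng.normal(size=d) for _ in range(3))
sentence_vec = compose(compose(the, cat), sat)   # shape (d,)
```

The output of each unit lives in the same d-dimensional space as the leaves, which is what lets the same unit be applied recursively up the tree.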

The difference between this and Coecke is that in the latter there is no
binarization (instead there are multilinear, general tensors), and the net
is not a tree but a DAG. More importantly, of course, in Socher there are
extra nonlinear toppings on the nodes and an actual learning
algorithm, things left more or less for the future in the Coecke view,
despite some efforts. So basically, if you put a nonlinear topping or
hat on each of the nodes of what I was calling a tensor network, you
should arrive at a neural tensor net. Just split the rank r of the
tensor as r = u + v, with u the number of contravariant (input)
indices and v the number of covariant (output) indices. Then each node
tensor takes u *vectors* as inputs (2 in Socher) and produces v output vectors.
One needs an analogue of the element-wise nonlinearity in this context,
but I don't know which. Since the topology can include "diamond" paths,
one needs a suitable learning method. I've read about what's called
backpropagation through structure in tensor neural net papers.
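A hedged sketch of one such generalized node, for the simplest case v = 1: contract u input vectors against a rank-(u+1) tensor and apply an elementwise nonlinearity as the "topping" (tanh is just a placeholder choice; for v > 1 the contraction would leave a rank-v tensor rather than separate vectors, and splitting it would need extra structure).

```python
import numpy as np

rng = np.random.default_rng(2)
d, u = 4, 3  # vector dimension and number of input indices

# A rank-(u+1) node tensor: u input indices, one output index.
T = rng.normal(scale=0.1, size=(d,) * (u + 1))

def node(inputs, nonlinearity=np.tanh):
    """Contract the input vectors against T, then apply the nonlinearity."""
    n = len(inputs)
    letters = 'abcdefgh'[:n]                 # one index letter per input
    spec = ','.join(letters)                 # e.g. 'a,b,c'
    out = np.einsum(f"{letters}z,{spec}->z", T, *inputs)
    return nonlinearity(out)

vecs = [rng.normal(size=d) for _ in range(u)]
out_vec = node(vecs)  # shape (d,)
```

The einsum string for u = 3 is "abcz,a,b,c->z": three contravariant indices contracted away, one covariant index z surviving as the output vector.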

Another technical difference, to be precise, is that Socher's
bilinearly-flavored units also had an extra additive contribution to
their output from a classical NN stage.

All of the above applies if one has a serious interest in the Coecke approach to semantics.

Note that while Coecke's theory is very pleasant categorically, the
nonlinear toppings have not received any attention from category
theorists that I know of.

On the purely categorical side of understanding this same problem, and
forgetting parameter learning for a moment, I have a little realization
to share. I talked about categories resulting from several monads as
*targets* of the Coecke semantic functor. Later I remembered that the
source also has monad flavour. Sequences of things can be understood
through the list monad, from the viewpoint of functional programming,
or the free monoid monad for the purists. One can thus see sentences as
sequences of words (lexical entities) given by a specific monad. Thus
we have monad flavour in both the source and the target of the semantics
functor. That prompts questions about the character of the functor
itself.
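As a concrete toy, the list-monad reading of "sentences as sequences of words" amounts to just this (a sketch; the tiny `lexicon` is invented purely for illustration):

```python
# Minimal list monad: 'unit' injects a value into a singleton list,
# 'bind' maps a list-returning function over a list and flattens.
def unit(x):
    return [x]

def bind(xs, f):
    return [y for x in xs for y in f(x)]

# A sentence as an element of the free monoid over the lexicon:
sentence = ["the", "cat", "sat"]

# Example use of bind: nondeterministically tag each word, so that
# ambiguity ("sat" as verb or noun) multiplies out in the result.
lexicon = {"the": ["DET"], "cat": ["N"], "sat": ["V", "N"]}
tagged = bind(sentence, lambda w: [(w, t) for t in lexicon[w]])
# tagged == [('the', 'DET'), ('cat', 'N'), ('sat', 'V'), ('sat', 'N')]
```

The flattening step is exactly the free-monoid multiplication, which is what makes "list" and "free monoid" the same monad in different clothes.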

Those thoughts put me in the functional-programmer mindset, and I
remembered an old paper by Wadler, where he talked about understanding,
by monadic means (in functional programming, using Moggi's ideas on
computing with monads), recursive-descent parsers for domain-specific
languages given by a context-free grammar. The topic is called
monadic parsing, and it's aimed at developers. Interestingly, this
viewpoint is permeating into linguistics as well, as demonstrated by
"Monads for natural language semantics" (Shan). He talks of semantics
as a monad transformer. We are at a point where there is even a section
called "The CCG monad" in the book with ISBN 9783110251708.
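A minimal sketch of what such a monadic parser looks like, in the Wadler/Hutton combinator style: a parser is a function from an input string to a list of (value, remaining-input) results, and `bind` sequences parsers by threading the remainder. These helpers are the standard textbook combinators, not from any particular library.

```python
# A parser is a function s -> [(value, rest)]; [] means failure.
def unit(v):
    return lambda s: [(v, s)]

def bind(p, f):
    # Run p, then feed each result value to f and parse the remainder.
    return lambda s: [r for (v, rest) in p(s) for r in f(v)(rest)]

def item(s):
    """Consume one character, failing on empty input."""
    return [(s[0], s[1:])] if s else []

def sat(pred):
    """Consume one character satisfying pred."""
    return bind(item, lambda c: unit(c) if pred(c) else lambda s: [])

def char(c):
    return sat(lambda x: x == c)

# Sequencing via bind: parse 'a' then 'b', returning their concatenation.
ab = bind(char('a'), lambda a: bind(char('b'), lambda b: unit(a + b)))

print(ab("abc"))  # [('ab', 'c')]
print(ab("ba"))   # []
```

The list of results is what makes the combinators handle ambiguity and backtracking for free, which is also why the same list monad from the previous paragraph shows up inside the parser type.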

I don't know of any work reconciling the monadic viewpoint with the
Coecke stuff, but it is intriguing.

Regards, Jesús.


[1] http://images.slideplayer.com/15/4559376/slides/slide_39.jpg

Jesús López

Apr 22, 2017, 4:48:56 PM4/22/17
to ope...@googlegroups.com, Linas Vepstas, Mas Ben, 练睿婷, Ralf Mayet, Amen Belayneh
Correction to my previous message: swap "covariant" and "contravariant" in what I wrote about the tensor indices.