A question about tensor products


YKY (Yan King Yin, 甄景贤)

Jun 18, 2014, 8:47:50 AM
to AGI mailing list, general-in...@googlegroups.com

Words or concepts can be represented as vectors using Google's word2vec algorithm.

To express a complex thought composed of simpler concepts, a mathematically natural way is to multiply them together, for example "John loves Mary" = john x loves x mary.

I'm wondering whether forming tensor products of word2vec vectors could be meaningful.

The tensor product is bilinear (indeed, it is the universal bilinear map).  So it may preserve the linearity of the original vector spaces (in other words, the scalar multiplication in the original spaces).  If scalar multiplication is meaningful in the word2vec space, then its meaning would be preserved by the tensor product.

The dimension of the tensor product space is also much higher: it is the product of the dimensions of the original spaces, which is even greater than that of the Cartesian product, whose dimension is the sum of the dimensions of the original spaces.  Computationally, I wonder what the advantage of using tensor products is, as opposed to Cartesian products...?
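
For concreteness, a minimal NumPy sketch (with made-up 3-dimensional vectors standing in for real word2vec embeddings) contrasting the two constructions; the dimension counts are the point, not the numbers:

import numpy as np

# Toy stand-ins for word2vec vectors; real embeddings would have ~100-300 dimensions.
john  = np.array([0.2, -0.1, 0.7])
loves = np.array([0.5,  0.3, -0.2])

# Cartesian product (direct sum): dimensions add, 3 + 3 = 6.
concat = np.concatenate([john, loves])

# Tensor product: dimensions multiply, 3 * 3 = 9; np.outer builds the rank-1 tensor.
tensor = np.outer(john, loves)

print(concat.shape)   # (6,)
print(tensor.shape)   # (3, 3), i.e. 9 entries when flattened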

Or perhaps the extra richness of tensor structure can be exploited differently...

-- 
YKY
"The ultimate goal of mathematics is to eliminate any need for intelligent thought" -- Alfred North Whitehead

William Taysom

Jun 18, 2014, 9:05:58 AM
to general-in...@googlegroups.com, AGI mailing list
YKY, you may find some valuable insight in Michael Leyton's "A Generative Theory of Shape" <http://www.springer.com/computer/image+processing/book/978-3-540-42717-9>.

Matt Mahoney

Jun 18, 2014, 10:23:10 AM
to general-intelligence, AGI mailing list
The semantic vector of a sentence is approximately the sum of the word
vectors, not the product. It is not exact because it does not account
for word order. John + loves + Mary = Mary + loves + John.
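
A quick sketch of this point, using random stand-in vectors rather than trained word2vec embeddings:

import numpy as np

rng = np.random.default_rng(0)
john, loves, mary = rng.standard_normal((3, 5))  # stand-ins for trained word vectors

sentence1 = john + loves + mary   # "John loves Mary"
sentence2 = mary + loves + john   # "Mary loves John"

print(np.allclose(sentence1, sentence2))  # True: addition commutes, so word order is lost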



--
-- Matt Mahoney, mattma...@gmail.com

Linas Vepstas

Jun 18, 2014, 5:29:21 PM
to general-in...@googlegroups.com, AGI mailing list
Semantic vectors sort-of-ish work because the mathematical structure of the tensor product, and the structure of grammar are both described by the same underlying device: the so-called "non-symmetric compact closed monoidal category". The difference is that tensors are also symmetric, and so forcing this symmetry then forces a kind-of straight-jacket onto the language.

References:
see also work by Bob Coecke

FWIW, I believe that dependency grammars, and link-grammar in particular, are isomorphic to categorical grammars.  It's almost obvious if you stare at the above Wikipedia article long enough: the expressions are just link-grammar links.  The categorical-grammar notation is rather unwieldy; that's the big difference.

--linas

YKY (Yan King Yin, 甄景贤)

Jun 19, 2014, 4:41:08 PM
to general-in...@googlegroups.com
On Wed, Jun 18, 2014 at 9:07 PM, William Taysom <wta...@gmail.com> wrote:
YKY, you may find some valuable insight in Michael Leyton's "A Generative Theory of Shape" <http://www.springer.com/computer/image+processing/book/978-3-540-42717-9>.


Interesting book; it will take me a while to understand =)

Seems very group-theoretic and geometric.  Not sure if the approach can generalize to general intelligence or logic; the latter is not necessarily spatial or geometric.

YKY (Yan King Yin, 甄景贤)

Jun 19, 2014, 5:35:19 PM
to general-in...@googlegroups.com
On Thu, Jun 19, 2014 at 5:29 AM, Linas Vepstas <linasv...@gmail.com> wrote:
Semantic vectors sort-of-ish work because the mathematical structure of the tensor product, and the structure of grammar are both described by the same underlying device: the so-called "non-symmetric compact closed monoidal category". The difference is that tensors are also symmetric, and so forcing this symmetry then forces a kind-of straight-jacket onto the language.


According to the Wikipedia page, in the non-symmetric case the single dual is replaced by distinct left and right adjoints.  I can only vaguely understand this...

Since the tensor product is non-commutative, I was thinking it suffices to represent word-order differences, such as
       John loves Mary != Mary loves John. 

​Perhaps the non-symmetric category is a better model for sentences with word order differences, than merely using a (tensor) product that is non-commutative? 
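
A quick numerical check of this, with random stand-in vectors rather than trained embeddings: the outer product does distinguish factor order, but only up to a transpose (which is exactly the symmetry of the tensor product mentioned above).

import numpy as np

rng = np.random.default_rng(1)
john, mary = rng.standard_normal((2, 4))  # made-up vectors standing in for word2vec

jm = np.outer(john, mary)   # factor order: john (x) mary
mj = np.outer(mary, john)   # factor order: mary (x) john

print(np.allclose(jm, mj))    # False: the tensor product is not commutative
print(np.allclose(jm, mj.T))  # True: swapping the factors merely transposes the tensor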

 
References:
see also work by Bob Coecke

FWIW, I believe that dependency grammars, and link-grammar in particular, are isomorphic to categorical grammars.  Its almost obvious if you stare at the above wikipedia article long enough: the expressions are just link-grammar links.  The categorical grammar notation is rather unwieldy, that's the big difference.


I don't see a clear correspondence between link grammar and Lambek (categorical grammar).  What do the "links" correspond to, in the categorical formulation?

By the way, the book "Quantum Physics and Linguistics" seems very good for this type of thing:
http://ukcatalogue.oup.com/product/9780199646296.do

Linas Vepstas

Jun 19, 2014, 6:08:56 PM
to general-in...@googlegroups.com
On 19 June 2014 16:35, YKY (Yan King Yin, 甄景贤) <generic.in...@gmail.com> wrote:
On Thu, Jun 19, 2014 at 5:29 AM, Linas Vepstas <linasv...@gmail.com> wrote:
Semantic vectors sort-of-ish work because the mathematical structure of the tensor product, and the structure of grammar are both described by the same underlying device: the so-called "non-symmetric compact closed monoidal category". The difference is that tensors are also symmetric, and so forcing this symmetry then forces a kind-of straight-jacket onto the language.


According to the Wikipedia page, in the non-symmetric case the single dual is replaced by distinct left and right adjoints.  I can only vaguely understand this...

Since the tensor product is non-commutative, I was thinking it suffices to represent word-order differences, such as
       John loves Mary != Mary loves John. 

​Perhaps the non-symmetric category is a better model for sentences with word order differences, than merely using a (tensor) product that is non-commutative? 

yes exactly.
 

 
References:
see also work by Bob Coecke

FWIW, I believe that dependency grammars, and link-grammar in particular, are isomorphic to categorical grammars.  Its almost obvious if you stare at the above wikipedia article long enough: the expressions are just link-grammar links.  The categorical grammar notation is rather unwieldy, that's the big difference.


​I don't see a clear correspondence between link grammar and Lambek (categorical grammar).​  What do the "links" correspond to, in the categorical formation?


Half-way down the page, "John met Mary": the verb met is N^r . S . N^l, where S (sentence) says it's the root verb, N^r says it must link to a noun on the left (the subject), and N^l says it must link to a noun on the right (the object).

In link grammar we use + and - instead of l and r, and divide nouns into subjects and objects.  So the above verb would be subject-root-object, or S- & WV- & O+.  The WV- is the head-word-to-wall connector, conventionally linking to the left; WV- is the same as the .S. in the pregroup type.  The S- (subject) says there is a noun that is the subject on the left.  The O+ (object) says there is a noun that is the object on the right.

At this level, it should be clear that it's a notational difference.  The lines drawn between the words on the Wikipedia page are essentially identical to link-grammar links.
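
To make the claimed correspondence concrete, here is a hand-written, purely illustrative side-by-side for "John met Mary"; the entries follow the notation above but are not taken from any actual link-grammar dictionary or pregroup lexicon.

# Pregroup types, as in the example above:
pregroup = {
    "John": ["N"],
    "met":  ["N^r", "S", "N^l"],   # noun required on the left, sentence root, noun required on the right
    "Mary": ["N"],
}

# The corresponding link-grammar connectors, in the notation described above:
link_grammar = {
    "John": ["S+"],                # links rightward to the verb as its subject
    "met":  ["S-", "WV-", "O+"],   # subject to the left, root link to the wall, object to the right
    "Mary": ["O-"],                # links leftward to the verb as its object
}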


By the way, the book "Quantum physics and linguistics" seems very good for this type of things:
http://ukcatalogue.oup.com/product/9780199646296.do


Yes, that's exactly the right one. Some of the hype is true; see e.g. John Baez's "Rosetta Stone" paper: if you understand the diagrams, some of the mysteries of quantum computation fall away and become revealed.

Some things are overblown, over-hyped. It is true that in quantum mechanics, the many worlds are mutually exclusive.   It is true that in linguistics, different parses, different meanings are mutually exclusive.   It goes a bit deeper than that, but that's about the extent of it.   The winner and loser of a baseball game are mutually exclusive. In a courtroom trial, conviction or vindication of the accused is mutually exclusive. This does not really mean that courtroom trials are like quantum mechanics.

What it does mean is that the concept of mutual exclusion is deep, difficult and pervasive, and is worth studying in detail.  And that is what category theory and the Lambek calculus do.  Practical applications happen to include linguistics and quantum mechanics ...

--linas


YKY (Yan King Yin, 甄景贤)

Jun 19, 2014, 10:34:17 PM
to general-in...@googlegroups.com
On Fri, Jun 20, 2014 at 6:08 AM, Linas Vepstas <linasv...@gmail.com> wrote:

​Perhaps the non-symmetric category is a better model for sentences with word order differences, than merely using a (tensor) product that is non-commutative? 

yes exactly.


​But what exactly is the advantage of the categorical approach regarding capturing word order, over a non-commutative (say tensor) product?

The way I see it, the unrestricted tensor product has a problem:  there is no guarantee that the location of "John loves Mary" wouldn't clash with another unrelated product such as "Peter ate pizza", simply by accident.  That is clearly bad as a model of semantic space.

The remedy is to enforce a distance metric among word products.  This metric can be defined as the graph distance between nodes in the Cayley graph of the free group generated by the words.  This metric makes semantic sense.  Any other embedding that does not severely distort this metric would also be OK.
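
A rough sketch of that metric, under the simplifying assumption that every word is a generator of a free group (here a single lowercase letter, with the uppercase letter as its formal inverse); the distance between two word products g and h is the length of the freely reduced word g^-1 h, i.e. the graph distance in the Cayley graph.

def inverse(w):
    # Formal inverse of a word: reverse it and invert each generator.
    return "".join(c.swapcase() for c in reversed(w))

def reduce_free(w):
    # Free reduction: repeatedly cancel adjacent x x^-1 pairs.
    out = []
    for c in w:
        if out and out[-1] == c.swapcase():
            out.pop()
        else:
            out.append(c)
    return "".join(out)

def cayley_distance(g, h):
    return len(reduce_free(inverse(g) + h))

# "john loves mary" -> "jlm", "peter ate pizza" -> "pap" (one letter per word, for brevity)
print(cayley_distance("jlm", "jlm"))   # 0
print(cayley_distance("jlm", "mlj"))   # 6: same words, different order, still far apart
print(cayley_distance("jlm", "pap"))   # 6: unrelated sentences never collide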

I don't know how the categorical approach deals with this problem.  Perhaps it doesn't, because the problem has to do with metric space.  Perhaps the categorical approach makes different sentences "mutually exclusive", as you mentioned?

Linas Vepstas

Jun 20, 2014, 12:10:36 AM
to general-in...@googlegroups.com
On 19 June 2014 21:34, YKY (Yan King Yin, 甄景贤) <generic.in...@gmail.com> wrote:
On Fri, Jun 20, 2014 at 6:08 AM, Linas Vepstas <linasv...@gmail.com> wrote:

​Perhaps the non-symmetric category is a better model for sentences with word order differences, than merely using a (tensor) product that is non-commutative? 

yes exactly.


​But what exactly is the advantage of the categorical approach regarding capturing word order, over a non-commutative (say tensor) product?
 
Who said commutative?  Associativity is where it's at.

Category theory is nothing more than some very abstract machinery for saying obvious things you already know.  It is useful, because it has the power to expose things you didn't know, or maybe just suspected, and make them obvious. Maybe. Only if you ask the right questions. And invest a lot of time to learn it.  So, on the small scale, it offers no advantage at all over traditional linguistics.  For example, the paper Ben cites:


Page 5, "pat kisses sandy": the verb kisses is (S\NP)/NP.

Notice the parentheses make it non-associative: you can't rearrange the parens at will.

Also page 5, section 3.1, application.  This section spells out *exactly* the link-grammar linkage rules, viz.: /NP means link to a noun on the right, aka O+, while \NP means link to a noun on the left, aka the subject S-, while the sentence S in the middle is my WV link. Hooray, we once again rediscover an old theory of syntax!

The awkwardness of sections 3.2 and 3.3 is a notational issue; it's where link-grammar just has a cleaner, easier-to-understand notation. You know which indices can contract, because they are identified by link-type letters in LG. You also know which direction they can contract: the + and - take care of that. And finally, you know which ones can commute and which ones can't, which is very hard to see in the tensor notation.

(So link-grammar is a semi-commutative monoid, just like a history monoid or a trace monoid.)

Anyway, those sections are saying that when you have multiple disjuncts and contract some of the links, you get some other disjunct with possibly uncontracted connectors left over. Just like for tensors.
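
A toy illustration of that contraction step (not real link-grammar code; the connector names are just the illustrative ones used above): joining two disjuncts cancels a matching +/- pair of the same type and leaves any unmatched connectors dangling, just as contracting one index pair of two tensors leaves the remaining indices free.

def contract(left, right):
    """left, right: connector lists like ['S+'] or ['S-', 'O+'] (type plus direction)."""
    for i, lc in enumerate(left):
        if not lc.endswith("+"):
            continue
        for j, rc in enumerate(right):
            if rc == lc[:-1] + "-":            # e.g. 'S+' on the left meets 'S-' on the right
                return left[:i] + left[i+1:] + right[:j] + right[j+1:]
    return left + right                         # nothing contracted

john = ["S+"]                  # noun acting as a subject
met  = ["S-", "WV-", "O+"]     # transitive verb (illustrative disjunct)

print(contract(john, met))     # ['WV-', 'O+']: subject link consumed, object still open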

I can't quite figure out what the LG equivalent of section 3.4, type raising, is saying. I'm skimming this now ... I can't figure out if it's important to figure out or not.

And then the paper ends.  All that it said was "gee golly willickers, tensor contraction mechanically works in *exactly* the same way as dependency-grammar parsing", to which a suitable reply is "well, golleee, Mr. Douglas, I did not know that."  So, once again, category theory has made transparent something that you did not know before. Now that you know this, and have seen it in action, what do you do with it?  There's a big "so what" hanging in the air ... what, are we all rubes here in Green Acres?


The way I see it, the unrestricted tensor product has a problem:  there is no guarantee that the location of "John loves Mary" wouldn't clash with another unrelated product such as "Peter ate pizza", simply by accident.  That is clearly bad as a model of semantic space.

The problem arises when you take the analogy too far. Notice that nowhere in the eacl14 paper does it say that words are real-number-valued tensors, or that you should substitute real numbers for words, or that you should actually do anything with real-valued tensors.   It simply pointed out that the *mechanics* of tensor contraction is just like parsing.   It does not say "you should use real-number-valued tensors for words".  That leap is where the analogy breaks down, where the problems start to arise.  People have taken that leap (I guess that's what started this thread), and it seems to sort-of-ish kind-of work, but that's probably by accident, as a side-effect, only because the mechanics for the two work the same way.  It doesn't seem rooted in basic semantics.  Unless I'm missing something.

I guess I should give it all a second look, where "it" is the original work, I guess by Baroni, Coecke, and others, cited in the intro to the eacl paper.
 

The remedy is to enforce a distance metric among word products.  This metric can be defined as the graph distance between nodes in the Cayley graph of the free group generated by the words.  This metric makes semantic sense.  Any other embedding that do not severely distort this metric would also be OK.

I don't know how the categorical approach deals with this problem.  Perhaps it doesn't, because the problem has to do with metric space.  Perhaps the categorical approach makes different sentences "mutually exclusive", as you mentioned?

No, only that different disjuncts are mutually exclusive. If some noun is the subject of a sentence it cannot also be the object.   

--linas

