# A srfi for tensors

13 views

### Linas Vepstas

Aug 20, 2020, 7:27:34 PM8/20/20
to srfi...@srfi.schemers.org, John Cowan, Arvydas Silanskas, Marc Nieper-Wi?kirchen, Amirouche Boubekki, Bradley Lucier, link-grammar, opencog
You may feel that I've already written too much about tensors, but I'd like to make one last remark. This is partly for my own benefit, as writing helps clarify my own thoughts. It's partly for you, as I hope to do something constructive, here, for scheme. It may seem off-topic for srfi-194 but does have a certain relevance, in general, for scheme, as I hope to show, with luck, by the end of this email.

(I am cc'ing two other mailing lists, so this is not just about scheme srfi's, but also about tuples and natural language)

Let's start with "the tensor product", and let me start with classical, i.e. non-lisp notation.  We'll get back to lisp notation, which is the point of the email(!?)

Let F(x,y,z) be a "function", returning a real number, for integer-variable arguments x,y,z. Likewise G(u,v).  Pick a specific set of integers a,b,c,d,e. That is, let a,b,c,d,e be constants.

The tensor product of F and G is simply the numerical value (the real-number) that is the ordinary, numeric product of the numerical values of F and G. Trivial enough, eh? The result is a tensor T(x,y,z,u,v) whose numerical value at T(a,b,c,d,e) is just F(a,b,c)G(d,e)  A key property of the tensor product (the whole thing that makes all this non-trivial) is that it "erases" where some "information" "came from from".  That is, there might have been some other functions H(a,b) and K(c,d,e) such that T(a,b,c,d,e) = H(a,b)K(c,d,e)  ... we no longer know "where the indexes come from" (did they come from H and K, or from F and G?) and we don't know where the values came from, either... if T(a,b,c,d,e)=42, we don't know if F(a,b,c)=6 and G(d,e)=7 or if F(a,b,c)=2 and G(d,e)=21 .. you just can't tell any more.

Let's try that again, in lisp notation. Earlier, I mentioned that simply-typed lambda calculus is the internal language of Cartesian products. Perhaps you already know this and it's obvious (it's textbook-101) , but let me review anyway.   The Cartesian product of spaces X,Y,Z is X x Y x Z and if I sample a,b,c from them I have the ordered tuple (a,b,c) or, in lisp notation, the list ('a 'b 'c)  .. let me drop the quotes, and just write (a b c)

The magic of cartesian products is you can nest them: Thus, the Cartesian category consists of all possible strings of letters and spaces and balanced parens: e.g. (a (b c) a (d e) (p (q (r (s))))) where all of these single letters were constants. Now, mathematicians love variables, so we allow variables x,y,z... Mathematicians also love to plug values into variables and by convention, the act of plugging in is called "beta reduction" and the notational device of what-to-plug-where is called "lambda"   and so one may write λxyz. (a (b x) a (p q y)) (d e f) by which we mean "plug in def for xyz" (or "connect" per my earlier email).  All that we've done here is to work with the cartesian product, constants, variables, a notational trick for connecting-up, called lambda, and beta-reduction and bingo .. simply typed lambda calculus.  Textbook chapter-1 stuff. You've already known this since forever. ("The internal language of cartesian-closed categories is simply typed lambda calculus" to put it formally.)

So what's the deal with the tensor product? Associated with the list (a b c) .. that is, the cartesian product of a and b and c, is a number, say "6" and associated with the list (d e) is another number, say "7".  The tensor product is a list concatenation (a b c d e) and the associated value 42. The key point is the "erasure" of parenthesis: We are emphatically NOT writing ((a b c) (d e))  -- list concatenation "erases" the knowledge of the two original lists.   In this sense, the tensor product is "the same thing as" list concatenation. Well, in this example, we also multiplied 6 and 7 to get 42, which is how numerically-valued tensors work (how they must work).  For the generic case, for foo-valued tensors, one only needs to provide a product to multiply two foo's together. (and this product needs to "forget" the initial factors).

Besides products, it is convenient to add (sum together) tensors. For foo-valued tensors, one might not have any valid kind of addition of foo's. One can still go meta and say "this foo or that foo" where by "or" I mean "disjunctive or" -- pick one; menu-choice.  Some people like to use the v symbol for disjunctive-or, and the & symbol for list concatenation, and thus one naturally is led to ask what happens when considering the algebra of these two symbols. They are obviously distributive in one sense, but not the other, so it is NOT a boolean algebra  (confusion about this - confusion with boolean-and and boolean-or has led many a beginner astray).  Throw in variables and lambda and beta reduction and bingo, you've got ... umm, I'm unclear on the formal name. Let's call it "tensor logic". (the internal logic of the closed monoidal category)

Inner product was mentioned in earlier emails. For two numerically-valued vectors A=[a_1, a_2, ... a_n] and B=[b_1, b_2, ..., b_n] it was defined as inner(A,B) = A.B = a_1 . b_1 +  a_2 . b_2 + ... + a_n . b_n and so for foo-valued vectors, it is likewise: a_1 & b_1 v  a_2 &  b_2 v ... v a_n & b_n  where a_1 & b_1 is the product of foo a_1 and foo b_1, etc.  The disjoin is the meta-disjoined, disjunctive-or.

I previously talked about beta-reduction implying that it's the same thing as "connecting" -- that plugging in 42 for x in f(x) gives f(42) by "connecting" 42 to x. But I also said that "contracting together tensor indexes" is "connecting" -- that is, the inner product is "connecting".   So, which one is it? Both.

Type theory to the rescue. When I say "plugging in 42 for x in f(x)" I really mean that "x must be of integer type" and "42 is an exemplar of the integer type" and that "plugging in" is allowed only when the types match. Now, an integer can be 1 v 2 v 3 v ... (disjunctive-or) and thus, the result of "plugging in" is f(1) v f(2) v f(3) v ... well, that's just the inner product. To belabor the point, and get awkward with the notation, I should have written f_1&1  v  f_2&2  v  f_3&3  v .... where of course f_1 & 1 = f(1) is how one takes the "product" of foo f_1 with the foo 1 to get foobar f(1).

How far away is this from a "tensor srfi"? Some ingredient seems missing ...

The difference between simply-typed lambda-calculus and lambda calculus with types is .. well, I guess it's dramatic. The tensor algebra, as canonically defined in mathematics, is simply-typed. That is, all of the tensor indexes are of the same type: for tensor T(x,y,z) the x,y,z are all the same type. This is usually not acceptable for programming, where you might want to say "x is an int and y is a string and z is a float" (somehow, lisp/scheme programmers are happy without explicit types...). Even in physics, tensors get typed: the quark wave-function (a complex-number-valued function) has three tensor indexes: a dirac-spinor index, a color index, a flavor index; thou shalt not mix these up; typing is respected.

So this is where I have trouble with srfi-179: the indexes were always integers (ordinals). So, sure, for a quark wave-function, the dirac-spinor index ranges from 1,..4 and the color index ranges 1..3 and the flavor index ranges from 1..6 but these are really different types. It is kind of "an accident" that ordinal numbers  can be used. Physicists don't, they say "u,d,s,c,b,t" for flavor, and "r,g,b" for color. Historically, the dirac-spinor-index is 1..4 but sometimes also  (1..2) \oplus (1..2)

So, I'm thinking more like a mash-up of srfi-111 (boxes) with Amirouche's srfi-168 (tuple-store database) ... so, to represent a quark, I'm thinking I want to define a box to hold a complex number, and I want to index the box with a triple of (spin, color, flavor)  .. but I still need "connectors" -- the QCD axial-vector current is psi-bar gamma_5 gamma_\mu psi and thus I have to contract/connect the color and flavor and spin indexes correctly.  This is the inner product -- I need the inner product. The srfi-168 doesn't tell me how to write inner products of tuples. The srfi-179 almost seems to but not quite, from what I can tell.

OK, in earlier emails, I alluded to natural language, suggesting that lexical grammars are really just tensors. This is worth articulating, since I feel a proper tensor srfi needs to be able to handle this case.

I am going to use link-grammar https://www.abisource.com/projects/link-grammar/ as my prototypical example. It defines a lexical grammar for English, Russian and several other languages (Arabic, Persian..) The lexical entries are all tensors. Let me give an example. You might open the 4.0.dict file and find these entries (highly simplified):

John Mary: S+ or O-;
told: S- & O+;

The S+ and O- are "connectors" and the grammar rules is that types must match, polarities must be opposite (contra/covariant, upper/lower, if you wish to go that way... I don't, but it's possible).  The "or" is the disjunctive-or, from above.  The & is the tensor-product, from above.  The only valid sentences of the language are "John told Mary" or "Mary told John" or "John told John" or "Mary told Mary" .  S and O are linguistic short-hand for "subject" and "object".

Mary and John are both vectors, or 1-tensors. S corresponds to the vector [1,0] and O corresponds to the vector [0,1]  ... to be more precise, John is "either the vector [0,1] or the vector [1,0] but not both" -- it is disjoined; the menu-choices are called "disjuncts".

"told" is a verb, represented as matrix, or 2-tensor. It has the shape  [[0,1],[1,0]]  or, in 2-D

0 1
1 0

The sentence fragment "John told" is the vector [0,1] which we recognize as being of type O  - the object - this sentence is missing the object. If we contract with another object [0,1] we get a scalar 1 and thus a valid sentence.  If we contract with another subject S [1,0] we get a scalar zero - an invalid sentence.  This example is perhaps confusing, as John and Mary can be either subjects or objects, I guess I should have used "rock" and "tree" instead, and "saw", and a grammar that allows "John saw the tree" but not other combinations.  This email is long, and writing out the combinations is tedious, so I leave this as an "exercise for the reader"?

Now, these vectors and tensors can be very sparse -- besides S and O, there is A for adjective and D for determiner and P for preposition, and many more. Writing these as long vectors of 0's and 1's is tedious, thus the letters.  Also, these are usually very sparse - mostly all zeros.  There is a bit of trickery where I can replace the disjunctive-or by plain-old vector addition, and the number 1 by the probability so that these look like probability vectors and markov matrices, but I wander off-topic, so let's not do that (yet).

Type-theoretically, S and O are "primitive types", and what is meant by "being able to connect" is that the types must match, which means the same thing as saying that the inner product must be non-zero. Connecting together two link-grammar connectors is the same thing as taking the inner product.

The tensor product S-&O+ is a compound type constructed with the compound-type constructor; formally, this is called "the product type constructor".  It is analogous to a java/c++ "class" -- thus, formally, "classes" in java/c++ are really just tensors in disguise; they are the tensor product of their members.  A class with 2 members is a 2-tensor; a class with 3 members is a 3-tensor, etc. Recall the tensor product was just just list-concatenation. Scheme has run-time list concatenation, java and C++ do not.  That is, I cannot declare a new c++/java class at run-time. (However, javascript... hmmm... well I wander astray again... there is a reason why javascript is popular, and this is part of it.)

So, in my imagined srfi, I take Bradley's srfi-179 "intervals", and replace a single interval by a set of possible values (a disjoined set -pick any value but only pick one) but since I now work with sets, most of the mechanics of intervals is useless; I don't need ordinals, I need the things that the ordinals are labelling.  Thus I really need tuples (and it should now be clear that tuples are tensors).

I need the tensor product, but that is just tuple concatenation. (which I don't see in srfi-168 but maybe it is hiding in there).

I need the inner product, in several forms: as a type-matching constrainer, which is much faster than computing the inner product of [1,0,0,0...] and [0,0,1,0,...] only to discover that its zero. But also a traditional inner product, so that my types can be weighted by probabilities .. because that is actually very useful...cough neural net cough...

I need the tuple database, so that I can store a lexis -- a large collection of tensors (i.e. elements of tensors).  The tuple database is highly advantageous for sparse tensors, where almost all entries are zero.

I'm hoping this is not too much to ask for: tuple concatenation, a tuple store, an inner product. That's it.  How hard can that be?

This email is too long already, but it does need a foot-note: my personal, primary usage for this beast is not to store and compute quark axial-vector currents, (or random matrixes) but to generate "sentences" i.e. all the non-zero scalars contracted from tensors aka the "grammatically valid sentences", or, alternately, given a tuple-store (a lexis), data-mine it to discover how the indexes connected up (aka "parse a sentence").  So the programming API for this imagined srfi would need to be amenable to such usage.

Three things: tuple concatenation, a tuple store, an inner product. How hard can that be?

-- linas