Fwd: [New post] Everything is a Network


Linas Vepstas

Jun 10, 2021, 8:33:13 PM
to opencog, link-grammar
I just wrote up a new blog post on ... well, the usual topic. I'm cc'ing the Link Grammar mailing list, as it has been instrumental in waking me up to these ideas.

-- Linas

---------- Forwarded message ---------
From: OpenCog Brainwave <donot...@wordpress.com>
Date: Thu, Jun 10, 2021 at 6:55 PM
Subject: [New post] Everything is a Network
To: <linasv...@gmail.com>



New post on OpenCog Brainwave

Everything is a Network

by Linas Vepstas

The goal of AGI is to create a thinking machine, a thinking organism, an algorithmic means of knowledge representation, knowledge discovery and self-expression. There are two conventional approaches to this endeavor. One is the ad hoc assembly of assorted technology pieces-parts, with the implicit belief that, after some clever software engineering, it will just come alive. The other approach is to propose some grand over-arching theory-of-everything that, once implemented in software, will just come alive and become the Singularity.

This blog post is a sketch of the second case. As you read what follows, your eyes might glaze over, and you might think to yourself, "oh this is silly, why am I wasting my time reading this?" The reason for this is that, to say what I need to say, I must necessarily talk in such generalities, and provide such silly, childish examples, that it all seems a bit vapid. The problem is that a theory of everything must necessarily talk about everything, which is hard to do without saying things that seem obvious. Do not be fooled. What follows is backed up by some deep and very abstract mathematics that few have access to. I'll try to summon a basic bibliography at the end, but, for most readers who have not been studying the mathematics of knowledge for the last few decades, the learning curve will be impossibly steep. This is an expedition to the Everest of intellectual pursuits. You can come at this from any (intellectual) race, creed or color; but the formalities may likely exhaust you. That's OK. If you have 5 or 10 or 20 years, you can train and work out and lift weights. You can get there. And so... on with the show.

The core premise is that "everything is a network" -- By "network", I mean a graph, possibly with directed edges, usually with typed edges, usually with weights, numbers, and other data on each vertex or edge. By "everything" I mean "everything". Knowledge, language, vision, understanding, facts, deduction, reasoning, algorithms, ideas, beliefs ... biological molecules... everything.
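To be concrete about what such a network looks like in practice, here is a minimal sketch (plain Python, with invented names; it is not any particular OpenCog API, just an illustration):

    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class Edge:
        """A directed, typed, weighted edge between two named vertices."""
        src: str
        dst: str
        etype: str           # the edge type, e.g. "connected-to", "is-a"
        weight: float = 1.0

    @dataclass
    class Network:
        """A bag of edges; the vertices are implicit in the edges that mention them."""
        edges: list = field(default_factory=list)

        def add(self, src, dst, etype, weight=1.0):
            self.edges.append(Edge(src, dst, etype, weight))

        def neighbors(self, vertex, etype=None):
            return [e for e in self.edges
                    if e.src == vertex and (etype is None or e.etype == etype)]

    # "The thigh bone is connected to the hip bone."
    g = Network()
    g.add("thigh bone", "hip bone", "connected-to", weight=0.98)
    g.add("hip bone", "backbone", "connected-to", weight=0.95)
    print(g.neighbors("thigh bone"))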

A key real-life "fact" about the "graph of everything" is that it consists almost entirely of repeating sub-patterns. For example, "the thigh bone is connected to the hip bone" -- this is true generically for vertebrates, no matter which animal it might be, whether it's alive or dead, imaginary or real. The patterns may be trite, or they may be complex. For images/vision, an example might be "select all photos containing a car" -- superficially, this requires knowing what makes cars look alike, and what part of the pattern is important (wheels, windshields) and what is not (color, parked in a lot or flying through space).

The key learning task is to find such recurring patterns, both in fresh sensory input (what "the computer" is seeing/hearing/reading right now) and in stored knowledge (when processing a dataset - previously-learned, remembered knowledge - for example, a dataset of medical symptoms). The task is not just "pattern recognition" -- identifying a photo of a car -- but pattern discovery: learning that there are things in the universe called "cars", and that they have wheels and windows -- extensive and intensive properties.
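As a toy caricature only (this is not the actual learning pipeline, just an illustration of "recurring more often than chance"), pattern discovery can be imagined as scoring co-occurring pairs by how surprising their co-occurrence is:

    from collections import Counter
    from math import log2

    def pair_scores(observations):
        """Score recurring pairs by pointwise mutual information.

        `observations` is a list of (part, whole) co-occurrences -- word pairs
        seen in text, or part-whole pairs seen in images.  Pairs that recur far
        more often than chance get a high score; those are the candidate
        repeating sub-patterns.
        """
        pair_counts = Counter(observations)
        left_counts = Counter(l for l, _ in observations)
        right_counts = Counter(r for _, r in observations)
        n = len(observations)
        return {
            (l, r): log2((c / n) / ((left_counts[l] / n) * (right_counts[r] / n)))
            for (l, r), c in pair_counts.items()
        }

    obs = [("wheel", "car"), ("wheel", "car"), ("windshield", "car"),
           ("wheel", "bicycle"), ("leaf", "tree"), ("leaf", "tree")]
    print(sorted(pair_scores(obs).items(), key=lambda kv: -kv[1]))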

Learning does not mean "training" -- of course, one can train, but AGI cannot depend on some pre-existing dataset, gathered by humans, annotated by humans. Learning really means that, starting from nothing at all, except one's memories, one's sensory inputs, and one's wits and cleverness, one discovers something new, and remembers it.

OK, fine, the above is obvious to all. The novelty begins here: The best way to represent a graph with recurring elements in it is with "jigsaw puzzle pieces" (and NOT with vertices and edges!). The pieces represent the recurring elements, and the "connectors" on each piece indicate how the pieces are allowed to join together. For example, the legbone has a jigsaw-puzzle-piece connector on it that says it can only attach to a hipbone. This is true not only metaphorically, but (oddly enough) literally! So when I say "everything is a network" and "the network is a composition of jigsaw puzzle pieces", the deduction is "everything can be described with these (abstract) jigsaw pieces."
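Here is a minimal sketch of the jigsaw metaphor (plain Python, with made-up connector names; this is an illustration, not the actual dictionary format of any parser):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Connector:
        """One tab on a jigsaw piece: a type plus a direction ('+' or '-')."""
        ctype: str
        direction: str

    @dataclass(frozen=True)
    class Piece:
        """A recurring element of the graph, with the connectors it exposes."""
        name: str
        connectors: tuple

    def can_link(a: Connector, b: Connector) -> bool:
        """Two connectors mate iff they have the same type and opposite directions."""
        return a.ctype == b.ctype and {a.direction, b.direction} == {"+", "-"}

    legbone = Piece("legbone", (Connector("HIP", "+"),))
    hipbone = Piece("hipbone", (Connector("HIP", "-"), Connector("SPINE", "+")))

    print(can_link(legbone.connectors[0], hipbone.connectors[0]))   # True
    print(can_link(legbone.connectors[0], hipbone.connectors[1]))   # False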

That this is the case in linguistics has been repeatedly rediscovered by more than a few linguists. It is explained perhaps the most clearly and directly in the original Link Grammar papers, although I can point at some other writings as well; one from a "classical" (non-mathematical) humanities-department linguist; another from a hard-core mathematician - a category theorist - who rediscovered this from thin air. Once you know what to look for, it's freakin' everywhere. Say, in biology, the Krebs cycle (citric acid cycle) - some sugar molecules come in, some ATP goes out, and these chemicals relate to each other not only abstractly as jigsaw-pieces, but also literally, in that they must have the right shapes! The carbon atom itself is of this very form: it can connect, by bonds, in very specific ways. Those bonds, or rather, the possibility of those bonds, can be imagined as the connecting tabs on jigsaw-puzzle pieces. This is not just a metaphor; it can also be stated in a very precise mathematical sense. (My lament: the mathematical abstraction to make this precise puts it out of reach of most.)

The key learning task is now transformed into one of discerning the shapes of these pieces, given a mixture of "what is known already" plus "sensory data". The scientific endeavor is then: "How to do this?" and "How to do this quickly, efficiently, effectively?" and "How does this relate to other theories, e.g. neural networks?" I believe the answer to the last question is "yes, it's related", and I can kind-of explain how. The answer to the first question is "I have a provisional way of doing this, and it seems to work". The middle question - efficiency? Ooooof. This part is ... unknown.

There is an adjoint task to learning, and that is expressing and communicating. Given some knowledge, represented in terms of such jigsaw pieces, how can it be converted from its abstract form (sitting in RAM, on the computer disk), into communications: a sequence of words, sentences, or a drawing, painting?

That's it. That's the meta-background. At this point, I imagine that you, dear reader, probably feel no wiser than you did before you started reading. So what can I say to impart actual wisdom? Well, let's try an argument from authority: a jigsaw-puzzle piece is an object in an (asymmetric) monoidal category. The internal language of that category is ... a language ... a formal language having a syntax. Did that make an impression? Obviously, languages (the set of all syntactically valid expressions) and model-theoretic theories are dual to one another (this is obvious only if you know model theory). The learning task is to discover the structure, the collection of types, given the language. There is a wide abundance of machine-learning software that can do this in narrow, specific domains. There is no machine-learning software that can do this in the fully generic, fully abstract setting of ... jigsaw puzzle pieces.

Don't laugh. Reread this blog post from the beginning, and everywhere that you see "jigsaw piece", think "syntactic, lexical element of a monoidal category", and everywhere you see "network of everything", think "model theoretic language".  Chew on this for a while, and now think: "Is this doable? Can this be encoded as software? Is it worthwhile? Might this actually work?". I hope that you will see the answer to all of these questions is yes.

And now for the promised bibliography. The topic is both deep and broad. There's a lot to comprehend, a lot to master, a lot to do. And, ah, I'm exhausted from writing this; you might be exhausted from reading. A provisional bibliography can be obtained from two papers I wrote on this topic:

The first paper is rather informal. The second invokes a bunch of math. Both have bibliographies. There are additional PDFs in each of the directories that fill in more details.

This is the level I am currently trying to work at. I invite all interested parties to come have a science party, and play around and see how far this stuff can be made to go.

Linas Vepstas | June 10, 2021 at 11:55 pm | Categories: Uncategorized | URL: https://wp.me/p9hhnI-cl


Trouble clicking? Copy and paste this URL into your browser:
https://blog.opencog.org/2021/06/10/everything-is-a-network/




--
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.
 

Tofara Moyo

Jun 11, 2021, 3:24:10 AM
to ope...@googlegroups.com, link-grammar
This is interesting. I came across similar ideas too, and I posted them on the AGI Facebook page last year in June. Here they are for comparison.


This post is about the way things repeat and change in the world. In short, something repeats for a while, such as you passing houses while you are walking down a road. So you pass house after house... then you get to an intersection and there are no more houses, but after that you find that the thing that repeats is "passing houses AND intersections"... so you group the houses with the intersection, and you pass this new grouping many times before you get to a mall; then you group all three things together, and you keep walking past this new grouping until you get outside the city; then you group the city and the countryside, and you start passing many cities and countrysides as you go; then this becomes countries and continents and planets and solar systems and galaxies... In short, this process describes reality, from the way a piece of wood bark is rough to the way we even think.

In mathematics there is a topic called fractals, which describes shapes that look the same at different scales; a fractal shape looks the same even when you zoom in. So as you walk past houses, think of that as zooming in to different scales and finding the same object you started with; house after house represents scale after scale. This is more complicated, however, because after the first set of scales we change focus, and then zoom in on this new grouping/focus as if that was the fractal...

There are other types of fractal-like shapes, or at least objects that follow the principle, that are more applicable to this. They are called tilings. These are tiles, or identical shapes, that are placed side by side and fill a space with no gaps in between them. So the steps you take while walking would each be a tile, while when you stop that becomes a tile of a different shape from the stepping tiles that you join to them... then this new grouping of tiles becomes the shape that you are tiling; when this changes, you tile the combination of the change with the original tiles. This is a multi-shape tiling that is binary in nature. Even the stepping tiling can be broken into two different tiles, one for each leg... and so on.

A mereology is something that is made of parts. A chair is made of parts that are made of parts, all the way down to atoms and even further. The parts of a chair are separated by space and time. The parts of the tiling above may be separated by space and time, such as walking or the texture of a surface, but they can also be separated by something even stranger. Think of the tiling where you are left-handed while everyone else is right-handed. What separates the lefties as a group from the righties? It can't be ordinary space or time, as they are not literally separated by a demarcation placed somewhere. If we could specify a type of space that these two tiles are filling, wouldn't that simply be a conceptual space? And if we were to tile a space with concepts, would that not be thinking? So we already have a way to use this in AI.




Tofara Moyo

Jun 11, 2021, 8:18:40 AM
to ope...@googlegroups.com, link-grammar
I have a blog post that links this to the axiom of identity. A=A&!A!=A.

Linas Vepstas

Jun 11, 2021, 4:01:31 PM
to opencog, link-grammar
Similar ideas have been circulating for decades or longer. Yes, the concepts of fractals and tilings are similar. My goal here is to point out that these ideas can be implemented in software. I'm trying to drum up the practical conversation, the one of "how can we do this?"

Linas Vepstas

Jun 11, 2021, 4:18:44 PM
to opencog, link-grammar
Moyo,

My reply was perhaps too short. I can give very precise and exact descriptions of how all this relates to fractals and tilings ... If you prod me, I can supply details. A very important, key bridge to this understanding is the work of Przemyslaw Prusinkiewicz at Algorithmic Botany -- http://algorithmicbotany.org/papers/ -- and the easiest way to understand that is to go in historical order, reading the earliest papers first. I believe the work there is nothing short of mind-blowing, and stunningly important, yet is curiously very under-appreciated, for some unclear reason.
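For a quick taste before diving into the papers: the central construct there is the L-system, a parallel rewriting grammar. A minimal sketch (this is Lindenmayer's original "algae" system, nothing specific to any one of those papers):

    def lsystem(axiom, rules, generations):
        """Rewrite every symbol in parallel, repeatedly."""
        s = axiom
        for _ in range(generations):
            s = "".join(rules.get(ch, ch) for ch in s)
        return s

    # Lindenmayer's algae system: A -> AB, B -> A
    rules = {"A": "AB", "B": "A"}
    for n in range(6):
        print(n, lsystem("A", rules, n))
    # The string lengths follow the Fibonacci numbers: 1, 2, 3, 5, 8, 13, ...

The same mechanism, with a few more symbols interpreted as turtle-graphics commands, draws remarkably plant-like structures; that is the starting point of the algorithmic botany work.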

--linas

Tofara Moyo

Jun 11, 2021, 4:36:06 PM
to ope...@googlegroups.com, link-grammar
Yes, you're right, implementation is key. I am not all that well versed in mathematics, but I believe your approach may yield better results than a fractal-based approach, though fractals are a mathematical topic in their own right. I had the idea that if we placed a normal fractal in a fractal "space", then all the repeating information coincides and we are left with just the one shape; as we morph the space back to 2d, all the redundant information resurfaces. This would lead us to believe that in order to train an agent to recognise the information found in reality, we need to embed an axiomatic shape into a 2d space, and then alter the dimensionality of that space until it introduces redundant information and produces the fractal that reality is composed of. I mentioned that the fractal is based on the axiom of identity. What this means is that a series of similar tiles persists until it is "time" for an orthogonal tile to be present. Then that new grouping of the orthogonal tile and the originals persists until an orthogonal tiling to those is met, and so on. Consider the act of eating, for example. Each time you chew you are adding a chewing tile; then when you swallow, that in some sense becomes an orthogonal tile. Then this tile repeats until it finds an orthogonal one to it. In fact, it can be said that only things that are orthogonal in reality can stick out of nothingness in order to participate in causality.

Also, we need to understand that concepts are a plagiarism of reality. When you say "Biden bridges the gap between the old and the young", the "bridges" fits naturally in that sentence with all the other concepts because natural bridges have exactly the same shape of tile. In fact, if you examine that sentence, a lot of it makes reference to things in objective reality, such as gap and between. Even old and young reference space. All thinking is fitting together tiles from reality. Higher thinking involves more sophisticated shape-filling, by better search methods for those shapes. If we were to create a fractal space where a primitive shape in 2d space describes reality's fractal in that fractal space, all we would need to do to act and think would be to move in orthogonal steps from a starting point. All physical movement and conceptualisation would be taken care of at once.

Note also that in a conversation, the most informative reply is one that is in some sense orthogonal to the utterance. This way it is more decoupled from the utterance. In a game like Go, performing moves in orthogonal steps along the graph of the game describes optimal play. This takes me to music theory. A scale of 12 notes defines the most (pleasing) orthogonal moves as moving in 5ths, aka the circle of 5ths. This is related to the fact that a scale would have the most pleasing harmonies and progressions be defined as those that involve ratios with the lowest denominator possible. I believe that the music scale is the most degenerate application of this. If we had a scale with 30 notes, orthogonality might have been defined by a circle of 10ths or some other number, so 5 is not particularly special.

If we have a graph, things get even more interesting. Rather than having a single number define orthogonality, we now have a vector. Knowing that vector is the best way to move along the graph. For instance, with Go: if we were to reduce the game to a graph and find its vector of orthogonality, we would have greater search capabilities in order to perform moves. Simply go orthogonal. Or, if the last n moves have been coupled, then the next n moves should be orthogonal to those n moves but coupled among themselves, for example.

So if we could start off with a primitive shape in 2d and morph it as we change the space it is in to non-integer dimensions, we discover the applicable vector that defines orthogonality and move in orthogonal stepwise motion along it.

Tofara Moyo

Jun 11, 2021, 5:01:55 PM
to ope...@googlegroups.com, link-grammar
Think of a song; that will illustrate this method of orthogonality better. In music, if an instrument plays with fast changes for a bit, the next thing it will do is go orthogonal and perform slow moves. This happens with all instruments. But at the same time, if the bass is moving fast, the higher instruments must be in their slower phase for it to sound pleasing. If the verse has low energy, the chorus must be orthogonal and have higher energy. If the verse and chorus are too coupled, then a bridge will add the needed orthogonality. In drums, if the kick is slow, the hats must be faster; every now and then there must be a drum roll to go orthogonal. If a song has a lot of low-frequency sounds, there must be orthogonality produced by having some higher-frequency sounds as well. A well-made song is one that uses orthogonality to its fullest potential. This is true of science and philosophy: scientific movements are defined by being orthogonal to each other, and a well-made proof makes statements in orthogonal moves. Intelligence can be optimised by choosing optimal configurations. If we started off with the axioms of set theory, we could recreate all of modern mathematics and more by moving in orthogonal steps that involve combining information from these axioms.

Now, music theory doesn't just have the chromatic scale. It has majors and minors, which each have their role. If we had a reality graph, we may choose to move in majors along it, or some other scale. Then, if we wanted to ramp this up, we would perform chromatic progressions where we borrow notes or chords from different scales. We could be even more detailed and choose to move in such a way that in one scale we have moved in 4ths, perhaps, but in another we really moved in 5ths, then reverse that. Treating reality as a graph lends itself to its own type of music theory, if we consider the circle of fifths and the chromatic scale as a type of graph.

So it's not enough just to move in orthogonal steps. There is a world of nuance and detail that can be added to create more and more pleasing progressions. Just as some music producers are better than others based on their use of orthogonality, an agent that treats reality's graph as a musical instrument would think in a much more advanced way if it implemented all the tricks available to it. So if you have a thought, the best next thought should be orthogonal; and if it isn't, then you are setting yourself up for a greater resolution later on, where you will finally go orthogonal to both thoughts at once in a large step. In order to go orthogonal to this there are plenty of options; the more sophisticated ones could be said to use all the bells and whistles of graph "music" theory.

Ramin Barati

Jun 12, 2021, 9:23:43 AM
to opencog, link-grammar
Thank you for sharing your work. I confess that I am not that versed in mathematics, but I'm working on something for my PhD in the same line as to what you propose here and was wondering if my initial understanding of your proposal is correct. IMO, my proposal would go under the "How to do this quickly, efficiently, effectively?" category. My proposal is that I want to approach the problem in an axiomatic way. I am under the impression that the term "axiomatic" is very controversial, so allow me to elaborate.

So what do I mean by axioms? I certainly am not talking about axioms in the sense of "facts". For example, "the thigh bone is connected to the hip bone" is not an axiom in my mind. Another non-example would be the "Resolution" law in logic. What I am looking for is the axioms by which we define what constitutes the reactions of an "intelligent agent". For example, "An intelligent agent should be stable in its decision making" is an axiom in my view. Actually, this is more than an example and I want to propose this as the first axiom of an "intelligent agent". To put it in more precise terms, "there should not be adversarial examples for an intelligent agent". Think of a mosquito, it is a very simple intelligent agent that follows the sources of light (or heat, I'm not sure). But it is not possible to produce an adversarial pattern of light (e.g. rapidly blinking light) to convince the mosquito to NOT follow the light source.

At first glance, this axiom is not that helpful. But if we get mathematical, it turns out that this axiom has profound consequences. I am planning on uploading a paper on this topic to arXiv in the coming days. To be able to continue the discussion though, I will put some of the math here and am looking forward to getting your opinion on it.

To put the axiom in mathematical terms, I propose a concept that I call "local robustness". An intelligent agent is locally robust if the magnitude of the change in its decision is independent of the direction of an infinitesimal change in the input. To get a feel of the rationale behind the definition, consider an image and its label. If I introduce some small perturbation to the image, the change in the confidence of the label an intelligent agent assigns to the perturbed image should be independent of the "content" of the perturbation. In math form:
[attached image: local_rob.PNG -- the local robustness condition]
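In case the attachment does not come through: reconstructing the condition in symbols from the verbal definition above (the exact notation in the image may differ), for a decision function f and any two unit directions u and v,

    \lim_{\epsilon \to 0^+} \frac{\lVert f(x + \epsilon u) - f(x) \rVert}{\epsilon}
    \;=\;
    \lim_{\epsilon \to 0^+} \frac{\lVert f(x + \epsilon v) - f(x) \rVert}{\epsilon}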
The definition could be readily generalized to complex vectors. From complex analysis we know that a function would satisfy the local robustness condition if and only if it satisfies the Cauchy-Riemann equations. I should mention that the CR equations are defined for a scalar input, but it is straightforward to generalize this to complex vectors. The functions that satisfy the homogeneous CR equations are very special and they are called "holomorphic" functions. These functions have other desirable properties as well. A complex-valued function is analytic and complex differentiable if and only if it is holomorphic (it's more nuanced than that, but I will omit the discussion for brevity). It is proved that only complex-valued functions of a complex vector could be holomorphic. In other words, it is impossible to construct a holomorphic function of a real vector. So the first consequence of the axiom would be "an intelligent agent uses a complex-valued function of a complex vector for decision making". I can go on about the other consequences of the axiom, but bringing them up would be futile if you find the approach not satisfactory or relevant.
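For reference, writing f(x + iy) = u(x, y) + i v(x, y), the Cauchy-Riemann equations are

    \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y},
    \qquad
    \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}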



Ben Goertzel

Jun 12, 2021, 1:26:37 PM
to opencog, link-grammar

Ramin, setting aside the math which is simple enough, I don't understand the conceptual intuition behind

". An intelligent agent is locally robust if the magnitude of the change in its decision is independent of the direction of an infinitesimal change in the input. "

It seems naively that the magnitude of change in an intelligent system's decision SHOULD depend on the direction of change of its input (even if the change of input is small) ... I can't see why not...

I have thought a bunch recently about the relationship btw cognitive logic and complex-variable algebra as well, see e.g.


ben





--
Ben Goertzel, PhD
http://goertzel.org

"He not busy being born is busy dying" -- Bob Dylan

Linas Vepstas

Jun 12, 2021, 9:01:52 PM
to opencog, link-grammar
Hi Ramin!

It's been a long time since I've heard from you!

On Sat, Jun 12, 2021 at 8:23 AM Ramin Barati <rek...@gmail.com> wrote:
Thank you for sharing your work. I confess that I am not that versed in mathematics, but I'm working on something for my PhD in the same line as to what you propose here and was wondering if my initial understanding of your proposal is correct. IMO, my proposal would go under the "How to do this quickly, efficiently, effectively?" category. My proposal is that I want to approach the problem in an axiomatic way. I am under the impression that the term "axiomatic" is very controversial, so allow me to elaborate.

Let me clarify this controversy -- it's mostly due to a misuse of terminology.

First: The word "axiom" has a very specific technical meaning in the mathematical disciplines of proof theory, model theory and universal algebra. (These are all topics you should study. They are important for AGI.) That definition is NOT controversial in the slightest; it's perfectly ordinary textbook stuff. One neat trick you can do:  you can "invent your own axioms" as you please (roughly speaking; there are some "common sense" demands, like that they should be consistent.)  To some large degree, you can think of axioms as being the "grammar" of a "language" -- you can invent a grammar, and languages follow. (You already know and understand this: all programmers know this: the "Backus Naur Form" (BNF) in IETF specs is just a grammar. That's all.)

The only controversy is extremely narrow and obscure: In the early days of AGI, Pei Wang invented something he calls NARS - Non-Axiomatic Reasoning System (a better name might have been "non-monotonic reasoning system", but whatever). Our good buddy Ben Goertzel pointed out that NARS violates the axioms of probability theory (aka the axioms of "measure theory", aka Kolmogorov's axioms). Well, I guess Pei Wang already knew this, which is why he called it "non-axiomatic", although what he really did is to introduce a different set of axioms, ones that conflict with probability theory (and so conflict with Bayesian logic, Hidden Markov Models, etc.) Ben, bless his soul, took the conservative track here, and said "uhh no, probability works pretty well, let's not throw it out the window". (And he's right about that.) Ben (with the help of others) then embroidered standard probability, converting it into PLN -- "Probabilistic Logic Networks". The controversy that erupted is a conventional one: what is better for AGI, PLN or NARS?

After the passage of 15-20 years, I think we can conclude and say PLN is more right than NARS, but also that PLN has its limitations, difficulties and problems, and does not provide a complete framework. It's OK, but it has languished in obscurity because there are other ways of doing things, and some of those other ways work very well -- e.g. neural nets.

To conclude, there's nothing controversial about "axiomatic", except for the now-historical battle of PLN vs. NARS.

I don't know how to respond to the remainder of your email. Let me quickly say 3 or 4 things.

-- First let me clear up one thing: the blog post I wrote is introducing a certain theory. The "theory" I'm proposing is rather specific, and can be converted into working software. I am looking for anyone who can help  with that. Unfortunately, it is quite complex, which makes it very hard to  understand and work with. I have experienced extreme difficulty in trying to explain it to anyone.

Changing gears completely:

-- About holomorphic functions. There are 3-4 things you should know.  The general theory of holomorphic functions is called the theory of Riemann surfaces. It is absolutely wondrous and fantastic; it's a vast mathematical playground. There are many good books on it. You should know that Riemann surfaces are among the foundations of string theory. Riemann surfaces provide an excellent introduction to almost all areas of modern mathematics.

-- Complex variables are deeply connected to probability theory. In probability, there is something called "Fisher information". It's rather boring; it is conventionally abused by medical doctors when publishing results from medicine or psychology or pharmaceuticals. The Fisher information can be expressed as a "metric", in the narrow mathematical sense of "metric": a distance. It has a horribly complicated form. Yecch. Turns out, if you rewrite that metric in terms of the square root of the probability, it becomes the metric of flat Euclidean space. Egads! I find this to be stunning. Because flat Euclidean space is so... uhh... simple.
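For the curious, the computation is two lines. The Fisher metric in coordinates theta is

    g_{ij} = \sum_k \frac{1}{p_k} \frac{\partial p_k}{\partial \theta^i} \frac{\partial p_k}{\partial \theta^j}

and substituting \psi_k = \sqrt{p_k}, so that \partial p_k / \partial\theta^i = 2 \psi_k \, \partial \psi_k / \partial\theta^i, the factors of p_k cancel:

    g_{ij} = 4 \sum_k \frac{\partial \psi_k}{\partial \theta^i} \frac{\partial \psi_k}{\partial \theta^j}

which is just (four times) the flat Euclidean metric, pulled back along the \psi coordinates, restricted to the unit sphere \sum_k \psi_k^2 = 1.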

In a certain sense, it's like saying that everything is better if you work with the square root of the probability. Now where have I heard this before? Oh, right: Max Born, 1926. Turns out he made this correction *after* the paper was submitted to the printers!  Yow! These days, we call it "Born's rule", or more generally "quantum mechanics". Roughly speaking, QM is nothing more than the square root of probability, times a complex phase.  Complex numbers ride again.
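In symbols, that is the entire content of Born's rule: write \psi = \sqrt{p}\, e^{i\theta}, so that p = |\psi|^2. The amplitude is the square root of the probability, dressed with a complex phase.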

-- You mentioned "vectors". If you do not know what a tensor is, you should learn. A 2-tensor is just a matrix (a square of numbers)  A 3-tensor is a cube of numbers, etc.  Tensors can be combined in a certain way, this is called a "tensor algebra".  The axioms of tensor algebras form a category, the Tensor category. All categories have an "internal language", called "linear logic"  (Up above, I said that axioms are like grammar rules. Grammar rules generate languages, the grammar of the tensor category is linear logic.)

-- Linear logic is like cartesian logic, but oddly different: linear logic includes constructs that describe semaphores and mutexes. You heard that right: the stuff of computer science. By the way: what is the language of cartesian logic? Well, it's ... lambda calculus! Like ... LISP and Scheme. BTW, conventional probability is effectively a cartesian category, while its square root is linear logic. Ain't that something? This may sound like crazy-talk, but it's actually well-known: it is a generalization of the Curry-Howard correspondence.

-- Now, I said the word "tensor" -- tensors are, in a certain sense, symmetric: the tensor category is a symmetric category. What happens if you make it asymmetric? You get, lo and behold, natural language. Like, human language. That thing that you and I use to talk to each other.   These non-symmetric tensors are *exactly and completely the same thing* as Link Grammar "disjuncts".

-- People who first study Link Grammar stumble over the & symbol used to construct disjuncts from connectors, and the word "or" used to combine disjuncts. They often think these are boolean and/or. No! They are not! They are the and/or of linear logic. Whereas boolean logic only applies to cartesian categories! (That is why intersection/union of set theory look like intersection/union in probability theory - Bayes' theorem, etc. -- these are all cartesian. They use classical logic: true, false, and, or, not.) By contrast, natural language is a fragment of linear logic: that theory which, cough cough, looks like it works with the square root of probabilities. At least, that's what it looks like when we call it "quantum mechanics". Now, this does not quite mean that "natural language has complex numbers in it", but it sure comes darned close to being that.
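To see the difference in miniature: in the puzzle-piece picture, each connector is a resource that must be consumed exactly once -- that is the linear-logic &, not the boolean one. A grossly simplified toy (invented connector names, ignoring the ordering, nearest-match and planarity rules of the real parser):

    # Each word offers connectors that must ALL be used up exactly once,
    # by mating with an opposite connector on another word.
    lexicon = {
        "the":   [("D", "+")],               # offers a determiner link rightward
        "dog":   [("D", "-"), ("S", "+")],   # wants a determiner on its left, a verb on its right
        "barks": [("S", "-")],               # wants a subject on its left
    }

    def links_up(words):
        """True iff every connector in the sentence gets consumed exactly once."""
        pending = []
        for w in words:
            for ctype, d in lexicon[w]:
                if d == "-" and pending and pending[-1] == (ctype, "+"):
                    pending.pop()        # mate with the nearest pending '+'
                else:
                    pending.append((ctype, d))
        return not pending

    print(links_up(["the", "dog", "barks"]))   # True: the D and S links both form
    print(links_up(["dog", "barks"]))          # False: the D- connector is left dangling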

If you don't understand what I am saying above, that's OK. It's very complex and abstract. There's a lot to learn. I strongly recommend reading the book called "Category Theory for Programmers". I also strongly recommend reading Baader & Nipkow, "Term Rewriting and All That" - it's about basic computing. If this feels too advanced, you MUST read Hopcroft and Ullman, "Introduction to Automata Theory, Languages, and Computation". Read the first edition. Do NOT read the editions with Aho as co-author. (Those versions are... sucky.) And continue reading about holomorphic functions, but if you go that way, you MUST expand to include Riemann surfaces. Riemann surfaces explain what the complex square root, and the complex logarithm are, and complex functions in general. It's a very important topic.

-- Linas

Ramin Barati

Jun 13, 2021, 6:27:52 PM
to opencog, link-grammar
Ben and Linas, Hi

Right now I am trying to provide for myself a stable job and to set my foot on a firm ground financially. I think that I am living that part of a man's life in which one needs to bear the fruits of his first endeavors. The geopolitical situation in the middle-east and especially Iran is of no help though. Nevertheless, I am always interested in the discussions in the mailing list and try to follow them as much as possible.

It seems naively that the magnitude of change in an intelligent system's decision SHOULD depend on the direction of change of its input (even if the change of input is small) ... I can't see why not...

Let me try another rationale and see if I can convince you. I'll borrow the vocabulary of your paper, even though I don't think that I have formed a working intuition of the concepts in your paper right now. First, suppose that eta and zeta are parallel. I think it is natural to expect that the magnitude of the change should be independent of the direction in this case. In other words, the change in the magnitude (confidence) of the output should not differ if I amplify or diminish a pattern (evidence), because otherwise it would mean that one of the directions carries more information, which does not make sense IMO. So, we can deduce that if we consider all the directions, the change in the output (not the magnitude of the output) should have a symmetrical shape, and that shape should be an ellipse. The local robustness condition asserts that of all the possible ellipses, a circle is the most stable. In other words, a holomorphic function is maximally robust in a neighborhood of a point because either all of the directions are adversarial or none of them. That doesn't mean that a maximally robust decision function is also a good decision function from the perspective of accuracy. As a matter of fact, since there exists a maximum modulus principle for holomorphic functions, they cannot represent a "closed" decision boundary. I'm using a "closed" decision boundary to mean decision boundaries that do not pass through the point at infinity (like a circle). On the other hand, an "open" decision boundary would pass through the point at infinity (like a line). But I'm guessing that relaxing the condition from a holomorphic function to a meromorphic function would be sufficient to represent any decision boundary without sacrificing much of the stability these functions provide.

On another note, I also have been reading about quantum probability recently. While the subject is certainly out of my reach right now, I think that I have found something interesting that I would like to share with you and Linas and ask if you see any potential there. Before that I would like to give my thanks to Linas for introducing the reading materials on these subjects and to tell you that I would surely look them up. On the subject of Riemannian surfaces, I had a hunch that the subject is important but I lack the math to read the literature. I figured that I need to get a better understanding of vector fields and geometric algebra and I am reading a book called "Geometric Algebra for Computer Science". I would be glad if you could suggest an introductory book on the subject of Riemannian surfaces itself.

The idea is that the output of a classifier is a quantum probability distribution. So a classifier is something like a Dirichlet process but for quantum probability distributions. The output of a k-class classifier is a pure complex antisymmetric k-by-k matrix and using matrix exponential we can map that matrix to a matrix in SU(k). If we normalize the SU(k) matrix in a similar vein to the softmax function, then we would have a probability distribution on the Lie algebra of SU(k) and we can compute the mean, variance and covariance of the output using simple linear algebraic operations. The construction could be further motivated by the fact that the normalized SU(k) matrix could be interpreted as the parameters of a characteristic function similar to the characteristic function of a multinomial distribution.
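A tiny numeric sanity check of that mapping (here I take the matrix to be traceless and skew-Hermitian, which is the standard way of landing in SU(k) via the exponential map; the exact parameterization I described above may differ in detail):

    import numpy as np
    from scipy.linalg import expm

    k = 3
    rng = np.random.default_rng(0)

    # Build a traceless skew-Hermitian matrix A (A^H = -A, trace 0).
    X = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
    A = X - X.conj().T                  # skew-Hermitian
    A -= (np.trace(A) / k) * np.eye(k)  # remove the (purely imaginary) trace

    U = expm(A)                         # exp maps the Lie algebra su(k) into SU(k)
    print(np.allclose(U @ U.conj().T, np.eye(k)))   # unitary
    print(np.isclose(np.linalg.det(U), 1.0))        # determinant 1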

Jon P

Jun 14, 2021, 3:42:44 PM
to opencog
Hi Ramin :)

Is it ok to ask a question about this? No problem if you don't have time to answer.

"An intelligent agent is locally robust if the magnitude of the change in its decision is independent of the direction of an infinitesimal change in the input."

Say I am on a beach and there are a number of ice cream stands. I create an agent whose job is to take my position and output the direction to the nearest ice cream stand. If I am equidistant between two stands, won't I see large changes in the output based on small changes in the input? For instance, if I take a tiny step to the left I am pointed to continue left, and if I take a tiny step to the right I am pointed to continue to the right?

In more mathematical terms I think all Holomorphic functions are continuous? Whereas in this situation wouldn't an agent want a discontinuous map made of several "attraction basins"?

Would you see a similar issue in the "travelling salesman problem" where if you moved one of the cities a little you might see a radical change in the shortest overall route?

Does that make sense, have I understood what you mean correctly?

Thanks, Jon.

Linas Vepstas

Jun 14, 2021, 7:00:49 PM
to link-grammar, opencog
Hi Ramin,

On Sun, Jun 13, 2021 at 5:27 PM Ramin Barati <rek...@gmail.com> wrote:

Right now I am trying to provide for myself a stable job and to set my foot on a firm ground financially. I think that I am living that part of a man's life in which one needs to bear the fruits of his first endeavors. The geopolitical situation in the middle-east and especially Iran is of no help though. Nevertheless, I am always interested in the discussions in the mailing list and try to follow them as much as possible.

Food, a place to live, income, savings are vital. Spread the word. Talk to people with political views opposite of your own. Befriend them, even. Talk them over to your side. Gently; don't get shot. Geopolitics should not prevent you from being a good and active citizen.

On another note, I also have been reading about quantum probability recently. While the subject is certainly out of my reach right now, I think that I have found something interesting that I would like to share with you and Linas and ask if you see any potential there. Before that I would like to give my thanks to Linas for introducing the reading materials on these subjects and to tell you that I would surely look them up. On the subject of Riemannian surfaces, I had a hunch that the subject is important but I lack the math to read the literature. I figured that I need to get a better understanding of vector fields and geometric algebra and I am reading a book called "Geometric Algebra for Computer Science". I would be glad if you could suggest an introductory book on the subject of Riemannian surfaces itself.

The more you can read, the better. I would normally recommend "Compact Riemann Surfaces" by Jurgen Jost. It's a Springer textbook. If you look hard enough, you can find a PDF online. My only concern is that it might be a bit too advanced for you.  Try it anyway, see how far you can get. Skip the proofs, on first reading.
 

The idea is that the output of a classifier is a quantum probability distribution. So a classifier is something like a Dirichlet process but for quantum probability distributions. The output of a k-class classifier is a pure complex antisymmetric k-by-k matrix and using matrix exponential we can map that matrix to a matrix in SU(k).

Yuck. Stop right there. I know that you don't know the theory of Lie algebras, so down this path you will only find trouble and flawed thinking.

In grad school, I had a professor, P.G.O. Freund, and one day, instead of lecturing, he went on a tirade. I did not like it much, it felt like a waste of my time. It took me 2-3 decades to understand what he was saying. I hope it won't take you that long.

He drew three symbols on the blackboard: the delta, the nabla and the D'Alembertian (a square). He said: "The people who use a nabla are like that symbol - precariously balanced on its tip, using a tiny amount of knowledge at their base, to reach up into the clouds to explain everything. You don't want to be like that. Stay away from people like that. They are no good.  The people who use the delta have a broad base of knowledge, and a sharp pointy tip: they can use their extensive base knowledge to make precise, pointed observations. You want to be like that.  The people who use the D'Alembertian are the best: not only do they have a proper foundation on which to build, but they are able to accomplish many things with their knowledge."

See what the problem is? I thought to myself "I came to class to hear about this? What a waste of time!" -- but he was right. It took me a few decades to develop a broad base of knowledge. Alas, I am now old, as I mis-spent my youth. If you want to be good at stuff, read widely. But, more importantly, establish a firm foundation. Study the basics. Careful getting tangled in fancy-pants theories before you first have complete mastery of the basics.  Once you know the basics, the fancy stuff will then come easily, and quickly, without a struggle.

I listened to another famous mathematician proclaim that research should be like paddling a canoe: mostly a leisurely paddle down-stream, with occasional furious paddles upstream. (Maybe this was Raoul Bott? I don't recall.)

-- Linas

Ramin Barati

Jun 15, 2021, 12:47:20 PM
to opencog
Hi Jon,

Is it ok to ask a question about this? No problem if you don't have time to answer.

Of course :) 

Say I am on a beach and there are a number of ice cream stands. I create an agent whose job is to take my position and output the direction to the nearest ice cream stand. If I am equidistant between two stands, won't I see large changes in the output based on small changes in the input? For instance, if I take a tiny step to the left I am pointed to continue left, and if I take a tiny step to the right I am pointed to continue to the right?

Well, In this scenario the large change in the output is the side effect of "making" the decision and it is not a property of the function that is used in the decision making process. The difference is like getting the "output" of a classifier versus the "prediction" of a classifier. When you are getting the output of the classifier, you would get a real number between -1 and 1, but if you want to get the prediction of the classifier you would take the sign of the output which is either -1, 0 or 1. The scenario here is more like getting the prediction of your agent than observing its output IMO. I think the problem would go away if you consider the confidence of your agent as well. The confidence of your agent should not change drastically if you take an infinitesimal step in either direction and should be at its minimum (which would be 0 in the case of a classifier). If you take a tiny step to the right of the equidistant point and suddenly your agent becomes 100% sure that you should go right, I would say that your agent is not stable.

In more mathematical terms I think all Holomorphic functions are continuous?

Have you considered functions with branch cuts like the square root or the complex logarithm?
 
Whereas in this situation wouldn't an agent want a discontinuous map made of several "attraction basins"? Would you see a similar issue in the "travelling salesman problem" where if you moved one of the cities a little you might see a radical change in the shortest overall route?

I have not taken this approach, but I would give you the best answer that I have; which is fueled by wikipedia articles. So don't take it seriously! :p

If I want to analyze the stability of an agent from the point of view of "attraction basins", I would say that an intelligent agent would partition the input space into 3 regions: the stable region (the Fatou set), the chaotic region (the Julia set) and the essentially singular region (the Baker set). I am using the terminology of "complex dynamics" here, in case you're wondering. So, if I suppose that my agent is a classifier, the Fatou set would be the regions of the input where the classifier gives the correct label and is robust. The margin of the decision boundary would be the Julia set; even though the output of the classifier is not stable in this region, that does not mean that it has adversarial examples there. The output SHOULD be unstable in this region. Then there is the Baker set, in which an essential singularity exists. I'm not sure, but I think in this region the classifier would output all the possible labels infinitely many times, due to Picard's theorem:

Great Picard's Theorem: If an analytic function f has an essential singularity at a point w, then on any punctured neighborhood of w, f(z) takes on all possible complex values, with at most a single exception, infinitely often.

So in the case of TSP, I would say that it depends on whether the graph is in the Julia or the Baker set of the agent, but I have already stretched my knowledge dangerously far. I will follow the advice of Linas and stop giving you my baseless opinion! :D
