Introducing myself and talking about metamath


Jon P

Jan 6, 2021, 6:47:40 AM1/6/21
to opencog
Hello everyone, my name is Jon, I have a PhD in mathematics (partial differential equations) and some experience as a programmer for a self driving car firm. OpenCog seems awesome, my health is not so good so it's often hard to find nice ways to contribute but maybe there is something and I hope some encouragement is worth something :)

I have been part of the metamath community for a while and wanted to ask, are there any links between the OpenCog community and the metamath community? Do you already know a lot about formal mathematics and are including it in your system or are you taking a different approach?

For anyone who has never heard of it here are a few reasons I am excited about the approach metamath is taking.

Firstly, the metamath database is growing. It has its own syntax for encoding mathematical statements, and the main database contains about 30k proven theorems based on ZFC set theory, currently at roughly an advanced undergraduate level. It has somewhat of a graph-based structure, where each proof is a list of steps, each of which was itself proven previously. This may mean it has some compatibility with the atom-based hypergraph approach, though I can't claim to really understand that.
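A tiny toy sketch of that step structure (pure Python, invented names, not real metamath tooling): each theorem's proof may only reference axioms or theorems proved earlier, so the database forms a well-founded DAG.

```python
# Toy sketch (not real metamath tooling): a proof database as a DAG,
# where each theorem's proof is a list of steps referencing only
# theorems that were already proven. All names here are invented.

def is_well_founded(database):
    """Check every proof step references an axiom or an earlier theorem."""
    proven = set()
    for name, steps in database:   # database: ordered list of (name, steps)
        if any(dep not in proven for dep in steps):
            return False
        proven.add(name)
    return True

# A tiny made-up database: axioms have no proof steps.
db = [
    ("ax-1", []),                 # axiom
    ("ax-mp", []),                # axiom (modus ponens)
    ("mpd", ["ax-mp", "ax-1"]),   # proved from the two axioms
    ("mpdd", ["mpd", "ax-mp"]),   # proved from an earlier theorem + axiom
]

print(is_well_founded(db))  # True: every step points "backwards"
```

This "every step was itself proven previously" shape is what makes the database look like a graph rather than a flat list.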

Secondly, a core idea of mm development is that the software which operates on the database is totally separate from the database itself (in contrast to Lean and other formal systems). So there are a bunch of verifiers, which can check the validity of the encoded proofs, and people are writing new ones all the time. This makes it easy to hook into the database, which is just stored as an ASCII text file. There is some comparison of formal systems here, which maybe is not so kind to metamath ha ha.

Thirdly, there has been some nice progress recently in AI proof generation. OpenAI has been running a project and has created a pretty good narrow AI which is better than me at proving theorems, which is maybe not such a big compliment ha ha. This has a nice virtuous circle to it, where a larger database makes it easier to prove new theorems, which makes the AI more powerful.

There is a tipping point coming, I think, where the boundary of the mm database (or another formal system) catches up with where cutting-edge mathematical research is. I think at that point many professional mathematicians will switch to working formally (it has a huge number of nice advantages; it's basically digitising the field and applying AI to it all at once) and that will be a "big bang" of sorts.

Fourthly Mario Carneiro has been doing some really cool work on formally verifying computer programs. So he has a formal system which sits on top of the x86 architecture and can prove, if the hardware executes the algorithms correctly, that certain functions will have certain properties. He was kind enough to explain a bit of it to me here after a nice talk he did.

Fifthly, I am quite excited in the abstract about where this system is headed. For example, an AI which can formally reason about code it is writing and can prove mathematical statements could be very powerful. I think there is a nice approach to the control problem here, something like "new code can only be run if it is proven to obey the same constraints that currently running code is bound by." This maybe creates a way of limiting what an AI can do no matter how clever it becomes.
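A minimal toy of that gatekeeping idea (entirely invented: a real system would need an actual machine-checked proof; here the "proof" is exhaustive checking over a small finite domain, which only works for toys):

```python
# Toy illustration of the control idea: a new policy replaces the old one
# only if it can be shown to satisfy the same constraint the old one obeys.
# The "proof" here is exhaustive checking over a tiny finite domain.

CONSTRAINT = lambda x, y: 0 <= y <= 100   # invariant: outputs stay in [0, 100]
DOMAIN = range(-50, 51)

def current_policy(x):
    return max(0, min(100, x))

def satisfies_constraint(policy):
    return all(CONSTRAINT(x, policy(x)) for x in DOMAIN)

def try_upgrade(active, candidate):
    """Install candidate only if it demonstrably obeys the same constraint."""
    return candidate if satisfies_constraint(candidate) else active

better = lambda x: abs(x) % 101   # also stays inside [0, 100]
rogue  = lambda x: x * 2          # can go negative or above 100

policy = try_upgrade(current_policy, better)   # accepted
policy = try_upgrade(policy, rogue)            # rejected; 'better' stays active
```

The interesting part is that the gate never needs to understand *how* clever the candidate is, only whether the invariant survives.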

This is a much more speculative idea, but I think it is interesting to imagine a statistical layer feeding into a formal layer. So, for example, say a linear regression is applied to some collected data and finds some formula for a relationship; it would then be possible to use the metamath framework to reason about this formula, maybe integrate it for example, and from that generate a hypothesis which could then be tested in the world.
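A dependency-free sketch of that pipeline (toy data invented for illustration; the "formal layer" is here just a hand-coded antiderivative, standing in for what a real symbolic system would derive):

```python
# Sketch of "statistical layer feeding a formal layer": fit a linear model
# to data, then treat the fitted formula as a symbolic object and integrate
# it. Plain least squares keeps it dependency-free; data is made up.

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 3.0, 5.2, 6.9, 9.1]           # roughly y = 2x + 1

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x                   # statistical layer: y ~ a*x + b

def antiderivative(x):
    """Formal layer: integral of (a*t + b) dt from 0 to x = a/2*x^2 + b*x."""
    return a / 2 * x * x + b * x

# The integral is a new, testable prediction derived from the fitted formula.
print(round(a, 2), round(b, 2))           # close to 2 and 1
```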

Anyway that's probably enough from me I think. If anyone has questions or would like to talk about any of this stuff I would enjoy that. Otherwise I hope everyone has a great day!

Jon

Ben Goertzel

Jan 6, 2021, 11:33:55 AM1/6/21
to opencog
Hi, there hasn't been actual work connecting OpenCog w/ ATPs, however
I've been engaged with that community a bit since my son Zar Goertzel
is currently doing a PhD on AI for theorem proving w/ Josef Urban... I
have given talks on OpenCog PLN for commonsense reasoning and
scientific reasoning at the last couple AITP events and Josef gave a
talk at the AGI conference a few years ago when it was in Prague...

My commentary on GPTf is here

http://multiverseaccordingtoben.blogspot.com/2020/09/gpt-f-one-more-funky-experiment-in.html

-- Josef did a more in-depth analysis as he has been engaged w/
similar work prior to (and since) GPTf ...

Metamath looks like an interesting knowledge base, I have looked at it
a while ago but not recently...

One issue with OpenCog-based logic reasoning now (e.g. PLN) is that
it's very slow at crisp inference compared to modern ATPs ...
this will be remedied somewhat with the new Hyperon version

https://wiki.opencog.org/w/Hyperon

but still there is a certain slowdown that comes w/ having such a
general representation...

A strategy that interests me is using heuristic/probabilistic/AGI-ish
reasoning in OpenCog to learn "proof sketches" (by which I mean, more
like what a regular mathematician rather than a logician or ATP guy
would call a proof) and then invoke some traditional ATP system like
vampire or E prover to fill in the blanks and make a fully formal
proof ... (see some semi-related thoughts on formal proof sketches
here, https://www.cs.ru.nl/~freek/pubs/sketches2.pdf , there is also
more literature...)

So in short -- definitely of acute interest to some of us (I'm a math
PhD from way back when also) but not a subject of current actual work
focus...

ben
> --
> You received this message because you are subscribed to the Google Groups "opencog" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to opencog+u...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/371730ce-2f1f-4ab6-9559-26cef4d847e8n%40googlegroups.com.



--
Ben Goertzel, PhD
http://goertzel.org

“Words exist because of meaning; once you've got the meaning you can
forget the words. How can we build an AGI who will forget words so I
can have a word with him?” -- Zhuangzhi++

Nil Geisweiller

Jan 6, 2021, 3:54:12 PM1/6/21
to ope...@googlegroups.com, Jon P
Hi Jon,


you've come to the right place; yes, we are super psyched about combining
AI and ATP, even though it's been underexplored in the context of
OpenCog, and we've only scratched the surface! Personally the furthest I've
gone is proving some trivial SUMO theorems, see
https://github.com/opencog/pln/tree/master/examples/pln/sumo. Besides
that, we've (me with Ben's input) done some really cool experiments on
inference control meta-reasoning, basically using reasoning (or rather
an efficient specialized form thereof, tailored for pattern mining) to
speed up reasoning! See
https://github.com/opencog/pln/tree/master/examples/pln/inference-control-meta-learning.


On 1/6/21 1:47 PM, Jon P wrote:

> Fifthly I am then quite excited in the abstract about where this
> system is headed. For example an AI which can formally reason about
> code it is writing and can prove mathematical statements could be very
> powerful. I think there is some nice approach to the control problem
> here, something like "new code can only be run if it is proven to obey
> the same constraints that currently running code is bound by." This
> maybe creates a way of limiting what an AI can do no matter how clever
> it becomes.


If you don't know the Goedel Machine yet I think you'll quite like it


http://people.idsia.ch/~juergen/goedelmachine.html


Needless to say OpenCog incorporates principles of the Goedel Machine,
in fact the first time I read and understood OpenCog's design I
exclaimed to Ben "[OpenCog] is a workable Goedel Machine!!!".


Nil


Linas Vepstas

Jan 17, 2021, 3:21:20 PM1/17/21
to opencog, Jon P
Hi Jon,

Interesting email! I have many comments to make, so, inline, below.

On Wed, Jan 6, 2021 at 5:47 AM Jon P <drjonp...@gmail.com> wrote:
Hello everyone, my name is Jon, I have a PhD in mathematics (partial differential equations) and some experience as a programmer for a self driving car firm. OpenCog seems awesome, my health is not so good

A key priority in life is to take care of your health.  Conventional wisdom about regaining health is mostly accurate, assuming you've figured out what "conventional wisdom" is.

I have been part of the metamath community for a while and wanted to ask, are there any links between the OpenCog community and the metamath community?

I've never stumbled over it before.
 
Do you already know a lot about formal mathematics and are including it in your system or are you taking a different approach?

I would like to believe that I know bucketloads of formal mathematics. And bucketloads of mathematics, in general. After a (literally) 15-second survey of metamath, I would say that the big difference is that "formal methods" appears to limit itself to crisp true/false values. By contrast, I'm more interested in probabilistic truth values, and generalizations thereof (time-varying, bounded, with hysteresis, etc. etc.) I also like to think of propositional calculus or sequent calculus or Hilbert calculus as being a graphical network: Each sequent has some inputs, some outputs, and the rules of the game are to connect them up, until you've discovered a "proof". Well, that proof is "just a graphical network of connections between axioms attaining exact true/false values on each leg of each connection".

So I've attempted to build the AtomSpace as a place to store and connect-up axioms/sequents/assertions/rules with connections that are probabilities/weights/fuzzy-logic values/etc. -- that is, numbers, or number-like things: qubits/homogeneous spaces/etc.
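A drastically simplified sketch of that idea (my own toy encoding, far simpler than the real AtomSpace; class and field names invented): assertions become links whose connections carry graded truth values, with crisp logic as the special case where everything is 1.0.

```python
# Minimal sketch (invented, much simpler than the real AtomSpace) of storing
# assertions as hypergraph links whose connections carry probabilistic
# truth values instead of crisp True/False.

class Atom:
    def __init__(self, kind, name=None, outgoing=(), strength=1.0, confidence=1.0):
        self.kind = kind                  # e.g. "Concept", "Inheritance"
        self.name = name
        self.outgoing = tuple(outgoing)   # links point at other atoms
        self.strength = strength          # probability-like truth value
        self.confidence = confidence      # how much evidence backs it

cat    = Atom("Concept", "cat")
animal = Atom("Concept", "animal")

# "cats are animals", believed with probability 0.95 on good evidence
link = Atom("Inheritance", outgoing=(cat, animal), strength=0.95, confidence=0.9)

# A crisp proof step is just the special case strength=1.0, confidence=1.0.
print(link.strength, link.confidence)
```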

If you study neural networks, you can see that they are densely connected networks, with nodes, and almost all weights between almost all nodes being non-zero. If you study formal mathematical proofs, you can see that they are extremely sparse networks, where every node is connected to only 1 to 3 or 4 others, where the weights are exactly true/false/0/1.  If you study natural language, and biochemistry and many other natural phenomena, you find a scale-free network that is neither dense, nor is it sparse, but somewhere in the middle.

I am deeply interested in converting time-ordered expressions of that network into the underlying structure. (and back). So, by analogy: a seismologist, all they have are some time-series recordings of Earth's vibrations; from that they try to reconstruct the structure of Earth. I have a time series of words, I want to reconstruct the structure of the brain that wrote those words.  And, once reconstructed, what else might that "brain" have said? Just like the Earth model: what other kinds of earthquakes might it produce?

I've got half-a-dozen PDFs, all 20 to 100 pages long, that spin out each of the above paragraphs into great detail. I think they're important, but I can't get anyone to read them :-) So it goes...


For anyone who has never heard of it here are a few reasons I am excited about the approach metamath is taking.

Firstly, the metamath database is growing. It has its own syntax for encoding mathematical statements, and the main database contains about 30k proven theorems based on ZFC set theory, currently at roughly an advanced undergraduate level. It has somewhat of a graph-based structure, where each proof is a list of steps, each of which was itself proven previously. This may mean it has some compatibility with the atom-based hypergraph approach, though I can't claim to really understand that.

So what I wrote above "explains" the atom-based hypergraph. It is good at what it does, but, for special cases, such as crisp logic, it is not efficient representationally or computationally (RAM and CPU use). (Hey, it's *also* inefficient for neural network-type computations!) The good news is that it does have a natural layering or "abstraction" mechanism that does allow it to "wrap" hyper-efficient proof engines (or neural net engines).

I've wrapped a few things here and there. I have not daydreamed about wrapping formal systems, but have not found any compelling reason to do so. I strongly encourage others to do so. (Well, I strongly encourage planning and strategizing, first. Good plans result in fewer disappointments.)

Secondly, a core idea of mm development is that the software which operates on the database is totally separate from the database itself (in contrast to Lean and other formal systems). So there are a bunch of verifiers, which can check the validity of the encoded proofs, and people are writing new ones all the time. This makes it easy to hook into the database, which is just stored as an ASCII text file. There is some comparison of formal systems here, which maybe is not so kind to metamath ha ha.

These are software engineering questions. Of course, it is always better to be modular, unless your modular design runs 10x slower or is 10x more bloated than the integrated design. But this is an orthogonal discussion, and is, in some ways, less interesting than the "meta" discussion of "what are you actually trying to accomplish?" 

Thirdly, there has been some nice progress recently in AI proof generation. OpenAI has been running a project and has created a pretty good narrow AI which is better than me at proving theorems, which is maybe not such a big compliment ha ha. This has a nice virtuous circle to it, where a larger database makes it easier to prove new theorems, which makes the AI more powerful.

There is a tipping point coming, I think, where the boundary of the mm database (or another formal system) catches up with where cutting-edge mathematical research is. I think at that point many professional mathematicians will switch to working formally (it has a huge number of nice advantages; it's basically digitising the field and applying AI to it all at once) and that will be a "big bang" of sorts.

Fourthly Mario Carneiro has been doing some really cool work on formally verifying computer programs. So he has a formal system which sits on top of the x86 architecture and can prove, if the hardware executes the algorithms correctly, that certain functions will have certain properties. He was kind enough to explain a bit of it to me here after a nice talk he did.

Fifthly, I am quite excited in the abstract about where this system is headed. For example, an AI which can formally reason about code it is writing and can prove mathematical statements could be very powerful. I think there is a nice approach to the control problem here, something like "new code can only be run if it is proven to obey the same constraints that currently running code is bound by." This maybe creates a way of limiting what an AI can do no matter how clever it becomes.

I was thinking of replying to these in detail, but I think I already explained that "there's more to life than crisp true-false values" and that data is more general than simply networks of propositions and assertions, axioms and deduction rules.

Some of my thinking on these matters is informed by ergodic theory and the theory of phase transitions in physics ... stuff like spin glasses. (I can make a rough argument/analogy that says human society is like a spin glass.) This is not meant to be a casual observation, but rather a statement about how neurons interact with one another, forming tightly-bound, high-bandwidth, low-latency "human brains", and how "human brains" then interact with each other using low-bandwidth, high-latency writing, speech, radio, television, and most importantly, social media. And the network connectivity of human brains via social media is a change of phase as compared to mass media, which explains the current political turmoil. We have entered a fundamentally different era of inter-human-brain connectivity.

So I am interested in modelling networks, in general: scale-free or hubby, centralized or decentralized, federated or not, exchanging messages that are more than binary true-false values. And then, like the neuroscientist sticking wires into the brain, I want to take time-series samples of that network, and reconstruct what that network "is thinking", what that network "knows".

The networks formed by axioms and rules of deduction are just one very special case. A very important special case, but very remote from the other special case that storms capitol buildings and makes facebook posts about it. They're both networks. I'm interested in the networks. (Sigh, to dot my i's and cross my t's -- molecular interaction between DNA and proteins is also a network.) An absolutely brilliant example of this can be found at https://algorithmicbotany.org where Professor Przemyslaw Prusinkiewicz has taken a 3-4 decade journey, starting out with state machines (crisp logic, grammatical rules) and stapling on the diffusion-reaction equation.

You're a PDE guy. You know the diffusion-reaction equation. Now, imagine coupling one of the fields to a (deterministic) finite-state machine.  What happens? The answer is "biology", or, at least, "botany". The "deterministic machine" is, roughly speaking "DNA", and the things that diffuse are the various proteins that are expressed. That's very very rough, but gives the idea.
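A crude numerical toy of that coupling (all parameters invented; real morphogenesis models are far richer): a 1-D diffusing field drives a per-cell deterministic state machine, and cells that switch ON secrete back into the field, so a front propagates.

```python
# Toy coupling of a diffusing field to a deterministic finite-state machine:
# each cell is OFF until the local concentration crosses a threshold, then
# switches ON for good and starts secreting, feeding the field back.

N, STEPS, D = 20, 200, 0.2        # cells, time steps, diffusion rate
THRESHOLD, SECRETION = 0.3, 0.5

field = [0.0] * N
field[0] = 1.0                    # a seed of "morphogen" at one end
state = ["OFF"] * N               # the per-cell finite-state machine

for _ in range(STEPS):
    # simple 1-D diffusion (discrete Laplacian, reflecting boundaries)
    new = field[:]
    for i in range(N):
        left  = field[i - 1] if i > 0 else field[i]
        right = field[i + 1] if i < N - 1 else field[i]
        new[i] = field[i] + D * (left + right - 2 * field[i])
    field = new
    # state machine: a cell crossing the threshold switches ON permanently
    for i in range(N):
        if state[i] == "OFF" and field[i] > THRESHOLD:
            state[i] = "ON"
            field[i] += SECRETION   # ON cells secrete, pushing a wave outward

print("".join("#" if s == "ON" else "." for s in state))
```

The "machine" here is trivially simple, but the qualitative behaviour -- a structured pattern emerging from diffusion plus discrete rules -- is the rough idea.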

So: can I take a deterministic theorem prover, which discovers "facts", and attach them to a differential equation that models the diffusion of knowledge in human society? What happens when I do this?  As that knowledge diffuses from machines to human brains, and back to machines, how does this change the priorities of what the theorem prover is working on? How does the man-machine symbiosis work?
 

This is a much more speculative idea, but I think it is interesting to imagine a statistical layer feeding into a formal layer. So, for example, say a linear regression is applied to some collected data and finds some formula for a relationship; it would then be possible to use the metamath framework to reason about this formula, maybe integrate it for example, and from that generate a hypothesis which could then be tested in the world.

Anyway that's probably enough from me I think. If anyone has questions or would like to talk about any of this stuff I would enjoy that. Otherwise I hope everyone has a great day!

I eagerly anticipate further discussion!

-- Linas 

--
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.
 

Linas Vepstas

Jan 17, 2021, 3:30:17 PM1/17/21
to opencog, Jon P
I should proof-read my emails, first:

On Sun, Jan 17, 2021 at 2:21 PM Linas Vepstas <linasv...@gmail.com> wrote:

I've wrapped a few things here and there. I have not daydreamed about wrapping formal systems, but have not found any compelling reason to do so. I strongly encourage others to do so. (Well, I strongly encourage planning and strategizing, first. Good plans result in fewer disappointments.)

 I meant to say "I have daydreamed about wrapping formal systems". High on my list is ASP - answer-set programming (Potassco, the Potsdam solver) but certainly other solvers, HOL systems, and metamath are all reasonable candidates, and really, the question has to be "great! so what? now what do we do?" I have a provisional answer to that, but maybe later.

-- Linas

Jon P

Jan 17, 2021, 4:59:46 PM1/17/21
to opencog
Firstly thanks for the nice responses, it’s kind of you all to spend time writing to me. 


Ben it’s a really interesting point that ATP is solved in the abstract but just computationally intractable. I suppose it’s a bit like Go: it can be solved by breadth-first search, given enough time.


Another interesting question it got me thinking about: what mathematics should an AGI pursue? For example it would be possible to set a formal system going and get it to construct arbitrary theorems by applying rules to existing theorems, so why are some theorems more interesting than others? I think you touch on this in an interesting way Ben when you talk about some lemmas being more useful than others for proving other theorems, that’s fascinating, and maybe circular. I guess being motivated by science is a good forcing function for the direction the system should take.


I agree with your point about how GPT systems can sort of end up like the ravings of a genius lunatic, sometimes precise and clear and sometimes total nonsense. However I think the ATP setting is rather different because the output of the system can be verified algorithmically.


So when GPT-3 makes up a human sentence it can’t ever know how good it is, it’s on a grey scale from nonsense to Shakespeare. Whereas if GPT-F produces a proof either it is correct or not and this can be checked. So in a way the fact that a lot of what it produces is nonsense isn’t a difficulty. In fact might it almost be a strength? A lunatic who has strokes of genius might, in the long run, be extremely creative and inventive, someone who never says anything dumb might be quite restricted in what they produce.


Also talking of how metamath is “low level” and doesn’t use tactics much, aren't the previously proven theorems sort of the abstractions? For example in metamath you can press a button to see the entire proof only as a list of axiomatic substitutions, this is almost like the binary code of a mathematical proof. A theorem is essentially some big string of axioms which have been compacted into a high level statement, moreover with slots so it is modifiable.


So just by proving theorems isn’t the system moving up layers of abstractions? Isn’t the database itself a stand in for “understanding” and “sketch proofs” where a system with a big database already can produce proofs more effectively than one which had no database and was only restricted to using the axioms? Isn’t that the knowledge and understanding which is growing over time?


I hope I have explained that reasonably, I tried to understand your nice piece as best I could.


Nil that is really interesting about Godel machines. The idea of the machine being able to prove things about its own code is very interesting I think. As I mentioned Mario Carneiro in the metamath group is building a system which can formally prove things on the x86 architecture so there is a lot of overlap between that and the tools the Godel machine would need I think. Sounds like you have made some nice progress here too. I also really like the sound of it from a control problem perspective that “only adapt if a new approach is provably better” gives some hope the system can be restrained.


I wondered about trying to make an AI system which moved a bishop towards a goal on a chessboard and was allowed to modify its movement algorithm on condition that it could prove the bishop would never step on a white square. I felt like this might demonstrate something interesting about the control problem.
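That bishop invariant is actually checkable by a simple parity argument, since a diagonal move changes file and rank by the same amount. A toy sketch (my own framing of the idea, not a real proof system):

```python
# Sketch of the bishop idea: before accepting a movement rule, check the
# invariant "the bishop never changes square colour". Diagonal moves change
# file and rank by (+/-d, +/-d), so (file + rank) mod 2 is preserved.

def colour(square):
    f, r = square
    return (f + r) % 2            # 0 = one colour, 1 = the other

def diagonal_moves(square):
    f, r = square
    return [(f + df * d, r + dr * d)
            for df in (-1, 1) for dr in (-1, 1)
            for d in range(1, 8)
            if 0 <= f + df * d < 8 and 0 <= r + dr * d < 8]

def rule_is_safe(move_rule):
    """Exhaustively check the invariant over all 64 squares (a toy 'proof')."""
    return all(colour(dst) == colour(src)
               for src in ((f, r) for f in range(8) for r in range(8))
               for dst in move_rule(src))

print(rule_is_safe(diagonal_moves))   # True: diagonal moves preserve colour

def sneaky(square):                   # a proposed "upgrade" that steps sideways
    f, r = square
    return [(f + 1, r)] if f < 7 else []

print(rule_is_safe(sneaky))           # False: would change square colour
```

On a finite board exhaustive checking is itself a proof, which is what makes this such a clean demonstrator for the control idea.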


Linas, everything you are saying around the atomspace and networks with probabilistic connections is interesting. I wonder if there is some connection with “conjectures”, which maybe are theorems that are only probably true rather than binary true?


I am afraid the link to algorithmic botany did not work so well for me, maybe the site is down? It sounds super interesting though, I would be interested to learn more about the foundations of biology like that. 


Thank you all for taking the time to respond. It has taken me quite a while to look at the things you have sent, so if you respond again, and please don’t feel any pressure to, it might take me a while to get back to you, and I might not manage it. I don’t mean to be rude; it’s just that there are a lot of very interesting things to think about here.

Linas Vepstas

Jan 17, 2021, 10:08:15 PM1/17/21
to opencog
Maybe I mistyped the URL. Try http://algorithmicbotany.org/ or just type it into a search engine. Hit "papers" to get to the meat of it. http://algorithmicbotany.org/papers/  Instead of looking at the latest papers (from 2020), it might be more educational to start with the oldest ones, first. Those from the 1980's, and then move forwards. The early papers might strike you as trite or trivial; the seeming simplicity is misleading. They set up a plot that is revealed only later.

Again, regarding theorem proving: logic is built on the foundations of grammar. Natural language is built on the foundations of grammar. The 3D structure of biological molecules is built on a foundation of grammar (aka "chemistry" aka "atomic physics"). Proofs are expressions of grammars; theorem provers generate proofs from grammars; but equally important is the converse: what laws and rules can we deduce when working in the opposite direction? For example, the axioms of algebraic topology (Eilenberg-Steenrod) are not "god-given"; they arise from the study of the structural properties of spaces. Once you have these axioms in your pocket, sure, you can generate and prove theorems with them. But who created the axioms? Well, Eilenberg... and Steenrod... ...and many many others. Human mathematicians swarmed over the problem space like ants, exploring every nook and cranny.

Constructing proofs is only half the work that mathematicians do, and, as we now understand that theorem-proving is a mechanical, algorithmic process, it is indeed something that machines might reasonably excel at. The other half of what mathematicians do is to formulate theorems that are interesting .. to other humans. This turns the process of theorem-proving on its head. Given properties of topological spaces, what are the *interesting* things one can say about them? Why are we, as humans, interested in K-theory and cobordism, instead of (for example) what you might get if you drop additivity or exactness?

I call this last process, the process of extracting a grammar from a sampling of the "physical" universe. Once you have that grammar (a collection of axioms and deduction rules), you can feed it into a theorem prover, and out fall a bunch of proofs. Great! But an absolutely key step is to obtain the grammar.  Axioms are not god-given, they are discovered and formalized by humans, because the humans believe that structures they describe, the theorems they allow to be proven, are interesting and important .. to humans.

Neither GPT nor ATP have the ability to discover axioms. Actually, as far as I can tell, the folks working on GPT and ATP are not even aware of this as a conceptual issue, or problem to be solved. GPT does encode grammars in an utterly opaque and non-human-understandable collection of weights in dense graphs. Insofar as that encoding is accurate, has a high fidelity, then GPT can achieve "brilliant" results. My goal with opencog is to *not* use dense networks, but to use sparse networks, which *are* human readable, and have the form of collections of ... axioms and rules. It's the third leg of a tripod that, for whatever reason, no one seems to be aware of or acknowledge. Yes, GPT learns; it is, unfortunately, using the "wrong" representational system. It's mapping nature, which is sparse, into vector spaces, which are dense. Theorem provers start with rules and axioms, and generate sparse networks (the proofs). Until you bridge this divide, this canyon, this disconnect, you ain't got nothing. Once you do bridge it, you can say you're finally on the road to general AGI.

-- Linas



Luke Peterson

Jan 17, 2021, 10:43:56 PM1/17/21
to ope...@googlegroups.com, linasv...@gmail.com, Jon P
Hi Linas,

Please share those PDFs. Pretty please 🙏

I’ve been searching for a unifying theory that can encompass both formal reasoning systems and neural nets for some time, and I suspect you might have it. Or at the very least you’re much closer than I am.

My project (Hippocampus) was/is a value-flow network that could represent programs. Not unlike Atomspace. I opted for connectivity rules that I called “flux attenuation”. That is to say each linkage could express a value between 0 and 1, and collectively the “conductance” of any path through the graph could be evaluated using Ohm’s law. For example, a CondLink (in Atomese parlance) could be thought of as a semiconductor with the conductance changing depending on the value passing through it.
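If I'm reading the "flux attenuation" description right (this is my guess at the semantics, not Hippocampus code), a path's overall conductance through the graph combines like resistors in series under Ohm's law:

```python
# Sketch of the "flux attenuation" reading (a guess at the description
# above, not Hippocampus code): each link has a conductance in (0, 1],
# and a path's overall conductance combines like resistors in series.

def path_conductance(link_conductances):
    """Series combination: G = 1 / sum(1/g_i); a zero link blocks the path."""
    if any(g == 0 for g in link_conductances):
        return 0.0
    return 1.0 / sum(1.0 / g for g in link_conductances)

print(path_conductance([1.0, 1.0, 1.0]))   # three perfect links -> 1/3
print(path_conductance([0.5, 1.0]))        # 1 / (2 + 1) -> 1/3
print(path_conductance([1.0, 0.0]))        # an open switch kills the path
```

One consequence of this combination rule is that conductance strictly decreases with path length even through perfect links, which fits the "attenuation" naming; a CondLink would then be a link whose g depends on the value flowing through it.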

I had two reasons for this design choice: 1.) I wanted to be able to mix and match subgraphs with predictable results. (first and foremost HC is aiming to be a programming language) and 2.) I wanted to be able to apply Quasi-Monte-Carlo methods (low-discrepancy sequence sampling, e.g. Halton, etc.) to creating a probability-distribution-function, solving an entire graph.

But, without the ability to tweak biases, HC networks are pretty much untrainable, compared with neural networks. At least I haven’t been able to train them to do much beyond some very simple toy problems. Maybe I’m a bad teacher.

HC allows NNs to be embedded inside HC nodes, e.g. a classifier could use a SoftMax to normalize the outputs into attenuation values, but it feels like I’m missing something important.

Thank you.

-Luke


> On Jan 17, 2021, at 3:21 PM, Linas Vepstas <linasv...@gmail.com> wrote:
> ...

Linas Vepstas

Jan 19, 2021, 5:21:27 PM1/19/21
to Luke Peterson, opencog, Jon P, Amir P
Hi Luke,

(BTW, my middle name, my "Christian name" is Lukas -- Luke; Linas is my "pagan name" -- the linen plant (linseed oil, linoleum, linen cloth -- flax).) Anyway...

On Sun, Jan 17, 2021 at 9:43 PM Luke Peterson <luketp...@gmail.com> wrote:
Hi Linas,

Please share those PDFs.  お願い  🙏

See below. More important is perhaps having a conversation which exposes the issues, so that we can focus on what's important.

I’ve been searching for a unifying theory that can encompass both formal reasoning systems and neural nets for some time, and I suspect you might have it.  Or at the very least you’re much closer than I am.

My project (Hippocampus) was/is a value-flow network that could represent programs.  Not unlike Atomspace.  I opted for connectivity rules that I called “flux attenuation”.  That is to say each linkage could express a value between 0 and 1, and collectively the “conductance” of any path through the graph could be evaluated using Ohm’s law.  For example, a CondLink (in Atomese parlance) could be thought of as a semiconductor with the conductance changing depending on the value passing through it.

A foundational concern is "what is a network?" and "how do I represent it?" I take a "network" to be a synonym for a "graph-theoretical graph". There are many ways of representing a graph - a list of vertices and edges, an adjacency matrix... questions include: "is it RAM-efficient?" "is it CPU efficient?" and "is it generic enough to represent generic networks, ranging from disease-spread networks to mathematical proofs, from abstract syntax trees to bio-molecular networks?" 

 I've settled on something called "germs" or more generally "sheaves" -- this is a single vertex, with attached half-edges. (Amir: I cc'ed you because this is more-or-less the same thing as a link-grammar "disjunct".) A nice property of these is that they are uniformly composable: some connected germs still look like a germ; and they resemble elements of syntactical structures, and... more ... this is exposed in a collection of PDFs that go into details. The word "sheaf" comes from algebraic topology.
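A rough toy encoding of that composability property (my own invention, not the AtomSpace sheaf code): a germ is a vertex plus typed half-edges, and mating a '+' connector with a matching '-' connector yields something that is again germ-shaped.

```python
# Toy sketch (invented encoding, not the AtomSpace sheaf code) of a "germ":
# a vertex with typed half-edges (connectors). Two germs compose by mating
# a '+' connector with a matching '-' connector; the result is again a
# germ whose free connectors are the unmated ones.

def germ(label, connectors):
    # connectors: list of (type, polarity) half-edges, polarity '+' or '-'
    return {"labels": [label], "connectors": list(connectors)}

def compose(g1, g2, ctype):
    """Mate one (ctype, '+') on g1 with one (ctype, '-') on g2."""
    c1, c2 = (ctype, "+"), (ctype, "-")
    assert c1 in g1["connectors"] and c2 in g2["connectors"]
    rest1 = list(g1["connectors"]); rest1.remove(c1)
    rest2 = list(g2["connectors"]); rest2.remove(c2)
    return {"labels": g1["labels"] + g2["labels"],
            "connectors": rest1 + rest2}      # still germ-shaped

verb = germ("throws", [("S", "-"), ("O", "-")])   # wants a subject and object
noun = germ("Alice",  [("S", "+")])               # offers a subject connection

phrase = compose(noun, verb, "S")
print(phrase["labels"])        # ['Alice', 'throws']
print(phrase["connectors"])    # [('O', '-')] -- composition is again a germ
```

The point of the toy is the uniformity: connected germs have the same shape as single germs, so composition can be iterated, which is (as I understand it) exactly the link-grammar disjunct story.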

Hmm... what order should you read these in? Maybe start with the blog entry, first...(below)


The vectors (of neural nets) are a special case of tensors. But tensors are more-or-less the same thing as the graphical germs. This is not "deep" in and of itself, but is widely and wildly under-appreciated; it is the corner-stone, the bedrock on which the foundations of neural-nets vs. symbolic representations are laid.

This talks about how seeds/germs can be used to represent different kinds of data structures, from lambda expressions to many other kinds of things: https://github.com/opencog/atomspace/blob/master/opencog/sheaf/docs/connectors-and-variables.pdf

The above PDFs are short, 5-10 pages each. Now for the long ones, which attempt to unify symbolic logic and neural nets. First, a blog entry (it's fairly short):


Hmm. Actually, that blog entry links to the big PDFs, so just go there. 



I had two reasons for this design choice: 1.) I wanted to be able to mix and match subgraphs with predictable results. (first and foremost HC is aiming to be a programming language) and 2.) I wanted to be able to apply Quasi-Monte-Carlo methods (low-discrepancy sequence sampling, e.g. Halton, etc.) to creating a probability-distribution-function, solving an entire graph.
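For reference, the low-discrepancy sampling mentioned in point 2 is the standard Halton / van der Corput construction -- textbook code, not HC's actual sampler:

```python
# The van der Corput sequence reverses the base-b digits of the index
# around the radix point; pairing co-prime bases gives Halton points,
# which fill the unit square far more evenly than pseudo-random draws.

def halton(index, base):
    """index-th element (1-based) of the van der Corput sequence in `base`."""
    f, r = 1.0, 0.0
    i = index
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

# 2-D Halton points pair co-prime bases, e.g. 2 and 3:
points = [(halton(i, 2), halton(i, 3)) for i in range(1, 6)]
# first coordinates: 1/2, 1/4, 3/4, 1/8, 5/8, ...
```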

But, without the ability to tweak biases, HC networks are pretty much untrainable, compared with neural networks.  At least I haven’t been able to train them to do much beyond some very simple toy problems.  Maybe I’m a bad teacher.

HC allows NNs to be embedded inside HC nodes, e.g. a classifier could use a SoftMax to normalize the outputs into attenuation values, but it feels like I’m missing something important.
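The SoftMax normalization mentioned here, written out as a standalone function: it squashes arbitrary classifier outputs into positive weights that sum to 1, which is what makes them usable as attenuation values:

```python
import math

def softmax(xs):
    """Map real-valued scores to positive weights summing to 1."""
    m = max(xs)                      # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

weights = softmax([2.0, 1.0, 0.1])   # positive, sums to 1.0
```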

OK, so here's a sequence of remarks...

There are a large number of "training" and "learning" algorithms. Superficially, some of these seem very different from others. However, if you try to compare them and ask "what properties do they have in common?", you gain a lot of insight. You gain even more if you disentangle the data representation from the algorithm. For example, "sparse matrix factorization" has been the primary problem that Amazon, Facebook and Google try to solve. The matrix is "consumers" (for rows) by "product preferences" (for columns), with tens or hundreds of millions of consumers and millions of products. (In some of the competitions, "movie reviews" were a stand-in for "products".) This can be approached as an old-school linear-algebra problem, where you try to apply some smoothing functions, compute some eigenvectors, interpolate across missing values, etc. I'm tempted to call this "boring old linear algebra", and it is interesting only because a good solution will allow GOOG/AMZN/FB to make billions in profits by targeting me with the right kind of advertisements.

You can also change perspective. Here's a super-quick sketch. A graph can be represented as an adjacency matrix: a matrix whose columns and rows are vertices, with entries 0 or 1, where 1 means "there is an edge connecting these two". If you replace the 0/1 with fractional values, then you can interpret this as a weighted graph. If the weights sum to 1.0, you can interpret this as a Markov matrix. Back in the day when Google had only two employees, the grand discovery was that you didn't have to put the entire matrix into RAM in order to solve it -- the Brin-Page (PageRank) algorithm solves the Markov-matrix problem by paging into RAM only 0.00001% of the graph at a time.
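A dense toy version of that Markov-matrix fixed point, via power iteration (the actual web-scale trick is streaming in slivers of the sparse matrix; this toy keeps everything in RAM just to show the idea):

```python
# Power iteration on a tiny column-stochastic matrix: repeatedly applying
# M to a probability vector converges to the stationary distribution.

def power_iterate(M, steps=200):
    """M[i][j] = probability of stepping from state j to state i;
    each column of M sums to 1."""
    n = len(M)
    v = [1.0 / n] * n                 # start from the uniform distribution
    for _ in range(steps):
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    return v

# Two-state chain: state 0 is "sticky" (90% self-loop), state 1 less so.
M = [[0.9, 0.5],
     [0.1, 0.5]]
stationary = power_iterate(M)         # converges to [5/6, 1/6]
```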

But the consumer/product-preference matrix is different. Almost all entries in the matrix are unknown. (It's impossible for 10 million consumers to express preferences for a million products.) The matrix factorization is M = L D R, where L and R are sparse matrices of dimension 1M x 1K, and D is a dense matrix of 1K by 1K. You can solve for D with standard neural-net techniques.
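A shape check of that M = L D R factorization, scaled down to toy size with a pure-Python matrix multiply. The numbers are made up; only the shapes (tall-thin L, small dense D, short-wide R) mirror the text:

```python
# Tiny M = L D R reconstruction: L maps consumers to latent factors,
# D is the small dense core, R maps factors to products.

def matmul(A, B):
    """Plain row-by-column matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

L = [[1, 0], [0, 1], [1, 1], [0, 0]]   # 4 "consumers" x 2 latent factors
D = [[2, 0], [0, 3]]                   # 2 x 2 dense core
R = [[1, 0, 1], [0, 1, 1]]             # 2 factors x 3 "products"

M = matmul(matmul(L, D), R)            # 4 x 3 reconstructed preferences
```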

Curiously, the natural-language dictionaries in Link Grammar have the same structure. Many words have very similar grammatical structure (e.g. "all nouns" vs. "all adjectives") -- put these into L. The product D R encodes the grammar. If you look at the dictionary, you will spot that D R is itself factorized: R is a collection of "disjuncts" that commonly occur together (for example <b-minus> or <mv-coord> or <verb-rq-aux> in link-grammar), and the "guts" of the English language lie in D -- a dense (not sparse) matrix that interconnects word classes (e.g. "words.n.2.x", one of the elements of L, to <b-minus> and <mv-coord>, which are elements of R).

The task of grammar learning is to perform this factorization: find L, D and R. You can find L and R with Bayesian methods, and D with neural-net methods. Or you can use other algorithms, e.g. the algos from the consumer-preference movie-ranking competitions.

BTW: Link Grammar uses "disjuncts" and "costs", but these are really just "seeds/germs" from a "sheaf"; the sampling is a very sparse matrix. This is not a "deep" statement -- it's "obvious" and "shallow" once you see it -- but, for whatever reason, almost no one ever "sees it", and even then, they almost never leverage the power behind it. (The power being that you can move algorithms from one kind of data representation to another.)

OK... here is another change in perspective. When you read the neural-net papers, the vast majority of them talk about "cosine distance" or, like you, blurt out "SoftMax" without ever thinking about it. I believe this is a serious error. It is exposed by another change of perspective. So...

When one says "linear algebra", or one says "vector" or "matrix", this immediately implies "Euclidean space". That is because vectors and matrices have transformational properties (1-tensor, 2-tensor) that are "natural" in Euclidean space. The cosine product is preserved in Euclidean space under rotations. (It's a 0-tensor -- kind of like a Casimir invariant.)

But who ever said that consumer preferences live in Euclidean space? Where does it say that Euclidean space is the natural setting for neural nets? Nowhere. Absolutely nowhere. If you shift focus and look at the "matrix" (the 2-tensor) not as a linear-algebra thing, but instead as a weighted graph, then you can see other possibilities. My favorite is mutual information (yes, some people are sick of me saying this over and over...): if you take a neural-net algo which has a cos(theta) = (v dot w)/|v||w|, where v and w are vectors, and replace it by MI = log[(v dot w) / (v dot va)(w dot wa)] (see skippy.pdf for details), you get not a Euclidean space but a probability space (a simplex). The conserved quantity is not the scalar dot product, but instead the sum of probabilities. (This is not my discovery; this is something noted a decade or two ago by a handful of authors, and it has been ignored by 99% of the neural-net literature. Ignored, not rejected; this does not appear to be a conscious decision, but rather just forgotten/unobserved. Would it improve NN learning? Who knows?)
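A side-by-side of the two similarity measures. The exact MI formula is spelled out in skippy.pdf; here I take the common pointwise-mutual-information reading, treating the "v dot va" and "w dot wa" terms as marginal count totals -- that reading is an assumption on my part, as are all the names below:

```python
# Cosine similarity (the Euclidean measure) vs. a PMI-style score over
# co-occurrence counts (a probability-space / simplex measure).
# The PMI reading of the email's MI formula is my assumption.

import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def cosine(v, w):
    """Euclidean measure: invariant under rotations of the space."""
    return dot(v, w) / math.sqrt(dot(v, v) * dot(w, w))

def mi_score(v, w, total):
    """log[ N * (v . w) / ((sum v)(sum w)) ] over co-occurrence counts,
    taking the marginal sums in place of the (v . va), (w . wa) terms."""
    return math.log(total * dot(v, w) / (sum(v) * sum(w)))

# Co-occurrence counts of two words over three contexts, N observations:
v, w, N = [4, 1, 0], [3, 2, 0], 10
scores = (cosine(v, w), mi_score(v, w, N))
```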

There are a bunch of these kinds of "changes of perspective"; I try to sketch them in the "skippy.pdf" paper. It's all "green field" development: fairly obvious, direct and immediate possibilities, angles and approaches that have been overlooked and remain unexplored for whatever reasons. I suspect mainstream researchers have FOMO about other discoveries, and are too busy to explore some really quite promising algos. Kind of a sociology-of-science question. And, just to ding Ben a bit: he keeps telling me that this is all "obvious and trivial", but if it's so obvious and trivial, then why has no one investigated any of these avenues yet? Why aren't people publishing results? Harrumph.

Anyway, this is where I am currently stalled. One issue is that there are so many different avenues and possibilities that look promising, it's hard to pick just one.  And each avenue requires a lot of work to explore. It's hard to do single-handedly; collaborations are more effective.

-- Linas
 

Thank you.

-Luke


> On Jan 17, 2021, at 3:21 PM, Linas Vepstas <linasv...@gmail.com> wrote:
> ...
> So I've attempted to build the AtomSpace as a place to store and connect-up axioms/sequents/assertions/rules with connections that are probabilities/weights/fuzzy-logic values/etc. -- that is, numbers, or number-like things: qubits/homogenous spaces/etc.
>
> If you study neural networks, you can see that they are densely connected networks, with nodes, and almost all weights between almost all nodes being non-zero. If you study formal mathematical proofs, you can see that they are extremely sparse networks, where every node is connected to only 1 to 3 or 4 others, where the weights are exactly true/false/0/1.  If you study natural language, and biochemistry and many other natural phenomena, you find a scale-free network that is neither dense, nor is it sparse, but somewhere in the middle.
>
> I am deeply interested in converting time-ordered expressions of that network into the underlying structure. (and back). So, by analogy: a seismologist, all they have are some time-series recordings of Earth's vibrations; from that they try to reconstruct the structure of Earth. I have a time series of words, I want to reconstruct the structure of the brain that wrote those words.  And, once reconstructed, what else might that "brain" have said? Just like the Earth model: what other kinds of earthquakes might it produce?
>
> I've got half-a-dozen PDFS all 20 to 100 pages long, that spin out each of the above paragraphs into great detail. I think they're important, but I can't get anyone to read them :-) So it goes...
