>> The bio-Atomspace we are experimenting with now contains only a small
>> % of the biomedical knowledge we would like it to, which is because of
>> RAM and processing speed limitations in current OpenCog
>>
>> Recent optimizations help but don't remotely come close to solving the problem
>
>
> OK. Well, that's news to me. I try to keep everyone happy, and when there aren't any comments or complaints, I assume everyone is happy. Do you have actual examples, where you are running out of RAM, and where things are going too slow? Or is this just a gut-feel issue, for which you have no actual data?
Yes, Hedra (who is working with PLN on bio-Atomspace) is hitting these
issues all the time, and because of this she limits the amount of data
imported into the Atomspace and the scope of queries run against it
(e.g. filtering a query to focus on just a few genes rather than all
the genes of interest, etc.).
Nil is aware of this work and understands it in greater detail than I
do (from an OpenCog usage view, not a biology view), and if you're
curious to dig in, asking him is probably the best idea...
> I cannot repeat this often enough or strongly enough: the kinds of optimizations that are performed on software systems are extremely data-dependent and algorithm dependent. It is effectively impossible to perform optimizations without having a specific use case. This is a kind-of theorem of computer science.
This is way overstated IMO. For instance, optimizations made to allow
fast matrix operations on GPUs for computer graphics turned out to be
useful for all sorts of NN and other AI algorithms. Of course things
can be further optimized for specific NNs or other AI algos, but
nonetheless the more generic optimizations made w/ computer graphics
in mind were pretty helpful for optimizing AI ...
Theorems like "no free lunch" etc. operate at a level of abstraction
and extremity that doesn't really help in practical cases, I feel...
>> The neural-symbolic grammar learning that Andres Suarez and I
>> prototyped last spring, also couldn't viably be done using OpenCog for
>> similar reasons (RAM and processing speed limitations).
>
>
> No one ever complained about RAM or processing speeds, so it's kind of unfair to just bring this up a year later. I had the impression that the theory you were developing wasn't working out; I wasn't surprised, but I never fully understood it.
The theory IMO is highly promising, and we paused that work because of
other priorities not any problems w/ the ideas nor any lack of quality
in the prototype results. However to pursue that work using
Atomspace in a straightforward way would require importing way more
data into a single Atomspace than can be done in RAM on a single
current-day machine.
> This spring, I restarted work on
> https://github.com/opencog/learn -- you can review the README for the current status. I get good results. It's a big project. Things go slowly. Not enough time in the day.
Cool, I will take a look...
>> The experimentation on pattern mining from inference histories for
>> automated inference control, that Nil was doing a year ago, was
>> incredibly slow also due to Atomspace limitations.
>
>
> Ben, that is also incredibly unfair. Never-ever did you or Nil or anyone else ever complain about "atomspace limitations". So you can't just start blaming it now. If there is an actual performance problem, open a github issue, and describe it. Provide instrumentation, bottlenecks.
Some problems are too obvious and too severe for it to make sense to
take this sort of approach -- problems that clearly can't be fixed by
incremental improvements.
> I watched those projects from afar, and ... well, all I can say is "that's not how I would have done it". The fact that you had performance problems is almost surely a statement about your algorithms, and not a statement about the atomspace. The atomspace is what it is, and if you use it incorrectly, you'll get disappointing results. It's not a magic wand. It's just software, like any other kind of software.
This could have been said about all neural net AI algorithms in the
period before we had modern GPUs and their associated software tools.
But in fact many algorithms that were run in the 1980s and 1990s with
poor results were run a couple decades later on more modern hardware
and associated software frameworks, with really exciting results --
without any changes to the core algorithms, though often with some
parameter tweaks and straightforward network architecture improvements
(of the sort that are straightforward once you're able to iterate
quickly, running experiments at the appropriate scale).
So the history of AI contains a lot of cases that contradict the sort
of assertion you're making. Often algorithms that worked poorly at
one scale ended up working great at a more appropriate scale (where
"scale" means amount of data and also amount of processor and RAM,
appropriately deployed...)
>> It is possibly true that for each such case, one could design a
>> specialized architecture to support just that case, working around the
>> need for a general-purpose DAS in that particular case....
>
>
> You are describing things that sound like (to me) inadequate or inappropriate algorithms, and then switching the topic to DAS. You don't have to use the atomspace -- you could have done the inference mining on any one of a half-dozen map-reduce platforms out there -- many of them from the Apache.org people -- and you would not have gotten performance that is any better than what the atomspace provides.
That is not really true... the mining part could be done way faster
than is possible using current OpenCog tools. However these other
tools don't have the flexibility to do the inference part in any
non-convoluted way. And if we're going to set things up w/ closely
coupled back-and-forth between pattern mining of inference patterns,
and inference itself, it's nice if the two aspects are not implemented
in totally separate systems with a slow communication channel between
them...
>Nothing that I saw Shujing doing with pattern mining was any different than what anyone else in the industry does when they data-mine.
Standard datamining algorithms do not look for surprising patterns in
hypergraphs.
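To make that concrete, here is a toy sketch of the kind of surprisingness scoring involved: a composite pattern is scored by how far its observed frequency deviates from the estimate you'd get by assuming its component subpatterns occur independently. This is purely illustrative, not the actual OpenCog pattern-miner API; the function name and the exact normalization are assumptions.

```python
def surprisingness(observed_prob, component_probs):
    """Score a composite pattern by how far its observed probability
    deviates from the independence-based estimate (the product of its
    components' probabilities), normalized by the observed probability."""
    expected = 1.0
    for p in component_probs:
        expected *= p
    return abs(observed_prob - expected) / max(observed_prob, 1e-12)

# Toy example: a pattern linking two genes to the same disease occurs
# with probability 0.02, while each gene-disease link alone occurs with
# probability 0.1; independence would predict 0.01, so the composite
# pattern deviates by half of its observed probability.
score = surprisingness(0.02, [0.1, 0.1])
```

Standard itemset or sequence miners rank by raw frequency or support; ranking by deviation from an independence model over hypergraph substructures is the part they don't do out of the box.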
> Given that I don't understand the "applications such as those above", I don't know how to respond. You would have to describe those applications in engineering terms, in order to understand how they could be implemented so as to run efficiently and scalably ... without an actual description of what it is, it's not a solvable problem. There's just an insufficient amount of detail.
Yeah, of course my email did not contain full detail about these
applications; that would be infeasible to give in such a short space
and time.
Yes, agreed; the above is understood. We were doing something
similar in HK some years ago with Mandeep's gearman-based
implementation, but clearly your current system is better in various
ways.
The hard part is a ladder of requirements:
> * How do you get the shards of data onto those machines? Do you use rsync to copy files, or do you want to send them via atomspace? If you use rsync, then where will you keep the script for it?
> * Where do you keep the list of the currently-active set of 10 machines? Do you need a GUI for that? A phone app?
> * What do you do if one or more of them hasn't booted, or has crashed?
> * Are they password-protected? The atomspace is not password-protected!
>
> There are atomspace issues:
> * The simplest solution is to wait until all ten have returned results, and then join them together.
> * Another possibility is to let the results dribble in, and join them as they arrive. This is more complex, and requires more sophistication. The 10-line demo program now becomes a 100 or 200 line program.
> * What if one of the machines has crashed during processing? e.g. bad network card, failed disk, power outage?
> * Perhaps you want to load-balance, so that the slowest machine is not always the bottleneck. This requires measuring each machine to see if it is idle or not, and giving it more work if it is idle. This is non-trivial. Most engineers would do this outside of the atomspace, but you could also do it inside the atomspace if you write custom Atoms for it. Does your design require custom Atoms for load-balancing?
> * Perhaps the dataset is badly sharded, so that one of the machines is always a bottleneck. This requires not only finding the busiest machine, but then re-sharding the data. Many databases do this automatically. The conventional way in which this is done is to find a sequence of "least cruel cuts" in the Tononi sense, and move those to other machines. Find the cuts that hurt Phi the least. Talking about phi is fancy-pants buzzword-slinging, but all the people who do data-mining have a very intuitive understanding of Tononi's Phi, and have had that understanding many, many decades ago, because it's key to both software and hardware optimization. This is easy to say, but finding those cuts is hard to do. Nothing in opencog today does this automatically. However, I can imagine several possible solutions, ranging from real easy ones to really complex ones, each having pros and cons. Vendors like Oracle have had solutions for this, for decades. They've invested hundreds of man-years into it.
> * There's more. I wanted to mention concepts like "explain vacuum analyze" and "query planning" but perhaps some other day. Everyone gets to solve the query planning problem, including Hyperon. There's no free lunch.
>
> Then there are the data-design issues and meta-issues
> * Perhaps you are storing data as atoms, that should have been Values. Values are a lot faster than Atoms, but they get this performance with a set of function trade-offs.
> * Perhaps your data should not be kept in the atomspace at all. This includes audio, video live-streams, text files, medical records, and a zillion other data types.
Yes, these are all among the issues that need to be solved in order to
make an effective DAS (a goal which you don't have a great affinity
for, I understand). The fact that this long list of issues (which is
not complete ofc) still remains to be addressed means to me that your
RocksDB-based prototype is not actually very close to what we would
need for a DAS.
But could one usefully build a DAS on top of your current RocksDB
code? Surely one could, but it's not yet clear to me that's the
optimal approach... maybe it is...
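As a point of reference, the simplest join strategy in the list above (wait for every shard to answer, then merge) really is only a few lines; the complexity lives in the failure and load-balancing cases that follow it. The shard representation and query interface here are hypothetical stand-ins, not DAS or Atomspace APIs.

```python
from concurrent.futures import ThreadPoolExecutor

def query_shard(shard, pattern):
    # Placeholder: in a real DAS this would run the pattern matcher
    # against one remote Atomspace shard, over the network.
    return {atom for atom in shard if pattern(atom)}

def scatter_gather(shards, pattern):
    """Simplest join strategy: block until every shard has answered,
    then union the result sets. No retries, no load-balancing, no
    handling of a crashed shard -- those are where the work begins."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda s: query_shard(s, pattern), shards)
    merged = set()
    for r in results:
        merged |= r
    return merged

# Three hypothetical shards holding string-named atoms.
shards = [{"geneA", "geneB"}, {"geneB", "geneC"}, {"geneD", "proteinX"}]
hits = scatter_gather(shards, lambda a: a.startswith("gene"))
```

The "let results dribble in" variant replaces the blocking `map` with per-shard callbacks and an incremental merge, which is exactly why it grows from a 10-line demo to a 100-200 line program.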
>> Hmm, at a high level we did guess a pattern cache was going to be
>> useful -- and Senna implemented one some time ago.
>
>
> The concept of a cache is about as generic as the concept of a loop or an if statement. You are effectively saying that Senna thought of using if-statements and loops, and implemented a program that used them. That's just crazy-making!
What I mean is that Senna implemented in 2017 a Pattern Index that
allows one to create special indices to accelerate lookup of
particular sorts of patterns in Atomspace,
https://github.com/andre-senna/opencog/blob/feature_pattern_index/opencog/learning/pattern-index/README.md
It's not the same exact idea as your pattern cache, though.
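For readers unfamiliar with it, the idea can be sketched as follows: build, once, a map from pattern templates to the atoms matching them, so that repeated lookups become dictionary hits rather than full Atomspace scans. The atom encoding and template representation below are made up for illustration; see the README linked above for the real design.

```python
from collections import defaultdict

def build_index(atoms, templates):
    """Precompute a map from each named template to the atoms it
    matches; the one-time cost of the scan is amortized over all
    subsequent lookups."""
    index = defaultdict(list)
    for atom in atoms:
        for name, predicate in templates.items():
            if predicate(atom):
                index[name].append(atom)
    return index

# Hypothetical tuple encoding of a few atoms.
atoms = [("InheritanceLink", "cat", "animal"),
         ("EvaluationLink", "eats", "cat", "fish"),
         ("InheritanceLink", "dog", "animal")]
templates = {"inheritance": lambda a: a[0] == "InheritanceLink"}
idx = build_index(atoms, templates)
```

The real Pattern Index supports far richer templates than a type check, but the trade-off is the same: index-build time and memory in exchange for fast pattern lookup.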
-- Ben