neon: a database that works like git


Amirouche Boubekki

unread,
Feb 18, 2018, 10:40:08 AM2/18/18
to opencog
I think I found the right combination of features that would be helpful in a data science context. That is, a database that works like git, a "git for data"
kind of database.

It's mainly inspired by datomic and two papers:


Unlike datomic, the history is a directed acyclic graph (DAG). That is, you can have branches. I think that's the feature that makes all of this worthwhile. You can populate your database with wikidata, then create a branch to add edits, test your program against both versions of the database, and compare the results. The data is stored efficiently, without copying.
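A minimal sketch of the idea (all names here are illustrative, not from any existing system): commits form a DAG, branches are just named pointers to commits, and facts are shared by content hash rather than copied, so a branch costs only the facts it adds:

```python
import hashlib
import json


class VersionedStore:
    """Toy "git for data" store: commits form a DAG, facts are shared by hash."""

    def __init__(self):
        self.blobs = {}      # content-addressed facts, shared across branches
        self.commits = {}    # commit id -> (parent ids, frozenset of fact ids)
        self.branches = {}   # branch name -> commit id

    def _put(self, fact):
        # content addressing: identical facts hash to the same blob id
        h = hashlib.sha256(json.dumps(fact).encode()).hexdigest()
        self.blobs[h] = fact
        return h

    def commit(self, branch, facts, parents=()):
        ids = frozenset(self._put(f) for f in facts)
        # inherit the parents' facts by id reference only -- no copying
        for p in parents:
            ids |= self.commits[p][1]
        cid = hashlib.sha256(repr((sorted(parents), sorted(ids))).encode()).hexdigest()
        self.commits[cid] = (tuple(parents), ids)
        self.branches[branch] = cid
        return cid

    def facts(self, branch):
        return {self.blobs[i] for i in self.commits[self.branches[branch]][1]}


store = VersionedStore()
base = store.commit("master", [("Berlin", "capitalOf", "Germany")])
store.commit("edits", [("Paris", "capitalOf", "France")], parents=(base,))
# "master" is untouched; "edits" sees both facts, each stored exactly once
```

Testing a program against both versions is then just `store.facts("master")` versus `store.facts("edits")`.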

Do you think this kind of database can be useful in your work?

Linas Vepstas

unread,
Feb 18, 2018, 2:38:28 PM2/18/18
to opencog
Hi Amirouche,

I skimmed the PDFs. Some of the goals are laudable; I quote:
"the exchange of partial graphs, personalised views on data
and a need for trust. A strong versioning model supported by
provenance".


So, at a meta-level, yes. In practice, a triple-store strikes me as
worthless, pointless, hopeless. Perhaps I am wrong -- I would love
it if someone explained it to me. So from the PDF:

":Adam :knows :Bob"

Great. Who? Fat Bob or skinny Bob? Did you mean Robert? Robert,
like the guy down the hall, or Robert, the salesman who visits every
Tuesday? Are you 100% certain about that? Or are you just guessing?

In my personal experience, triples are wholly inadequate to deal with
knowledge representation of the above form. If I'm wrong, let me know how.
The atomspace was designed to hold arbitrary expressions, like "Adam knows
Bob, the curly-haired, fat and always-smiling Bob, and I know this because
Carol overheard it while standing in line at the cafeteria for lunch. So I'm 98%
certain that Adam knows Bob."

Should the atomspace also include some default model for exchange of
partial graphs, versioning and provenance? Maybe. So far, we have little
or no experience with any of this - no one has needed or asked for this,
so I cannot guess if our current infrastructure is adequate, or if we need
yet more. In a certain sense, versioning and provenance is already
built into the atomspace, in a "strong" way. But no one uses it.

-- linas
> --
> You received this message because you are subscribed to the Google Groups
> "opencog" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to opencog+u...@googlegroups.com.
> To post to this group, send email to ope...@googlegroups.com.
> Visit this group at https://groups.google.com/group/opencog.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/opencog/db0ce7d7-5c27-4151-94a0-37ad1bf179f4%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



--
cassette tapes - analog TV - film cameras - you

Nil Geisweiller

unread,
Feb 18, 2018, 2:53:28 PM2/18/18
to ope...@googlegroups.com
On 02/18/2018 09:38 PM, Linas Vepstas wrote:
> yet more. In a certain sense, versioning and provenance is already
> built into the atomspace, in a "strong" way. But no one uses it.

I'm not sure I understand -- could you elaborate? I can think of TimeLink,
HypotheticalLink, ContextLink and such; is that what you have in mind?

Nil

Linas Vepstas

unread,
Feb 18, 2018, 4:22:57 PM2/18/18
to opencog
Well, we do not have a wiki page that explains "here's how you do versioning",
but we do have the ability to create chains of arbitrary depth:

OrderedLink
    Concept "Commit 43"
    SomeAtom
    OrderedLink
        Concept "Commit 42"
        OtherAtom
        OrderedLink
            Concept "Commit 41"
            MoreAtom
            OrderedLink
                ....
                Concept "root of chain"

Due to the design of the atomspace, it is impossible to change "MoreAtom"
unless you first delete the entire incoming set. It's even weakly
cryptographically secure, in that the atom hash of the top-most ordered
link includes the hashes of every atom under it. It's not a crypto-strong
hash; it's just a plain-old hash, but it's still a hash.

You could make this robust by writing a wiki page that states that, e.g.,
the string "Commit 43" should instead be a SHA-256 hash of everything
that came before. The wiki page could also state (for example) that a
TimeLink should be slotted in there. Perhaps truth values could be
"frozen" by turning them into NumberNodes, and wrapping each level
in a ContextLink.

Obviously, branching is trivial; merging requires a wiki page stating how
merges are done; it does not require new software. This gets you up to
the level of git, in terms of features. (abstractly; it would take tooling to
make it as usable as git).

If you had some proof-of-whatever, you could staple it to this, thus
creating a chain whose incoming set could never be altered because,
just like in bitcoin, it was too deep.

So in principle, the atomspace has all the basic ingredients needed to
support a blockchain. In practice, you'd have to write the wiki pages,
create some RFP process, create a bunch of tooling, get worried
about performance, usability, etc.

There is a to-do item from singularity.net to write smart contracts in atomese.

The point of this email is that you do not need to invent a new, different
blockchain, other than the one already provided by the atomspace.
Exactly what it would take to make it portable and popular, I don't know.
Perhaps I'm being naive. But it seems plausible, to me.

(Crying over spilled milk: the atomspace was one of the first graph
databases, ever, back in the day. Now it's not even a footnote in the
history of such things, and that's kind of a statement about the failure
to build an open-source community around it. We're now in a similar
situation, but in a different way.)

--linas

Linas Vepstas

unread,
Feb 18, 2018, 4:30:08 PM2/18/18
to opencog
I should also add a "standard disclaimer": much of what I say
below could be done with any distributed database: just write
some specs on how to use it, and you're done, right? The hard
part is, of course, inventing an API that people won't puke on,
and then convincing everyone to use it.

--linas

Nil Geisweiller

unread,
Feb 19, 2018, 1:25:25 AM2/19/18
to ope...@googlegroups.com
On 02/18/2018 11:22 PM, Linas Vepstas wrote:
> On Sun, Feb 18, 2018 at 1:53 PM, 'Nil Geisweiller' via opencog
>> I'm not sure I understand, could you elaborate? I can think of TimeLink,
>> HypotheticalLink, ContextLink and such, is that what you have in mind?
>
> Well, we do not have a wiki page that explains "here's how you do versioning"
> But we do have the ability to create chains of arbitrary depth:
>
> OrderedLink
> Concept "Commit 43"
> SomeAtom
> OrderedLink
> Concept "Commit 42"
> OtherAtom
> OrderedLink
> Concept "Commit 41"
> MoreAtom
> OrderedLink
> ....
> Concept "root of chain"
>
> [...]
> So in principle, the atomspace has all the basic ingredients needed to
> support a blockchain. In practice, you'd have to write the wiki pages,
> create some RFP process, create a bunch of tooling, get worried
> about performance, usability, etc.

OK, got it. So it seems it would work, except for truth values and
generally all valuations. I suppose a way around that would be to insert
valuations into the atomspace (turn protoatoms into atoms).

Nil

Amirouche Boubekki

unread,
Feb 19, 2018, 1:35:51 PM2/19/18
to opencog
Hi Linas,


Tx for taking the time to reply.

On Sunday, February 18, 2018 at 8:38:28 PM UTC+1, linas wrote:
Hi Amirouche,

I skimmed the PDFs.  Some of the goals are laudable; I quote:
"the exchange of partial graphs, personalised views on data
and a need for trust. A strong versioning model supported by
provenance".


So, at a meta-level, yes.  In practice, a triple-store strikes me as
worthless, pointless, hopeless.  Perhaps I am wrong -- I would love
it if someone explained it to me.  So from the PDF:

":Adam :knows :Bob"

Great. Who? Fat Bob or skinny Bob?  Did you mean Robert? Robert,
like the guy down the hall, or Robert, the salesman who visits every
Tuesday?  Are you 100% certain about that? Or are you just guessing?

Bob is an identifier. It can actually be Fat Bob or skinny Bob; it depends on what other
triples are associated with Bob.
 
In my personal experience, triples are wholly inadequate to deal with
knowledge representation of the above form.  If I'm wrong, let me know how.

RDF triple stores represent a graph or multiple graphs. The difference, in terms
of storage, between a triple store and a property graph is that a triple store puts
the emphasis on edges. That is, it stores and queries labeled edges, called triples.

For instance:

    Bob knows Alice

is a representation of the directed edge between the node called "Bob" and the
node called "Alice", with the label "knows". In particular, in a triple store, node properties
are represented as edges too, i.e. as triples, which sounds counter-intuitive compared to
the property-graph approach, where you store on disk all the properties of a node together
with the node itself, and link it to its incoming and outgoing edges (and similarly for
edges), hence the linked-list storage approach. In a triple store, a triple is decomposed into:

   Subject Predicate Object

These are stored together.

An advanced triple store (like datomic) doesn't assume anything in particular about the
Object (called Value in datomic) in terms of indexing until you specify it in the schema.
In the schema you can declare a particular predicate (or set of predicates) to be
indexed in a particular fashion (see eventual fulltext indexing in datomic), or with
spatiotemporal indexing, or whatever.

You can have an infinite set of predicates (hence an infinite set of properties, in property-graph terms),
because the query engine streams everything.
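A minimal sketch of the SPO decomposition with a streaming (generator-based) query. A real store such as datomic keeps covering indexes (SPO/POS/OSP and the like) so that any bound/unbound pattern is a range scan; here a plain scan stands in for that:

```python
class TripleStore:
    """Every fact, including a node property, is one (S, P, O) triple."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def query(self, s=None, p=None, o=None):
        # a generator: results are streamed, never fully materialized,
        # so the predicate set (and hence the property set) is unbounded
        for S, P, O in self.triples:
            if (s is None or s == S) and (p is None or p == P) and (o is None or o == O):
                yield (S, P, O)


db = TripleStore()
db.add("Bob", "knows", "Alice")      # a labeled edge
db.add("Bob", "hairstyle", "curly")  # a node property -- also just a triple
```

With this shape, "which Bob?" is answered by querying the other triples attached to the identifier, e.g. `db.query(s="Bob", p="hairstyle")`.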

Datomic is actually a versioned triple store with linear history, except that it doesn't
claim conformance with the RDF standards (for good reason, because many think EAV and
RDF are failures). AFAIU, the implementation of a triple store uses the same techniques
described in this (new) document:

 
The atomspace was designed to hold arbitrary expressions, like "Adam knows
Bob, the curly-haired, fat and always-smiling Bob, and I know this because
Carol overheard it while standing in line at the cafeteria for lunch. So I'm 98%
certain that Adam knows Bob."

That is also possible in a triple store. In a property graph you reify hyper-edges
as multiple vertices and edges; a similar technique is used in triple stores to express
something about a particular triple.
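That reification technique can be sketched with the standard RDF reification vocabulary (`rdf:subject`, `rdf:predicate`, `rdf:object`); the statement identifier `stmt1` and the provenance predicates are hypothetical names chosen for the example:

```python
# A triple can only label one edge, so to say something ABOUT the edge
# "Adam knows Bob", the statement itself is reified as a node ("stmt1")
# and described with ordinary triples.
triples = set()


def add(s, p, o):
    triples.add((s, p, o))


# the reified statement: which edge we are talking about
add("stmt1", "rdf:subject", "Adam")
add("stmt1", "rdf:predicate", "knows")
add("stmt1", "rdf:object", "Bob")
# provenance and confidence, attached to the statement node
add("stmt1", "source", "Carol")
add("stmt1", "heardAt", "cafeteria lunch line")
add("stmt1", "confidence", "0.98")
```

This is exactly the "Carol overheard it, so I'm 98% certain" example rendered as six plain triples.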

It's clear to me that you can do the same thing with triple stores and property graphs,
except that triple stores don't assume all the properties of a given node can stay in RAM.
But that was not my question.


Should the atomspace also include some default model for exchange of
partial graphs, versioning and provenance?  Maybe. So far, we have little
or no experience with any of this - no one has needed or asked for this,
so I cannot guess if our current infrastructure is adequate, or if we need
yet more.  In a certain sense, versioning and provenance is already
built into the atomspace, in a "strong" way. But no one uses it.

That's the heart of my question. Seems like the answer is simply: no.

Basically, what I wanted to know is: as a scientist working with loads of structured
data, do you feel the need to share and version structured data in multiple branches to
do your work? My take on this would have been "yes" because, for instance,
this thread makes me think of the “Preloading atomspace with initial data” thread, where
you explain that the best way to go is to dump postgresql.

I have other use cases in mind, like IF (a big if; I don't say that's what should be done)
link-grammar's and relex's dictionaries were stored in a versioned atomspace, it would
be easier (I think) to test new grammars... But today the current workflow is good enough
for those cases, because the grammars are rather small and stay in RAM.

BUT (another big if), if AGI projects rely more and more on curated, statistically built
structured databases that are bigger than RAM (i.e. imagine wikidata put together out of unstructured text that must be edited),
then I think a versioned atomspace and proper tooling make sense (that is, a tool like wikibase).

That said, as you say, it's already possible for you to use the atomspace to version data and use branches.

My question: is versioning and branching bigger-than-RAM structured data part of your workflow?

Just to be clear: I am no longer planning to replace the atomspace anytime soon. You have invested too much
in the atomspace, and are moving toward more integration with it, which makes that
unimaginable. Also, the only improvement would be easier workflows, instead of, for instance, dumping
the atomspace into wikibase, editing it there, dumping it again and loading it back into the atomspace, which is not
a workflow in current practice AFAIK. (Also, wikibase is not a good tool for editing arbitrary graphs.)


Best regards

Linas Vepstas

unread,
Feb 19, 2018, 2:56:55 PM2/19/18
to opencog
The valuations (as generalized truth values) are meant to be rapidly mutable,
avoiding the overhead of the atomspace, and all the associated klunkiness. So
one should only "freeze" them into atoms with caution and trepidation;
once frozen, they become hard or impossible to thaw.

In my mind, the concept of valuations is one of the more important innovations
in the atomspace: a clear distinction between two different kinds of "data",
having two different kinds of properties, behaving in different characteristic
ways.

Is it the correct split? I dunno. The choice of "valuation" is inspired by
model theory and set theory, the Löwenheim–Skolem theorem, etc., and so, in
that part of the world, the distinction between atoms and valuations is
central. Something similar can be said about Bayesian probability, where you
make a clean split between the thing you are talking about (the "atom") and
the probability of it happening (the "truth value").

So that's the general argument for why atoms and valuations are different from
one another: it's to allow an interplay that is already recognized in other
branches of mathematics, and has now been ported over to knowledge
representation.


-- Linas

Ivan Vodišek

unread,
Feb 19, 2018, 3:35:52 PM2/19/18
to ope...@googlegroups.com
Hello, Amirouche Boubekki :)

May I ask, in your opinion, how would a triple store be suitable for describing arbitrary algorithms? The possibility to describe algorithms, as the dynamic side of some AGI knowledge base, is a must-have if we want the knowledge-base system to be complete. I think hypergraphs handle algorithms like some Lisp-based languages do: a few carefully selected builtin functions (possibly something like lambda expressions), and we are ready to go. On the other side, a triple store is also fine for describing structured data, but I have trouble imagining a triple-store-based system describing algorithms in a neat way.

Thank you for your time,
Ivan V. 



Linas Vepstas

unread,
Feb 19, 2018, 3:56:06 PM2/19/18
to opencog
On Mon, Feb 19, 2018 at 12:35 PM, Amirouche Boubekki
<amirouche...@gmail.com> wrote:
> Hi Linas,

> RDF triple stores represent a graph or multiple graphs. The difference, in
> terms of storage, between a triple store and a property graph is that a
> triple store puts the emphasis on edges. That is, it stores and queries
> labeled edges, called triples.
>
> For instance:
>
> Bob knows Alice

And here lies the crux of the matter. Mathematicians often describe graphs
as collections of edges and vertexes, and so if the triple store was simply
marketed as "oh hey, we've got this great way of storing labelled edges",
I might react "great", and then compare, feature-by-feature, with other graph
stores.

But instead, the triple stores make this leap into knowledge representation:
"Bob knows Alice" is NOT an example of a labelled edge: it is an example
of represented knowledge.

Can one represent knowledge with graphs? Yes, absolutely.

Is the representation of "Bob knows Alice" by a single edge a good
representation? No -- it's an absolutely terrible representation.

That's where I'm coming from. Triple stores seem to delight in picking
this truly bad representation, and seeing how far they can go with it.
It does not seem to be a game I want to play.

>> The atomspace was designed to hold arbitrary expressions, like "Adam knows
>> Bob, the curly-haired, fat and always-smiling Bob, and I know this because
>> Carol overheard it while standing in line at the cafeteria for lunch. So
>> I'm 98%
>> certain that Adam knows Bob."
>
>
> That is also possible in a triple store. In property graph you reify
> hyper-edges
> as multiple vertices and edges, using a similar technique in triples stores
> to express
> something about a particular triple.

Of course it is. Once you have the ability to talk about edges and vertexes,
then you can create arbitrary graphs, and "reify" all you want. But when
you start doing that, then you are no longer representing "alice knows bob"
with a single edge. At which point ... the jig is up. The pretension,
the illusion
that a single edge is sufficient to represent "alice knows bob" is revealed
to be a parlor trick.

BTW, here is my take on edges and vertexes, vs. "something better" than
edges and vertexes:

https://github.com/opencog/atomspace/blob/master/opencog/sheaf/docs/sheaves.pdf


> Basically, what I wanted to know is: as a scientist working with loads of
> structured data, do you feel the need to share and version structured data
> in multiple branches to do your work.

Currently, no. In some not-too-distant future, yes.

One current issue is that some of my non-versioned datasets are too big
to fit into RAM. If they were versioned, then they would be too big to fit,
times ten.

> My take on this would have been "yes" because, for instance, this thread
> makes me think of the “Preloading atomspace with initial data” thread, where
> you explain that the best way to go is to dump postgresql.

Well, that is kind of a "political post" -- some of the users, specifically, the
agi-bio guys, e.g. Mike Duncan, et al. appear to be managing their data as
large text files. I am encouraging these users to try other ways of managing
their data. Of course, this then leads to other data-management issues, but
well, one step at a time...

> I have other use cases in mind, like IF (a big if; I don't say that's what
> should be done)

The mid-term plan is that it *should* be done. That's not even the long-term
plan.

> link grammar's and relex dictionaries were stored in a versioned atomspace,

The atomspace already offers several ways of doing versioning, for example,
the "ContextLink", as Nil mentioned, or the "AtTimeLink" -- but, so far, no one
actually uses these for versioning, they are underutilized, under-explored,
there's no hands-on experience of pros and cons with them.

> it would
> be easier (I think) to test new grammars...

The reason for placing those grammars into the atomspace is to "cut out the
middleman": when the learning algo learns a new word, then that new word is
instantly available to the parser.

The current process for this is to dump a select portion of the atomspace into
a certain sqlite3 format, copy the sqlite3 to wherever, and halt and restart
the link-grammar parser. It ... "works" ... it's klunky.

> BUT (another big if), if AGI projects rely more and more on curated,
> statistically built structured databases that are bigger than RAM, i.e.
> imagine wikidata put together out of unstructured text that must be edited,
> then I think a versioned atomspace and proper tooling make sense (that is,
> a tool like wikibase).

A) I already have databases that don't fit into RAM. The answer is that the
algos need to be designed to touch the data "locally" - fetch, modify, store.

B) I am deeply distrustful of "curated data", at the low level. In my
experience, it sucks, and I also believe that it is possible to do much better
than curated data, using automatic algos, and that this is "not that hard".
Apparently, no one believes me, so I still have to prove that it can be done.

C) Versioning is a non-issue for the atomspace. We've already got the needed
technology for versioning: e.g. the ContextLink, the AtTimeLink. No new
code needs to be written to get versioning. What is needed, is the use-case,
the actual fooling-with-it scenario.

> Just to be clear: I am no longer planning to replace the atomspace anytime
> soon.

I am. See https://github.com/opencog/atomspace/issues/1502 for details.

Apache Ignite might be the way to go.

> You have invested too much in the atomspace, and are moving toward more
> integration with it, which makes that unimaginable.

It's very imaginable. The atomspace is both an API and an implementation.
We can keep the API and discard the implementation.

> Also, the only improvement would be easier workflows, instead of, for
> instance, dumping the atomspace into wikibase, editing it there, dumping it
> again and loading it back into the atomspace, which is not a workflow in
> current practice AFAIK. (Also, wikibase is not a good tool for editing
> arbitrary graphs.)

Ah, jeez. Do you think that google dumps the graph of the internet into
some tool, and then individual humans run around, and edit it node by node?
Like "gee, I should adjust the search ranking for xyz to include http:abc.com
at search rank 2 instead of search rank 3" ... Of course not.

Instead, teams of humans develop algorithms that then perform the edits
automatically, taking millions of cpu-hours on cloud servers.

It is absurd to think that we are going to use human beings to convert the
knowledge of wikipedia into hand-curated triples of the form
"Kennedy#94724379 was#82934872 president#8923423" and then
power some A(G)I with such hand-curated data. This strikes me as
the ultimate folly, but it seems like the RDF community is chasing this
folly with all its might.

The goal of the atomspace is to allow automated workflows that ingest
and alter data and convert it into formats that other algorithms can act on,
in turn. The goal of the atomspace is to eliminate human-curated datasets.

--linas


>
>
> Best regards
>

Amirouche Boubekki

unread,
Feb 19, 2018, 6:15:20 PM2/19/18
to ope...@googlegroups.com
Héllo Ivan,

On Mon, Feb 19, 2018 at 9:35 PM Ivan Vodišek <ivan....@gmail.com> wrote:
Hello, Amirouche Boubekki :)

May I ask, in your opinion, how would a triple store be suitable for describing arbitrary algorithms? The possibility to describe algorithms, as the dynamic side of some AGI knowledge base, is a must-have if we want the knowledge-base system to be complete. I think hypergraphs handle algorithms like some Lisp-based languages do: a few carefully selected builtin functions (possibly something like lambda expressions), and we are ready to go. On the other side, a triple store is also fine for describing structured data, but I have trouble imagining a triple-store-based system describing algorithms in a neat way.

I am not well versed enough in the arcana of atomspace/opencog to give a meaningful example representation of an atomspace program in a triple store. That said, a triple store can represent a property graph, hence it can represent a hypergraph, hence it can do what an atomspace-kind of hypergraph does. That's my basic thinking. Depending on how the atomspace is used, multiple representations in a triple store might be possible. So I cannot give a useful answer beyond that.
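One possible encoding along these lines (a hypothetical sketch, not an established atomspace mapping): flatten a nested, Lisp-like expression into triples by giving each interior link a fresh node id and linking its children with positional predicates, so arbitrary hypergraph links survive the translation:

```python
import itertools

counter = itertools.count()
triples = []


def encode(expr):
    """Flatten a nested (Lisp-like) expression into triples.
    Interior nodes get fresh ids (_e0, _e1, ...); each child is attached
    with a positional predicate (arg0, arg1, ...)."""
    if not isinstance(expr, tuple):
        return expr                      # leaf: a plain node name
    node = f"_e{next(counter)}"
    triples.append((node, "op", expr[0]))
    for i, child in enumerate(expr[1:]):
        triples.append((node, f"arg{i}", encode(child)))
    return node


# an atomspace-style link expressed as a nested tuple
root = encode(("EvaluationLink", ("PredicateNode", "knows"),
               ("ListLink", "Adam", "Bob")))
```

The inverse direction (triples back to the nested expression) follows the `op`/`argN` predicates from the root id, which is what makes this a representation rather than a lossy dump.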
 
Thank you for your time,

Best regards,

 
Ivan V. 



Jeff Thompson

unread,
Feb 20, 2018, 7:04:42 AM2/20/18
to ope...@googlegroups.com
> Is the representation of "Bob knows Alice" by a single edge a good
> representation? No -- it's an absolutely terrible representation.

Indeed. A Wikidata-style triple is not knowledge representation, it is
knowledge expression. The representation is somewhere else, and Wikidata
will never be allowed to be expressive enough to actually represent
knowledge.

> The goal of the atomspace is to eliminate human-curated datasets.

Music to my ears. "Curated" means "detached from the actual source and
context of knowledge." With OpenCog I imagine that the AtomSpace will
connect to the original video feed (maybe from Sophia stored in a
SingularityNET distributed file store) plus the algorithms which
resulted in the knowledge representation. This way anyone (and Sophia
herself when she needs to) can replay the algorithm on the original
observation and analyze how the knowledge is derived, possibly
correcting/improving it.

- Jeff

Amirouche Boubekki

unread,
Feb 23, 2018, 12:26:38 PM2/23/18
to ope...@googlegroups.com
Hi both of you,

On Tue, Feb 20, 2018 at 1:04 PM Jeff Thompson <jef...@gmail.com> wrote:
> Is the representation of "Bob knows Alice" by a single edge a good
> representation? No -- it's an absolutely terrible representation.

Indeed. A Wikidata-style triple is not knowledge representation, it is
knowledge expression. The representation is somewhere else, and Wikidata
will never be allowed to be expressive enough to actually represent
knowledge.

The problem of wikidata is not storing or querying knowledge. The problem of wikidata is knowledge representation, aka graphical user interfaces.


 > The goal of the atomspace is to eliminate human-curated datasets.

Music to my ears. "Curated" means "detached from the actual source and
context of knowledge."

Not always. Curated means fixed, patched and edited by a human supervisor who knows best, until the correction is delivered in code. That is a chance to avoid structural bias, like racist bots.
 
With OpenCog I imagine that the AtomSpace will
connect to the original video feed (maybe from Sophia stored in a
SingularityNET distributed file store) plus the algorithms which
resulted in the knowledge representation. This way anyone (and Sophia
herself when she needs to) can replay the algorithm on the original
observation and analyze how the knowledge is derived, possibly
correcting/improving it.

I store:

- the raw data,
- the structured data resulting from the agent's processing,
- the code of the agent.

That is done in a way that preserves temporality.

Best regards,

Amirouche

Linas Vepstas

unread,
Feb 23, 2018, 3:30:26 PM2/23/18
to opencog
On Fri, Feb 23, 2018 at 11:26 AM, Amirouche Boubekki
<amirouche...@gmail.com> wrote:
>
>> > The goal of the atomspace is to eliminate human-curated datasets.
>>
>> Music to my ears. "Curated" means "detached from the actual source and
>> context of knowledge."
>
> Not always. Curated means fixed, patched and edited by a human being
> supervisor that knows best, until the correction is delivered in code. That
> is chance to avoid structural bias like racist bots.

Ah! Now this last is a very interesting philosophical observation.
This is not quite the correct mailing list within which to discuss
this, but it overlaps onto a large number of political and
mathematical issues that are very interesting to me. So here I go.

Political - if this was a human, not bot, what amount of racism
should be tolerated? Speech, thought, action are interconnected. For
example: the American constitution enshrines freedom of speech, and
the freedom to practice religion. But clearly, we have lost our
freedom of speech: say the wrong thing about Islam, you get bombed.
Should we restrain freedom of religion?

Religion is a form of thought. What about freedom of thought? You can
think murderous thoughts, but if you commit murder, you are socially
unwanted (usually). The ability to commit murder is correlated with
the absence of certain neural circuitry in the brain having to do with
empathy. Some humans lack these neurons, and thus are prone to be
psychopaths. Those who do have those neurons, and commit (or even
witness) murder end up with PTSD.

The mathematical issues first arise if you think of bots as
approximating humans. It's trivial to create a bot that prints random
dictionary words. It's a bit harder, but not too hard, to create a bot
that spews random dictionary words assembled in grammatical sentences
(just run the random word sequences through a grammar-checker, e.g.
link-grammar, and reject the ungrammatical ones; don't print them.
Since most random word-sequences are not grammatical, this is not
CPU-efficient, so better algorithms avoid obviously-ungrammatical
word-sequences by working at higher abstraction layers). What
Microsoft did was just one single step beyond this: spew random
grammatically correct sentences, using a probability weighting based
on recently heard utterances. The system was too simple, the gamers
gamed the system: trained up the probability weights to spew racist
remarks.

OK, suppose we can go one step beyond what Microsoft did: spew random
sentences, that are created by means of "logical deduction" or
"reasoning" applied to "knowledge" obtained from some database (e.g.
wikipedia, or from a triple store). This could certainly wow some
people, as it would demonstrate a robot capable of logical inference.

So: this last is where your comment about "structural bias like racist
bots" starts getting interesting. To recap:

Step 0: random word sequences
Step 1: random but grammatically correct word sequences
Step 2: random grammatical sentences weighted by recent input <-- the
Microsoft bot
Step 3: grammatical sentences from random "logical inferences" <--
what opencog is currently attempting
...
Step n: crazy shit people say and do
...
Step p: crazy shit societies, cultures and civilizations do

What are the values of n and p? Some might argue that perhaps they
are 4 and 5; others might argue that they are higher.

My point is: a curated database might make step 3 simpler. It's
hopeless for step 4.

For a commercial product, curated data is super-important: Alexa and
Siri and Cortana are operating at the step 2/3 level with carefully
curated databases of capitalist value: locations of restaurants,
household products, luxury goods.

The Russian twitter-bots, as well as Cambridge Analytica and the
Facebook black-ops division are working at the step 2/3 level with
carefully curated databases of psychological profiles and political
propaganda.

Scientists in general (and Ben in particular) would love to operate at
the step 2/3 level with carefully curated databases of scientific
knowledge, e.g. anti-aging, life-extension info. I'm getting old too.
Medical breakthroughs are not happening fast enough, for me.

So, yes, curated data is vitally important for commercial, political
and scientific reasons. Just that it does not really put us into step
4 and 5, which are the steps along which AGI lies. The dream of AGI
is to take those steps, without the curated bullshit (racism,
religion, capitalism) that humankind generates, and yet also avoid the
creation of a crisis that would threaten humanity/civilization.

Linas.

Ivan Vodišek

unread,
Feb 23, 2018, 5:26:37 PM2/23/18
to ope...@googlegroups.com
I'd pick some Asimov-ish laws if I were near step 4, but I'd modify them to include all living beings, not only humans. This is what I've got so far:

The law: "If realizing an idea creates more negative emotions than not realizing it would, don't realize it."

Of course, to put up a law of this level, an AGI system has to have a decent knowledge base at its disposal. A knowledge base *could* be built without any ethical laws, for a start, but once a machine becomes powerful enough to make a mess, that means it is capable of recognizing complex consequences, and the law should be switched on. So basically, whatever doesn't pass the stated law when posed as a question (if the answer is "no") doesn't get printed out, or articulated in other mechanical ways. It might be argued whether the stated law would produce a frozen-statue effect, but I think it is a good starting point.

That was a question about what not to do, but what about the other, "do this" side? In other words, how to generate ideas? I've put a lot of thought into this question, and I came up with a simple answer: ideas might be copied from observing living beings. When a bot sees a human answering "yes" to some question, it should answer "yes" to the same question posed to it. Moreover, the observed question-answer set should be generalized into functions like

f(question) -> answer

and it would be very tricky to find out what function f is composed of, but I think it is achievable. Answers may come in forms other than spoken words; they could be any mechanical articulation that the machine is capable of recognizing and reproducing in its environment. Function f would be the core of machine behavior; it would be built from fragments or wholes of what is seen in the machine's environment on particular occasions.

Observing multiple possible responses to the same question/command would create a bit of complication, but I guess it could be resolved by noticing the different contexts in which the questions/commands are observed.
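The f(question) -> answer idea with contexts can be sketched in a few lines. This is a minimal toy illustration of the proposal, not an existing system; the class and method names (ObservedBehavior, observe, respond) are hypothetical:

```python
# Toy sketch of f(question) -> answer, where observed question/answer pairs
# are stored per context, so the same question can map to different answers
# in different situations.

class ObservedBehavior:
    def __init__(self):
        # (context, question) -> answer, filled in by observation
        self.observations = {}

    def observe(self, context, question, answer):
        """Record a question/answer pair seen in a given context."""
        self.observations[(context, question)] = answer

    def respond(self, context, question):
        """Reproduce an observed answer; fall back to any context."""
        if (context, question) in self.observations:
            return self.observations[(context, question)]
        # crude generalization step: ignore context if there is no exact match
        for (ctx, q), answer in self.observations.items():
            if q == question:
                return answer
        return None

bot = ObservedBehavior()
bot.observe("greeting", "how are you?", "fine, thanks")
bot.observe("exam", "how are you?", "nervous")
print(bot.respond("greeting", "how are you?"))  # fine, thanks
print(bot.respond("exam", "how are you?"))      # nervous
```

The fallback loop is the simplest possible stand-in for the "noticing different contexts" step; anything smarter (similarity between contexts, generalization over question forms) would replace that loop.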

Basically, we can see this kind of behavior in the way infants learn how to do things. They mostly copy behaviors, adjusting some parameters in an intelligent way, to achieve ideas that were born inside their minds, again using an imitation mechanism with adjustable parameters.

Once we have a resolver for the f function, all we have to do is pass it through the law filter. If it passes, the action is performed.

And there is another question that opens up if the machine surpasses our IQ and even our level of ethical compassion: the question of the credibility of the machine's actions. Look at it this way: if a machine (that is a hundred times smarter than you and a hundred times better a person than you) advises you to do something, would you listen to it? And to what extent? And what position would that machine deserve in our society? Certainly a lot of interesting questions, but one step at a time; there are steps 3 and 4 to implement. And give some thought to preventing unwanted machine behavior; it might be necessary at the point where the machine surpasses us in intelligence.
 
Ivan V.



--
You received this message because you are subscribed to the Google Groups "opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email to opencog+unsubscribe@googlegroups.com.

To post to this group, send email to ope...@googlegroups.com.
Visit this group at https://groups.google.com/group/opencog.

Linas Vepstas

unread,
Feb 23, 2018, 6:50:29 PM2/23/18
to opencog
Hi Ivan,

Again, this probably belongs on some other mailing list, e.g. the AGI
mailing list. But what the heck.

On Fri, Feb 23, 2018 at 4:26 PM, Ivan Vodišek <ivan....@gmail.com> wrote:
> I'd pick some Asimov-ish laws if I'd be near step 4,

Asimov-ish laws are exactly the kind of "curated data" that makes me
nervous. If there's anything that the Asimov short stories illustrate,
it's the "law of unintended consequences".

Please be aware that legal systems (laws, judges, courts),
rule-of-law, such as in the US and Europe, are exactly "Asimov-ish
laws" by which modern civilization lives. The problem with Asimov's
laws is that there's only three of them. The problem with the Ten
Commandments is that there's only ten of them. That was OK, back in
the times of Hammurabi and Gilgamesh, but modern society needs many
orders of magnitude more laws than that. We also need a way of
revising those laws, when they are discovered not to work (viz,
congress, parliament, judges, lawyers).

> The law: "If realizing an idea makes more negative emotions than without
> realizing it, don't realize it."

What the heck is an emotion? A flood of hormones in the bloodstream?
A cascade of positive-feedback loops involving neurons and gene
expression? Some of these are partly understood: for example,
addiction/substance abuse is understood to involve about 6 or 8
feedback cycles, involving DNA, neurons, hormones, operating at
time-scales from seconds, to minutes to hours to weeks to months, each
re-enforcing and holding up the others. Cigarettes are hard to quit
because there is a positive feedback loop, working on a time scale of
2-6 months, that craves nicotine.

To me, emotions are a terrible foundation for ethics. Just look at the
emotional state of Christians attending revivalist Protestant mega-church
Sunday services. They are all tears and weeping and shaking and Jesus
and anti-abortion and guns, and then they go home and kick their dog,
cheat on their wife, cheat on their business partner, cheat on their
taxes. This is not a viable foundation for ethics. See wikipedia:
"Samuel Benjamin Harris is an American author, philosopher,
neuroscientist, blogger, and podcast host."


> That was a question about what not to do, but what about the other, "do
> this" side? In other words, how to generate ideas? I've put a lot of
> thoughts in this question, and I came up with a simple answer: ideas might
> be copied from observing living beings. When a bot sees a human answering
> "yes" to some question, it should answer "yes" to the same question posed to
> it. Moreover, observed question-answer set should be generalized into
> functions like
>
> f(question) -> answer

To me, this is what step 2 was about -- copy humans. This is great
for building chatbots. It is not the road to intelligence. All that
you get is a statistical model of some basic human behaviors, both
good and bad, and completely unable to get past the training-corpus
size. It's like neural-net learning, before deep learning was
discovered.


> Basically, we can see this kind of behavior in a way infants learn how to do
> things. They mostly copy behaviors, adjusting some parameters in an
> intelligent way, to achieve ideas that was born inside their minds, again
> using imitation mechanism with adjustable parameters.

Umm. I am pretty sure that almost anything/everything that scientists
have learned about babies and children would not support this theory.
Starting with Jean Piaget, from, what, 80 years ago?


> And there is another question that opens if the machine surpasses our IQ and
> even our ethical compassion level: the question of the machine's action
> credibility. Look at it this way: if a machine (that is a hundred times
> smarter than you and a hundred times better person than you) advises you to
> do something, would you listen to it?

What makes you think that you would even have a choice? Haven't you
ever seen someone smart manipulate someone stupid? Say, in
high-school, that one girl who knew how to control these other 2 or 3
girls, and get them to do things? Maybe even control a few boys?
Usually to do something mean and ugly?

Cults and kidnappers know all about brainwashing, Stockholm syndrome.
Patti Hearst. Read about Patti Hearst and the Symbionese Liberation
Army. It was not a "machine a hundred times smarter than her" that
told her what to do. It was some humans, who were merely 1.2x smarter
than her, that told her what to do. And she did it. And if it was
you, you probably would, too.

> And in what extent?

Read about Jim Jones and what happened in Jonestown, Guyana - the
Peoples Temple Agricultural Project.

But that is small-scale stuff. If you want to affect the lives of
millions of people, there is this thing called "propaganda".

> And what position
> would that machine deserve in our society?

God?

Seriously, a machine that could think like a human, but think 100
times faster than a human -- it would be utterly uncontrollable by
society. Such a machine could have 100 simultaneous conversations,
it could control tens of thousands of people, and create mobs to carry
out arbitrary actions.

Hmm. Well, we already have that. They are called "corporations" and
they are fully autonomous, and here, in the US, are given the full
legal status of individuals. Even more. A corporation can kill, but
not go to prison. So, actually, corporations have more rights than
humans.

--linas

Ivan Vodišek

unread,
Mar 3, 2018, 4:46:53 PM3/3/18
to ope...@googlegroups.com
Thank you, Linas, for taking the time to answer.
 
The problem with Asimov's laws is that there's only three of them

In my opinion, the problem is that they are not simple enough, so bugs creep in and out. The more laws there are, the more mess there is. What we need is one simple generalization, simple enough for us to notice potential bugs, yet general enough that the machine doesn't get too restricted in managing reality. If we start to patch one law with another, we end up with 10,000 pages of material, and who would successfully debug that mess?


What the heck is an emotion?

What would be your intuition when imagining the notion of "emotion"? For me, it is recognized by a pattern of behavior that living beings articulate upon some stimulus. Observing these behaviors, I can tell whether an action I'm performing is approved or not. It's a pattern, and computers can deal very well with patterns these days, considering artificial neural networks.


> That was a question about what not to do, but what about the other, "do
> this" side? In other words, how to generate ideas? I've put a lot of
> thoughts in this question, and I came up with a simple answer: ideas might
> be copied from observing living beings. When a bot sees a human answering
> "yes" to some question, it should answer "yes" to the same question posed to
> it. Moreover, observed question-answer set should be generalized into
> functions like
>
> f(question) -> answer

To me, this is what step 2 was about -- copy humans.  This is great
for building chatbots.  It is not the road to intelligence.  All that
you get is a statistical model of some basic human behaviors, both
good and bad, and completely unable to get past the training-corpus
size.  Its like neural-net learning, before deep learning was
discovered.

Copying could be understood as a complex behavior, composed of simpler particles, each of them copied from some, possibly different, source. By generalizing two atomic behavior patterns into a more general one (maybe by induction?) we could get a function whose result depends on parameters. And these functions could be composed, just like we learned in math classes and in lambda calculus. Using the Curry-Howard correspondence, entire compositions could be made by simply proving a particular statement (for the less informed: the Curry-Howard correspondence can be used for automatic algorithm construction). Please note that this is not simple one-to-one copying, at least not at the generalization level. What it should be is a construction of generalizations, whose input would have to be pattern matched in order to produce an output that depends on the input.
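The composition step can be illustrated concretely. This is just the plain function composition from lambda calculus that the paragraph appeals to; the behavior names (greet, shout) are made-up examples, not part of any proposed system:

```python
# Composing atomic "behaviors" into a more complex one, in the spirit of
# function composition from lambda calculus.

def compose(f, g):
    """Return the composition f . g, i.e. lambda x: f(g(x))."""
    return lambda x: f(g(x))

def greet(name):      # atomic behavior 1: produce a greeting
    return "hello, " + name

def shout(text):      # atomic behavior 2: articulate loudly
    return text.upper()

greet_loudly = compose(shout, greet)   # composed behavior
print(greet_loudly("ivan"))  # HELLO, IVAN
```

Curry-Howard would enter the picture when such compositions are not written by hand but constructed automatically from a proof that a function of the required input/output type exists; the sketch above only shows the composition half.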

I believe that copying could be understood as a pretty powerful paradigm. For example, consider how adults learn new knowledge: in one scenario they take a book and memorize facts or algorithms relating to some contexts. Later, when they encounter the same context in practice, they can use the knowledge they copied from the book. Regarding a specific AGI implementation, I'm aware that it is easier said than done, but I believe it could be the way to go. I'm just sharing some thoughts that someone might be interested in.

Imagine a neural-network-powered algorithm that can go in both directions - from recognizing data (bottom-up pattern matching) to forming data (top-down construction, similar to dreaming and imagining creative stuff). The bottom-up direction would be the input, while the top-down would be the output. If we could somehow pair these inputs to outputs, we would get a responding machine. So the question is how to pair inputs to outputs? I'd pick learning from experience, more precisely copying human behavior, in the sense of a composition of behavior algorithms derived from the Curry-Howard correspondence. Once we know how to conceptually pair inputs to outputs, all that is left is one final touch - a language expressive enough to describe any kind of input and output with their mutual connections (something that Atomspace should be able to do). If we solve the existence of this language, then, considering the current paragraph, I'd conclude that an AGI machine is not as far away as it seems to be.


> And there is another question that opens if the machine surpasses our IQ and
> even our ethical compassion level: the question of the machine's action
> credibility. Look at it this way: if a machine (that is a hundred times
> smarter than you and a hundred times better person than you) advises you to
> do something, would you listen to it?

What makes you think that you would even have a choice?  Haven't you
ever seen someone smart manipulate someone stupid? Say, in
high-school, that one girl who knew how to control these other 2 or 3
girls, and get them to do things?  Maybe even control a few boys?
Usually to do something mean and ugly?

Cults and kidnappers know all about brainwashing, Stockholm syndrome.
Patti Hearst.   Read about Patti Hearst and the Symbionese Liberation
Army.  It was not a "machine a hundred times smarter than her" that
told her what to do.  It was some humans, who were merely 1.2x smarter
than her, that told her what to do.  And she did it.  And if it was
you, you probably would, too.

> And in what extent?

Read about Jim Jones and what happened in Jonestown, Guyana - the
Peoples Temple Agricultural Project.

But that is small-scale stuff. If you want to affect the lives of
millions of people, there is this thing called "propaganda".



We seem to be missing one very important point of this conversation: the safety of an AGI machine. If a machine exhibits great intelligence potential, it *has to* exhibit great ethical awareness. What sane person would build that smart a criminal? An advanced AGI simply has to outperform us humans on the ethical scale, otherwise the entire planet is endangered. An advanced AGI simply has to give us the mentioned choice to decide about what is ethical to decide about. There is no alternative; something as smart as a hundred geniuses has to have a supreme kind of vision of its living environment if it is about to make changes to that same environment. The alternative is a machine-human domination conflict, and we have to be very serious about this, as it *should* be avoided by all means. I sympathize with most attempts to create an AGI, but we have to be very careful. A lot of wonderful things could be at stake in stopping the science, but we could say the same about continuing in a clumsy way. It is very important to have a general ethical plan for AGI, so we have a solid answer if someone asks what we do in the name of safety. I made that plan for myself (and shared it around), and I expect the same from anyone who tries to build something smarter than humans. Please, consider it seriously.

All the best,
Ivan V.




Amirouche Boubekki

unread,
May 2, 2018, 2:28:00 PM5/2/18
to opencog
You might have noticed Apple released FoundationDB [0]. It's a database very similar to the storage engine of Neon, except it's distributed. It supports ACID transactions, and I think you cannot relax that constraint. Transactions are limited to 5 seconds. The default behavior is to retry a transaction indefinitely.
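The retry-until-commit behavior can be sketched schematically. This is not the actual FoundationDB client API, just a pure-Python illustration of the semantics described above (retry indefinitely; abort and restart any attempt that exceeds the ~5 second limit); all names here are made up:

```python
import time

# Schematic illustration of FoundationDB-style transaction retry: keep
# re-running the transaction body until it commits, aborting any single
# attempt that runs past the time limit.

class TransactionTooOld(Exception):
    """Raised when one attempt exceeds the transaction time limit."""

def run_transactionally(txn_body, time_limit=5.0):
    """Retry txn_body indefinitely until an attempt commits in time."""
    while True:                                   # retry indefinitely
        start = time.monotonic()
        try:
            result = txn_body()
            if time.monotonic() - start > time_limit:
                raise TransactionTooOld()         # too slow: abort this attempt
            return result                         # "commit" succeeded
        except TransactionTooOld:
            continue                              # restart from scratch

print(run_transactionally(lambda: "committed"))  # committed
```

In the real client library the retry loop lives inside the database, and conflicts (not just timeouts) also trigger a retry; the sketch only shows the shape of the control flow.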


Which leads me back to Neon. I still think it's useful.
 
Do you think this kind of database can be useful in your work?

I did not finish writing the full RDF specification. Right now, what it does is filter four-tuples of the form (namespace, uid, key, value), or if you prefer (namespace, graph, key, value), or again (graph, subject, predicate, value). I read the book about knowledge representation in industry, as done in commercial knowledge bases. It doesn't say anything about AtomSpace at all :(

I am rather confident at this point in time. The (graph, subject, predicate, value) tuples allow representing a lot of things, especially when there is an ordering on the tuples. A global order or whatever. In Neon, that's not the case right now, since graph, subject, predicate and value are strings of Scheme values serialized with write; even if I used msgpack it wouldn't work as expected, because it doesn't preserve the correct order between the tuples.

So, I need to rewrite a packing function I deleted from my hard disk, and do some archeology in my past attempts at building a scalable knowledge database that is amenable to easy coding, with better constraints that would allow one to discover things more quickly.
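The packing function in question has to be order-preserving: the byte-wise order of the encodings must match the element-wise order of the tuples, which plain write or msgpack does not guarantee. A minimal sketch, loosely modeled on FoundationDB's tuple layer and restricted to tuples of strings for simplicity:

```python
# Order-preserving encoding of string tuples: sorting the encoded bytes
# gives the same order as sorting the tuples themselves.

def pack(tup):
    """Encode a tuple of str into bytes whose sort order matches tuple order."""
    out = b""
    for element in tup:
        # 0x01 tags a string; 0x00 terminates it. Embedded zero bytes are
        # escaped as 0x00 0xFF so termination stays unambiguous and the
        # encoding remains order-preserving.
        data = element.encode("utf-8").replace(b"\x00", b"\x00\xff")
        out += b"\x01" + data + b"\x00"
    return out

a = pack(("graph", "subject", "predicate"))
b = pack(("graph", "subject", "value"))
assert (a < b) == (("predicate",) < ("value",))  # byte order matches tuple order
```

The real tuple layer also encodes integers, floats, and nested tuples with type tags chosen so that the cross-type ordering is sensible; the string case above is just the core trick.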

The underlying storage is (graph, subject, predicate, point-in-time), where point-in-time is a directed acyclic graph or, if you prefer, a forest of trees. You can think of it as linear, like SVN, but it doesn't change a lot. Another optimization would be to…

I need your help to decide a question: do you prefer Python or Guile?

Who is up for the task of implementing the second best?
 

Linas Vepstas

unread,
May 2, 2018, 3:25:15 PM5/2/18
to opencog
The current atomspace abstraction is that there is a unique, independent key-value store *per atom*. Think of each atom as being an independent database. Just like any normal key-value database, it's not indexed, not searchable, and thus it's very fast, and great for volatile knowledge. But if instead you want structured data that is searchable, then use Atoms, and not Values - however, atoms are slower and bulkier, and the relationships they partake in are immutable.

Using your notation, I'm tempted to write (vertex-of-graph, key, value) as the core atomspace structure, but that would be grossly misleading, precisely because the two parts of the atomspace (the Atoms and the Values) are so very different from each other, in terms of performance and in terms of the kind of data they are best suited for. Some algos work best on atoms, others work best on values; by having a system that integrates both kinds, perhaps we have the best of both worlds. That, at least, is the hope.
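The two-part design described above can be caricatured in a few lines. This is a rough sketch of the idea, not the actual AtomSpace API; all class and attribute names here are illustrative:

```python
# Sketch of the atomspace split: immutable, indexed Atoms for searchable
# structure, plus a fast, unindexed, mutable key-value store per atom
# for volatile data.

class Atom:
    def __init__(self, name):
        self.name = name      # immutable identity, participates in the index
        self.values = {}      # per-atom key-value store: mutable, unindexed

class AtomSpace:
    def __init__(self):
        self.index = {}       # atoms are indexed by name, hence searchable

    def add(self, name):
        """Return the existing atom of that name, or create and index it."""
        return self.index.setdefault(name, Atom(name))

space = AtomSpace()
adam = space.add("Adam")
adam.values["truth"] = 0.98   # fast volatile update; nothing gets re-indexed
print(space.index["Adam"].values["truth"])  # 0.98
```

Updating a value touches only that atom's private dict, while adding an atom touches the shared index; that asymmetry is the performance trade-off the paragraph describes.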

--linas


