Hi Ben, Linas, et al.,

Now that we're starting to implement the PLN from scratch (or maybe evolve the current design), I can't help thinking about persistence and the method of reading/writing the atoms and links. As far as I know, the atomspace falls in the class of Big Data, and we all know that RAM isn't big enough and an RDBMS is too slow.
It's very poor form to start a discussion by making preposterous and false statements. I'll read the rest and try to respond, but it's just poor form to say things that aren't true, and then expect the audience to follow you wherever you go.
--linas
Regarding RDBMS being fast enough, I guess it comes down to fast
enough for what?
FWIW, I have my doubts that -- when you really dig into the details --
the technological solution you suggest is gonna work for OpenCog.
After I dug fairly deep into various related technologies a year or so
ago, I reluctantly came to the conclusion that for the in-RAM,
**distributed** AtomSpace, we would have to roll our own, as no
existing tech really did the trick....
1) The OpenCog architecture requires very fast read/write operations on small data,
2) NoSQL solutions are fast because they are eventually consistent,
meaning that the data isn't guaranteed to be consistent at any given moment. So as the agents discover more facts about the world (adding atoms and links), they will rediscover redundant data (because they don't know that it has already been discovered by their fellow agents).
As a consequence the info/data ratio will be small,
resulting in slower (even wrong?!) inferences,
3) Although key-value pairs seem like the right choice for graph representations, they are not.
I also believe that there is no commercial or open source db project that will satisfy all of the needs of the OpenCog data. But if we combine them, we can definitely address all of its needs. OpenCog is already polyglot
As much as I hate to just nag, I did some digging around and found out about graph databases and NewSQL. They are not general-purpose dbs; they each specialize in one field. Graph databases, as the name suggests, specifically store and retrieve graphs the way it should be done.
I read somewhere that a graph db is 1000x faster than a conventional db when the content can be best represented with a graph (although that seems a little exaggerated).
VoltDB is a leading force among in-memory databases
I think the records of the MindAgents' actions should be stored as
Atoms, and thus either...
> As a consequence the info/data ratio will be small

Huh? Sensors usually provide too much data ...
> VoltDb is the leading force in the In-memory databases

Well, you started out by claiming that things don't fit in-memory! If you want an in-memory database, hell, gnucash has one that was built about 13 years ago. It can do any kind of search and query. Really cool. We should have split it out as a distinct project; the NewSQL revolution could have started a decade earlier!
Here are my major complaints:

1) Opencog already has persistence. Absolutely no one is using it.
2) The existing opencog persistence is distributed, and should scale just fine to maybe 10 or 20 or 100 machines. It works; I once ran it on 3 machines.

Until people start using this, and start discovering ways in which it is inadequate, I think that having discussions about persistence is ridiculous and absurd.
So we have not even begun to scratch the surface of what can be done. I think it's completely premature to start talking about which technology will be used for the solution, before we have outlined what the problem is, and how to solve it.
--
So, for example, the space/time parts of embodiment have already decided that the plain-old atomspace is not good enough for 3D data, and there are new, distinct space/time servers created just for this. These servers have a very different set of requirements for persistence.
--
@linas

I can't possibly answer all of these questions correctly right now, but I'll try.
Questions 1,2,3,4,5,6: If we use a hypergraphDB (HGDB), for example HyperGraphDB, the data will be stored somehow
and will be represented to us like a graph.
The retrieval of the data may be slow in HGDB,
so we store the atoms and links in HGDB, but use a KVP to store the metadata,
resulting in fast traversal of the graph nodes and fast retrieval of atom data.
All we have to do is synchronize the atomic operations of saving,
something we will be doing anyway no matter what technology we use.
Question 8,9,10: These questions are specific to relational model dbs,
except the locks and their mechanism (which I can't answer right now).
From my point of view the ten above questions can be reduced to:
> Questions 1,2,3,4,5,6: If we use a hypergraphDB (HGDB), for example HyperGraphDB, the data will be stored somehow

How?

> and will be represented to us like a graph.

How?
A graph database is a database that uses graph structures with nodes, edges, and properties to represent and store data. By definition, a graph database is any storage system that provides index-free adjacency. This means that every element contains a direct pointer to its adjacent element and no index lookups are necessary.
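To make "index-free adjacency" concrete, here is a minimal Python sketch (the class and names are purely illustrative, not from any real graph database): each node holds direct references to its neighbors, so traversal follows pointers and never consults a global index.

```python
# Minimal illustration of index-free adjacency: each node keeps direct
# references to adjacent nodes, so walking the graph needs no index lookups.

class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []  # direct pointers to adjacent Node objects

    def link(self, other):
        # record the adjacency on both endpoints
        self.neighbors.append(other)
        other.neighbors.append(self)

# Build a tiny graph: a -- b -- c
a, b, c = Node("a"), Node("b"), Node("c")
a.link(b)
b.link(c)

# Traversal dereferences pointers directly; contrast with an RDBMS, where
# each hop would be an index lookup (e.g. a join against an edge table).
two_hops = [n.name for m in a.neighbors for n in m.neighbors]
```

The contrast with a relational store is that each hop here costs one pointer dereference rather than one index probe.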
> The retrieval of the data may be slow in HGDB,

Why would it be slow?
> so we store the atoms and links in HGDB, but use a KVP to store the metadata,

What metadata? The atomspace has metadata ??
> resulting in fast traversal of the graph nodes and fast retrieval of atom data.

Two sentences ago, you said it was slow; now you say it's fast? Which is it?
> All we have to do is synchronize the atomic operations of saving,

Why? How?

> something we will be doing anyway no matter what technology we use.

Doing what?
> Question 8,9,10: These questions are specific to relational model dbs,

No, they're not; I think you misunderstood the questions.
> except the locks and their mechanism (which I can't answer right now).

Why are locks needed? The current sql persistence backend does not need locks ...
> In my point of view the ten above questions can be reduced to:

Again -- spend at least a few days, or maybe a lot longer, before you try to answer these questions. Do some research. Google stuff when you get stuck. Sketch out on a piece of paper how you think you will organize things; what points where, and how; the names of all the tables you'll need, and what each table will contain.
These were not meant to be easy questions, and if you think they are easy, then you didn't really understand the questions.
the list of Atom types will
probably ultimately be dynamic.
For instance, let's say we had an AtomSpace with a very large number
of relations such as
EvaluationLink
    CommunicatesWith
    A
    B
because the system has been fed a lot of data from repositories of
email and mobile phone
communications [just a random example off the top of my head]
Then we may want to have a CommunicatesWithLink, to save memory, and
simplify the
encoding of rules regarding communication in the system...
If the system automagically identifies that CommunicatesWithLink
should be added as a new
link type, you'd like this change to be able to happen, without some
sort of wholesale
restructuring of the knowledge store
But the point is, it would be *better* if the list of Atom types were
not hard-wired in,
> 2) Are links and nodes stored in the same table, or in different tables?

One comment is that, since we can have links pointing to links, the
link/node distinction is not really that fundamental...
This is an argument in favor of "same table", but not a decisive one
of course...
> 3) Are truth values stored with the atoms, or are they in a separate table?
> Same question for attention values?

The current attention allocation subsystem uses a big matrix of
attention values for carrying out the mechanics of
importance spreading.... This suggests that, in RAM, it can
sometimes be useful to store AVs separately from Atoms...
In the case of persistent storage, though, I can't currently think of
an important case where you'd need to grab a TV or AV
without also grabbing the Atom.....
Of course, if you're grabbing a
network of Atoms from disk, then at the edges of the network,
you may be grabbing the TV, AV and other basic information of certain
Atoms and not grabbing their links. But this case doesn't really
justify having a separate store for TVs or AVs...
> 4) Should the outgoing set of a link be considered to be part of a link
> type, or not? Should links with different outgoing types be stored in a
> different table, or the same table? That is, if I have
>
> SimilarityLink
>     ConceptNode A
>     ConceptNode B
>
> and
>
> SimilarityLink
>     ConceptNode A
>     NumberNode B
>
> should these be stored in the same table (the table of all similarity links)
> or in two different tables?

Often there are multiple ways to represent the same thing, and the
choice between them is somewhat arbitrary
> 5) Does the outgoing set get stored as one single record, or is it split
> across many records? One table, or many tables?

Different queries will tend to filter the outgoing set in different
ways, and there seems no way to split it up across many records in a
way that will make sense for the variety of common queries.
So, if
it's split up, one has to assume that a significant percentage of
queries will need to be executed across multiple parts...
In that case, there should be a way for the system to increase its
confidence of having
a complete version of the set of incoming links toward certainty, via
exerting extra
effort/resources
If you really really try, you can remember *almost* everything you
know about your
first girlfriend, even though you may never get to completeness...
> The above is not an exhaustive list of questions; there are more. We have
> not yet even talked about "distributed" and "synchronization" and
> "consistency" and "concurrency" and "write-back". These are even harder
> questions, best left for later.

About the requirements for a distributed Atomspace, I did think this
through pretty carefully about a year ago ... those who don't remember
the particulars of that long discussion can look at the document here:
http://wiki.opencog.org/w/DistributedAtomspace
I can envision use cases where a KVP persistence layer with separate
instances for each atom attribute outperforms an RDB-style system where
all atom data is lumped into single records or views, because of
parallelism. Think of it like RAID striping; if each atom attribute is
stored in a separate storage instance, a single-atom fetch may be slower
than the RDB case, but a stream of them will likely fetch faster.
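The striping idea can be sketched as follows (a toy in-process model with made-up stores, not LMDB or any real KVP library): each atom attribute lives in its own store, and a whole-atom fetch gathers the attribute columns concurrently.

```python
# Toy sketch of attribute striping: one key-value store per atom attribute.
# A single-atom fetch touches every store (slower than one lumped record),
# but a stream of fetches can read the stores in parallel, RAID-style.
from concurrent.futures import ThreadPoolExecutor

stores = {
    "type": {1: "ConceptNode", 2: "SimilarityLink"},
    "name": {1: "cat", 2: None},
    "tv":   {1: (0.9, 0.5), 2: (0.7, 0.2)},
}

def fetch_attr(attr, handles):
    # in a real system this would be an I/O-bound read against one device
    return [stores[attr][h] for h in handles]

def fetch_atoms(handles):
    # fetch each attribute column concurrently, then zip them into records
    with ThreadPoolExecutor() as pool:
        futures = {a: pool.submit(fetch_attr, a, handles) for a in stores}
        columns = {a: f.result() for a, f in futures.items()}
    return [dict(zip(columns, row)) for row in zip(*columns.values())]

atoms = fetch_atoms([1, 2])
```

With real storage devices behind each store, the three column reads would overlap in time, which is the claimed win for streamed fetches.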
Probably shoehorning a graph database into a bunch of KVPs isn't optimal
but it's probably also much less effort than writing a more 'native'
persisting graph database from scratch (whatever 'native' means with
respect to implementing them on digital computers),
and it can take
advantage of other KVP library features, like the memory-mapped
zero-copy lookup found in LMDB.
That one feature alone, combined with solid state storage, raises some
fundamental design questions, like does it make sense to continue to
mostly use a query + batch-load + batch-process model?
When would
pattern matching and graph transformation operating directly on stored
atoms make sense over query-load-process?
A native HyperGraphDB storage could follow the idea of identifiers
(i.e. handles in OpenCog hypergraph lingo) being disk offsets
and, to
solve the distribution issue, it would split an ID into two parts: a
long identifying the origin (the DB instance where the atom was
created) and a long which is a local disk offset.
Each node in a HGDB
cluster would then have a separate file for each peer it replicates
atoms for. So all atoms originating at a given peer instance will be
stored in a dedicated file with the local offsets somehow mapped from
one peer to another (so that the file size doesn't have to be
replicated).
I also believe that there is no commercial or open source db project that will satisfy all of the needs of the OpenCog data. But if we combine them, we can definitely address all of its needs.
> Ramin,
>
> I don't know what to say. I don't know how to respond to what you wrote. I'm sorry that I upset you; I think you need to get some rest, take a vacation. Lets just take a break, and resume this conversation in a week.
Hmmm, Linas,
I think Ramin has a valid point that using a graph db eliminates the need for thinking about the design choices you've mentioned (since they're mostly concerned with relational representations). I also tend to think that graph dbs would be a better choice for persistence, maintenance-wise.
However, for a new AtomSpace design, I think, the question would be: are there any sufficiently good in-RAM graph db implementations out there which could feasibly address the requirements that Ben talks about in his proposal?
Cheers,
--K
If I understand the requirements correctly as outlined in Ben's design
sketch, the system needs to accommodate AtomSpaces much larger than
available physical RAM
and be able to query+copy sets of atoms between
nodes, and both of these things could be accomplished using your
persistence backend with some additional work, such as a well-tuned
ForgettingAgent
and a new TelepathyAgent that uses remote connections to
atomspaces and postgres instances.
With that said, reducing explicit memory management (i.e. reducing the
complexity & overheads of Forgetting) might be a win
(for certain kinds of nodes). Of course that assertion can be validated
only by building it and measuring it.
> Philosophically, I am an evolutionary incrementalist. This means that
> I usually prefer small, incremental steps that change the codebase in
> small steps, rather than one big revolutionary, boil-the-ocean
> approaches. So I would propose this:
>
> -- Who are the primary users of the atomspace today?
> -- What performance bottlenecks are they experiencing? (I would
> prefer hard numbers rather than vague assertions and accusations)
> -- What can we do to fix those bottlenecks? (viz, what can we do with
> a person-week or a person-month effort? I don't want a 5 man-year
> project to redesign everything.)

I think that's a good approach for the stable/development branches, but
we're also discussing potential experimental branches where a 5-man-year
redesign might be the most sensible answer.
If the atomspace used memory-mapped storage semantics for all memory
allocation/deallocation, pattern matching could operate on in-RAM atoms
and stored atoms in unison.
The parts of stored atoms subject to the
pattern matching (assuming it's expressed as a series of masks) would,
at least temporarily, be fetched into RAM (or perhaps just a register).
note, here I'm thinking of 'storage' as PCI-attached flash that
completely bypasses all block and filesystem semantics.
Segregated atom
properties stored across physical storage devices may operate as a type
of striping. I'll guess that things like CPU-cache lines aren't a
factor, since the pattern match events are likely to be one-offs or
separated by relatively large intervals in time.
Stretching a bit
further into optimization speculation, it should be possible to flag
fetches for data for these ephemeral pattern matches as do-not-cache,
keeping cache lines usable for other purposes.
> -- 'graph transformation operating directly on stored atoms': if this
> is what I think it is, then we don't have graph transformations at
> all, yet, but it would make sense to put it inside the atomspace, not
> outside, since that would have much higher performance.

This also got me thinking about indexes, which are, in the context of
hypergraph semantics, just the results of pattern matches that are
beneficial to keep around, perhaps because the pattern match is repeated
frequently, or because it's expensive.
In a system with dynamic
indexing, each pattern match expression (graphs themselves) could be
optionally stored for statistical analysis for building and maintaining
a list of indexes to keep.
The indexes themselves, if kept as memory
mapped structures, could be huge and swapped out to storage when not in
use
(this assumes the case where creating the index is much more
expensive than simply fetching it, in whole or in part, from storage).
Of course there's still the overhead involved in updating indexes at
each atom add/change/remove,
but such a system makes it generalizable
rather than hard-coded for each index to be updated.
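A hedged sketch of that statistics-driven dynamic indexing (all names are hypothetical, and substring matching stands in for real pattern matching): count how often each pattern expression is queried, and materialize an index once a pattern crosses a frequency threshold.

```python
# Sketch of dynamic indexing: count pattern-match queries and materialize
# an index (a cached result set) for patterns that recur often enough.
# In a real system the index would also be invalidated/updated on every
# atom add/change/remove, which is the overhead mentioned above.
from collections import Counter

class DynamicIndexer:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.stats = Counter()   # how often each pattern was queried
        self.indexes = {}        # pattern -> cached result set

    def match(self, pattern, atoms):
        if pattern in self.indexes:
            return self.indexes[pattern]             # index hit: no scan
        self.stats[pattern] += 1
        result = {a for a in atoms if pattern in a}  # stand-in for matching
        if self.stats[pattern] >= self.threshold:
            self.indexes[pattern] = result           # keep it around
        return result

atoms = {"InheritanceLink cat animal", "InheritanceLink dog animal", "ListLink a b"}
idx = DynamicIndexer(threshold=2)
idx.match("InheritanceLink", atoms)
hits = idx.match("InheritanceLink", atoms)  # second query triggers indexing
```

The threshold is the knob that trades index-maintenance overhead against repeated-scan cost; the statistics table itself is what makes the choice generalizable rather than hard-coded.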
My proposal was to use each technology where it performs best. After some research on the matter, I've reached the following conclusions (summary):

1) RDBs should be used where there is a need for ad-hoc queries or when complex data needs to be aggregated, e.g. summation, average, etc.
2) If it's simple structured data, best use KVPs. They're easier to query and need far less design/development time than relational models.
3) GDBs are best used where networks need to be queried, for example "How can I get from X to Y?" or "Which nodes are most strongly related, but not directly connected, to X?"
4) As of right now, OpenCog doesn't have an immediate need for a complete and perfect persistence model; the current model is functional and ready. But it's not clear to me whether a "virtualization layer" exists somewhere in the codebase which one could use without heavy coupling to the persistence methodology.
MapReduce ... but
anything requiring traversal of atoms kind of breaks the paradigm...
On the other hand, general map and reduce functions in the OpenCog
codebase would be useful. I vaguely recall these already exist though,
at least in the Scheme bindings
> and a new TelepathyAgent that uses remote connections to
> atomspaces and postgres instances.
?? The current backend connects just fine to remote instances, no agent needed. If one agent wants something in some remote atomspace, it just asks the backend about it. **(see footnote)
> With that said, reducing explicit memory management (i.e. reducing
> complexity & overheads of Forgetting)
Forgetting seems to be an integral part of inferencing; attention values are an explicit part of the atomspace architecture. You can't get rid of explicit memory management until you get rid of explicit attention values.
> note, here I'm thinking of 'storage' as PCI-attached flash that completely bypasses all block and filesystem semantics.
Well, you can't, not without inventing a new kernel extension that would get laughed off of the linux kernel mailing list.
To put it more kindly, the overhead of using block and filesystem semantics in your database is maybe at most 0.2% of performance, and probably closer to 0.0001% so you gain nothing by trying to avoid it.
> Stretching a bit
> further into optimization speculation, it should be possible to flag
> fetches for data for these ephemeral pattern matches as do-not-cache,
> keeping cache lines usable for other purposes.
What other purposes are there? What could we possibly be storing in RAM that are not atoms?
> The indexes themselves, if kept as memory
> mapped structures, could be huge and swapped out to storage when not in
> use
Maybe. I imagine that indexes take up 1% or maybe 10% at most 20% of what the raw data takes. It's a question I'm willing to punt to the future.
Hi Boris,

Some casual thinking-aloud below.

On 15 July 2013 17:38, Borislav Iordanov <borislav...@gmail.com> wrote:
> A native HyperGraphDB storage could follow the idea of identifiers
> (i.e. handles in OpenCog hypergraph lingo) being disk offsets

Hmm. Interesting idea. But offsets are kind-of-like pointers, and all of the comp-sci issues with pointers show up with offsets as well, right? So either you have to alloc/free the offset, and make sure no one is using it when you free it, or you use reference counting, or you garbage collect unused disk offsets.
By contrast, ID's that are not offsets have some but not all of these problems. So, for example, you can't free the storage associated with an ID until you are sure no one is using it. But you are not under any pressure to re-use an ID, since you don't waste any space if an ID is never used again.
Finding out if an ID is used is "easy" for a user of a relational DB: you just ask it "is it being used?" and you get an answer. It's someone else's problem (viz the DB author's) to make this actually happen. But if we go native, then we have to solve this alloc/free/refcnt/gc mess ourselves.
I noticed long ago that there are people out there who really, really want to write their own OS kernel, damn the rest of the world. Then there are people who roll their own compilers and languages. I suppose there are people who really really want to roll their own DB (are you one such?). We
need to find one of these people, and get them to make one for opencog. Maybe what we really need to do is to get on DB discussion forums and start pleading and advertising and begging and hyping and trying to capture someone's imagination. How to do this?
> and, to
> solve the distribution issue, it would split an ID into two parts: a
> long identifying the origin (the DB instance where the atom was
> created) and a long which is a local disk offset.

The current opencog sql persistence backend solves this in a very simple, but I think clever, way. Basically, each instance requests a block of ID's, which it can issue however it wants. When these run out, it just asks for more. There are no ID collisions, and I don't have to issue them in blocks of 2^32 or 2^64, but can issue tiny blocks of 1M each. Issuing is trivial: increment some counter by += 1M (or whatever). If one instance needs more than 2^64 ID's, that's also not a problem. And since ID's never have to be returned or freed, that's it. Real easy.
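The block-issuing scheme described here can be sketched in a few lines (a toy model of the idea, not the actual opencog SQL backend code; class names are made up): a central counter hands out disjoint ranges, and each instance issues IDs locally from its range until it runs dry.

```python
# Sketch of collision-free distributed ID issue: a central allocator hands
# out disjoint blocks; each instance issues IDs locally from its block and
# asks for a fresh block only when it runs out.  IDs are never freed.
BLOCK = 1_000_000

class Allocator:
    def __init__(self):
        self.next_block_start = 0

    def grab_block(self):
        start = self.next_block_start
        self.next_block_start += BLOCK  # trivial: one counter increment
        return start, start + BLOCK

class Instance:
    def __init__(self, allocator):
        self.allocator = allocator
        self.lo, self.hi = allocator.grab_block()

    def new_id(self):
        if self.lo == self.hi:                   # block exhausted
            self.lo, self.hi = self.allocator.grab_block()
        self.lo += 1
        return self.lo - 1

alloc = Allocator()
a, b = Instance(alloc), Instance(alloc)
ids_a = {a.new_id() for _ in range(3)}
ids_b = {b.new_id() for _ in range(3)}
```

Since the ranges never overlap, no coordination beyond the occasional block grab is needed, which is the whole appeal of the scheme.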
> Each node in a HGDB
> cluster would then have a separate file for each peer it replicates
> atoms for. So all atoms originating at a given peer instance will be
> stored in a dedicated file with the local offsets somehow mapped from
> one peer to another (so that the file size doesn't have to be
> replicated).

Replica management seems like it can become very non-trivial. So again, I'm wondering: can we convince some DB geek(s) out there to invent a brand new hypergraph DB?
Now, I am utterly clueless as to how to think about your HGDB. How hard would it be to map opencog atoms into it?
How hard would it be to do this from C++?
What might the performance be?
Well, I guess that's how I got hooked into doing HyperGraphDB, thanks to David. Except I made the mistake of starting it in Java, thinking it would be just a quick way to learn more about the ideas and that I would not go too far before switching to C++. Then a couple of years ago, I almost jumped into a C++ rewrite for Opencog, but realized that this would (well, should, at least) involve a redesign of Opencog's atomspace, so it would have been too intrusive and it would have met a lot of resistance from the core Opencog developers. So I backed off.
If you remember, we had a discussion where I was arguing that attention allocation was analogous to garbage collection
and that handles should be implemented as smart pointers etc.
Anyway, I'm only bringing this up because I think the decision of how much long-term storage management and storage concerns have to be part of Opencog internals is also important in deciding whether you want a dedicated db implementation, or what SQL or NoSQL you want to use.
For example, where are the transaction boundaries? In general, how transparent are db operations to Opencog API users? Or, how much storage and/or data distribution algorithms would be driven, or modulated to some extent, by higher-level processes within Opencog?
If Opencog doesn't have such a simple API, I'm sure it won't be that
hard to create and it will probably help people a lot.
But in that metaphor, we have a teleporter, which gives us the ability
to hop directly to any given location based on its GPS coordinates....
So, given this, we don't really need street signs, all we need are
signs with (landmark name, GPS coordinate) pairs, placed where people
potentially interested in the landmark will see them...
;)
I'm sorry, I'm not sure I understand the type of indexing you are talking about.

During traversal, at each step you have a bunch of choices of what the next atom would be. So you want some sort of index that helps you make that choice given a known end-goal? If I'm understanding correctly, that's an interesting problem...
Hi Joel,
Surely python has these too. Don't be fooled: they are only map-reduce in the abstract comp-sci sense, not in the data+storage-management sense.
When I say MapReduce here, what I'm really thinking is "some system that will fetch large blocks of data from some distributed store, allow me to perform work on those blocks, and then write something back". For opencog, I know the data is "a bunch of atoms" and the work to be performed is "traverse them". By "traverse them" I mean "run PLN on them" or "run pattern matching on them" or run some other agent on them. Let's pretend we can throw away the current atomspace API completely. Let's pretend each traversal agent was single-threaded, and never needed to touch any atoms outside of its little universe that were just fetched. (Or maybe something like the Microsoft facets paper that Chris Beyer posted earlier on this thread). How could all of this be made to work?
--linas
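One way the fetch / work / write-back loop being asked about might look (a toy sketch with made-up names; no real atomspace or MapReduce API is used): partition the store into blocks, hand each block to a single-threaded worker that traverses only the atoms in its own little universe, and merge the results back.

```python
# Illustrative fetch -> work -> write-back loop over a partitioned store.
# Each worker is single-threaded and only touches atoms in its own block.
from concurrent.futures import ThreadPoolExecutor

store = {i: {"id": i, "sti": i % 5} for i in range(10)}  # fake atom store

def fetch_block(block_ids):
    return [store[i] for i in block_ids]

def work(atoms):
    # stand-in for "run PLN / pattern matching on this little universe"
    return [a["id"] for a in atoms if a["sti"] >= 3]

def writeback(results):
    for i in results:
        store[i]["marked"] = True

blocks = [range(0, 5), range(5, 10)]
with ThreadPoolExecutor() as pool:
    # the work runs in parallel; write-back happens serially in the driver
    for result in pool.map(lambda b: work(fetch_block(b)), blocks):
        writeback(result)

marked = sorted(i for i, a in store.items() if a.get("marked"))
```

The hard part this sketch dodges, of course, is a traversal that needs atoms outside its own block; that cross-block case is exactly what breaks the paradigm, as noted above.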
> You mentioned "parallel query" in HGDB. What sort of queries do you
> support? Any enlightening comments/lessons?

Queries are set-oriented: given an expression with some criteria, find
all atoms that match the criteria. The criteria lets you specify
constraints on:
- the type of an atom, e.g. it's exactly of type T, or it's of type T or any of
its sub-types
- the value of an atom, e.g. "equal to x" or "having a property p equal to y"
- the way it's linked to other atoms, e.g. "it's a link pointing to X
and Y" or it's the target of link L at position n
And you can combine those with the logical and, or, not operators.
Traversals can also be used in the set-oriented queries. For instance, to find all
atoms of type Foo that are connected to atom A, you could do:
graph.findAll(hg.and(hg.bfs(A), hg.type(Foo)));
Those query expressions go
through a "compilation" phase to construct a query plan.
The one parallelization that I've done so far is on logical
disjunctions. Often there are queries of the form:
and(link(x,y), or(type(Foo), type(Bar)))
That is..give me all links that point to both x and y and that are
either of type Foo or type Bar. Before execution, this becomes:
or(and(type(Foo), incident(x), incident(y)), and(type(Bar),
incident(x), incident(y)))
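That rewrite, distributing the conjunction over the disjunction so each branch can run as an independent parallel scan, might be sketched like this (the tuple-based query representation is made up for illustration, and the further decomposition of link(x,y) into incident(x), incident(y) is omitted):

```python
# Sketch of the rewrite  and(L, or(A, B)) -> or(and(L, A), and(L, B)),
# which lets each disjunct of the result execute as a separate parallel
# branch.  Queries are plain tuples: ("and", q1, q2), ("or", q1, q2), ...
def distribute(query):
    op, *args = query
    if op == "and":
        for i, arg in enumerate(args):
            if isinstance(arg, tuple) and arg[0] == "or":
                rest = args[:i] + args[i + 1:]
                # push the remaining conjuncts into every disjunct
                return ("or", *[("and", *rest, d) for d in arg[1:]])
    return query

q = ("and", ("link", "x", "y"), ("or", ("type", "Foo"), ("type", "Bar")))
rewritten = distribute(q)
```

After the rewrite, each `and(...)` branch can be handed to its own worker and the result sets unioned at the end.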
> I'm sorry, I'm not sure I understand the type of indexing you are talking about. During traversal, at each step you have a bunch of choices what the next atom would be. So you want some sort of index that helps you make that choice given a known end-goal? If I'm understanding correctly, that's an interesting problem...

Suppose the following is a sub-graph of a much bigger graph:

a --> b --> c --> d

Now the question we want to answer is: "Is there a relation between d and a?". The question is analogous to: "Is there a path between d and a?". We will need a heuristic to find the path itself, but finding the actual path is expensive. Can an index be made in a way that will:

-- tell us if there is a path between d and a (not the path itself)
For Opencog, it seems to me that the hard part is having a working query engine
Well, yes.

Look, in the other discussion thread, given what you are doing, I was saying that you should just brute-force search the above: it's just three nested for-loops in C++, maybe a few dozen lines of code; it's just not hard to code up.
However, if you are certain that you will be needing to do this a lot, with rather specific and fixed a and d, then we could create a "user-defined index" for this.
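A minimal sketch of such a user-defined reachability index (hypothetical names, assuming the graph changes rarely enough that precomputation pays off): store, for each node, the set of nodes reachable from it, so "is there a path from a to d?" becomes a single set lookup that never materializes the path.

```python
# Precomputed reachability index: answers "is there a path from u to v?"
# without producing any path.  Rebuilt (or incrementally updated) when
# edges change -- worthwhile only if queries greatly outnumber updates.
from collections import deque

def build_reachability(edges):
    index = {}
    nodes = {n for e in edges for n in e}
    for start in nodes:
        seen, queue = set(), deque([start])
        while queue:                      # plain BFS from each node
            u = queue.popleft()
            for a, b in edges:
                if a == u and b not in seen:
                    seen.add(b)
                    queue.append(b)
        index[start] = seen
    return index

# the sub-graph from the question above: a --> b --> c --> d
edges = [("a", "b"), ("b", "c"), ("c", "d")]
reach = build_reachability(edges)
```

A full transitive closure like this can be quadratic in space, so real systems usually restrict it to the specific (a, d) pairs the user declared interest in, which is exactly the "user-defined index" idea.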
--
- Sub Spaces

This is similar to the discussion I had with Linas and Ben a while ago: a mechanism for categorizing the atoms. It's not a "Concept"; it's similar to the user-defined indexes (if not the same). These spaces can be created on the fly. The user defines a condition and an agent fills the space with the appropriate atoms. The AttentionalFocus is a special case of a Sub Space. So for example, a feasible mechanism for implementing this IMO would be tags. An Atom can have the tag "Rule", so when one wants to fetch all the rules, she/he would just query all the atoms with this tag.

- Events

An event is something that someone can hook into to apply a piece of code based on the specific changes that happen to atoms. I wonder whether similar functionality is already present in the atomspace, but as per previous discussions, it would be useful to have a fully functional/well-documented feature ready for OpenCog developers. This should be something that is available in all the application layers (e.g. python, scm, c++, etc.)

- Agent's Depositories

It would be nice to have a custom KVP store for agents so they can save the data needed for their processes and share it if needed
There are addAtomSignal() and removeAtomSignal() and mergeAtomSignal(); there's nothing for changing TV or AV, presumably due to laziness or lack of demand.
No, I think that if an agent needs to save or encode something, it should do so as a hypergraph. So for example,
ListLink
    AnchorNode "MyAgent Special SaveStuff"
    ConceptNode "whoopee!"
    NumberNode 42
    etc.
On Jul 31, 2013 1:35 PM, "Ramin Barati" <rek...@gmail.com> wrote:
>
> Hi,
>
>> There are addAtomSignal() and removeAtomSignal() and mergeAtomSignal() there's nothing for changing TV, AV, presumably due to laziness or lack of demand.
>
>
> Are these interfaces exposed to python/cython or scm?
I did not implement this for python.
J
> No, I think that if an agent needs to save or encode something, it should do so as a hypergraph. So for example,
>
> ListLink
>     AnchorNode "MyAgent Special SaveStuff"
>     ConceptNode "whoopee!"
>     NumberNode 42
>     etc.

You're right, it would be better to save these things as hyper-graphs but, as far as I know, ListLinks are static. Are there any other alternatives (which are dynamic)?
I have a design in mind for a PLN ForwardInference Agent (FIA) which incrementally does forward chaining as atoms get added to the atomspace. For that, I need those three features to be present (or simulated) in the atomspace. Here is the algorithm I have in mind:

1. In the event of a link being added, the atoms in the outgoing set are put into a queue; wake the agent up if needed.
2. The FIA does all the possible inferences on the first atom in the queue, adding new links if needed (which in turn, by firing the "a link is added" event, populates the queue with new atoms).
3. Repeat the procedure until the queue is empty (I don't mean for the FIA to empty the queue in one cog loop).
4. If the queue is emptied, the FIA goes to an "idle" state.
5. If the FIA stays in the "idle" state for too long, put it to sleep.
6. If possible, release the resources that the sleeping agent had used.

Sub Spaces are needed for storing the rules, Events for the events of course, and Agent's Depositories for storing/persisting the queue somewhere, in case something goes wrong, e.g. a power outage. The queue's length isn't static, so the FIA needs a depository that is able to grow and shrink.
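As a rough sketch of what such a depository could look like: persist the queue to disk with an atomic write, so a crash mid-save can't leave a corrupt copy. This is a plain-file toy (the class name `QueueDepository` and the JSON encoding are my assumptions; a real depository would live in the AtomSpace's own KVP store).

```python
import json
import os
import tempfile

class QueueDepository:
    """Toy sketch: persist the FIA's work queue across restarts
    (e.g. a power outage) using an atomically replaced JSON file."""

    def __init__(self, path):
        self.path = path

    def save(self, queue):
        # Write to a temp file first, then rename over the target:
        # os.replace is atomic on POSIX, so a crash mid-write cannot
        # corrupt the previously stored queue.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump(list(queue), f)
        os.replace(tmp, self.path)

    def load(self):
        # An empty queue if nothing has been persisted yet.
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return json.load(f)
```

Because the whole queue is rewritten on each save, this grows and shrinks with the queue for free; for very large queues an append-only log would be cheaper per update.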
However, I can see your attraction to playing with an FIA that doesn't
rely on attention allocation, just to simplify your initial task....
So for that reason, I guess it's OK for you to build a MindAgent along
the lines you describe ;)
However, it's not a substitute for an FIA that selects Atoms based on
STI and reasons on them...
In general an FIA that's based on STI is a better idea, because it's
much more general purpose.... As new Atoms come into the Atomspace,
they should be assigned high STI in most cases; so an STI-based FIA
would carry out what your new-Atom-based FIA does anyway...
However, it's not a substitute for an FIA that selects Atoms based on
STI and reasons on them...
In general an FIA that's based on STI is a better idea, because it's
much more general purpose.... As new Atoms come into the Atomspace,
they should be assigned high STI in most cases; so an STI-based FIA
would carry out what your new-Atom-based FIA does anyway...

Correct me if I'm wrong, but trying to infer on an atom that is not in the queue would be fruitless. Inferring on an atom, regardless of how much it's in our attention, without having new info/links on the atom is just a waste of time. The queue can easily be sorted by the atoms' STI, making selection proportional to STI, and it doesn't need a "random draw function" to be implemented in the atomspace. Also, incremental inference is cheaper than random draws IMHO. Sorting a list takes O(n*log(n)) time in the worst case, and 'n' is almost independent of the atomspace's size; it's more or less proportionate to the rate of discovery. OTOH, choosing a random atom will put more stress on the atomspace's indexing services (refitting to the new STIs), and the agent will spend a significant portion of its time on reasonings that wouldn't result in discovering something new.

Maybe my understanding of the FIA's job or attentional focus is wrong, but I just cannot see a reason not to choose this approach over random draws.
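The "queue sorted by STI" idea above can actually do a bit better than re-sorting: a max-heap keyed on STI gives O(log n) insert and pop, cheaper than either O(n log n) sorting or STI-weighted random draws over the whole AtomSpace. A toy sketch (the class name `STIQueue` is hypothetical):

```python
import heapq
import itertools

class STIQueue:
    """Toy sketch: hold only atoms with *new* information, but serve
    them highest-STI first, using a max-heap (negated STI)."""

    def __init__(self):
        self._heap = []
        self._tie = itertools.count()  # stable FIFO order for equal STIs

    def push(self, atom, sti):
        # Negate STI so Python's min-heap pops the largest STI first.
        heapq.heappush(self._heap, (-sti, next(self._tie), atom))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

One caveat the heap doesn't solve: if an atom's STI changes *after* it has been enqueued, its priority in the heap is stale; handling that would need either lazy re-insertion or a decrease-key structure.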
However, it does not contain Atoms that have had their truth values
*changed* by some process (e.g. inference itself)...
If the queue consisted of everything that had had its truth value
changed, along with everything that had been newly created, then I'd
agree with you....
On Fri, Aug 2, 2013 at 5:37 PM, Ramin Barati <rek...@gmail.com> wrote:

Hmmm... no, I guess it would always be the revision rule...
>> However, it does not contain Atoms that have had their truth values
>> *changed* by some process (e.g. inference itself)...
>
>
> If you mean changed by e.g. the revision rule, it would trigger the event.
> But if there's sth that would change the TVs without going through "adding
> atoms", it won't get in the queue.
I didn't realize that the revision rule would trigger an addAtom type
event...
On 2 August 2013 04:43, Ben Goertzel <b...@goertzel.org> wrote:
On Fri, Aug 2, 2013 at 5:37 PM, Ramin Barati <rek...@gmail.com> wrote:
Hmmm... no, I guess it would always be the revision rule...
>> However, it does not contain Atoms that have had their truth values
>> *changed* by some process (e.g. inference itself)...
>
>
> If you mean changed by e.g. the revision rule, it would trigger the event.
> But if there's sth that would change the TVs without going through "adding
> atoms", it won't get in the queue.
I didn't realize that the revision rule would trigger an addAtom type
event...

Me neither. Why would this be? The whole atomspace is designed around the idea that adding and removing atoms is expensive/slow, while changing their TV and AV is cheap/fast.
Hi,
It occurred to me in the midst of another discussion on this list that "tags are a special case of hypergraphs".
So I'm now wondering why we don't have a node in the AtomSpace named "AttentionalFocus" and have atoms linked to it maybe via MemberLinks?
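The "tags are a special case of hypergraphs" idea reduces to membership links pointing at an anchor node. A toy illustration (the class and method names are made up; real MemberLinks carry truth values, omitted here):

```python
class TinyAtomSpace:
    """Toy illustration: the attentional focus as an anchor node,
    with membership expressed as (member, group) link pairs."""

    def __init__(self):
        self.member_links = set()

    def add_member_link(self, member, group):
        self.member_links.add((member, group))

    def remove_member_link(self, member, group):
        self.member_links.discard((member, group))

    def members_of(self, group):
        return {m for (m, g) in self.member_links if g == group}
```

Moving an atom into or out of the focus is then just adding or removing one link, rather than maintaining a separate tag index outside the hypergraph.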
-- K
--
Keyvan Mir Mohammad Sadeghi
MSc AI
"One has to pay dearly for immortality; one has to die several times while one is still alive." -- Friedrich Nietzsche