Hi Jack,
Wow! I welcome the effort; it's nice to have projects influence one another.
I would like to point out something: to **understand** the atomspace, reading the source code is probably the hardest, most confusing, and most misleading way of doing it (as in, there lurk bugs and the occasional bad design).
To understand the atomspace, it's best to run through the examples and/or read through the wiki. That would be a much easier path.
Here's a thumbnail sketch. So -- the atomspace is first a graph database (and so, to recreate it in Java, it might be easiest to start with some existing graph database).
Next, it's a bunch of predefined types. Some of these are relations: for example, InheritanceLink is the classic "is-a" relation -- x is-a y. For the so-called "semantic triples", we use EvaluationLink -- x R y, for R some arbitrary named relation. We call R a PredicateNode. So, for example, for "Jack owns a computer", x=Jack, R=owns, y=computer, and so one writes
    (Evaluation (Predicate "owns") (List (Concept "Jack") (Concept "computer")))
In first-order logic, one writes P(x,y) instead of x R y, whence the name predicate.
Next, a conceptual leap: one is not limited to just P(x,y); one can have an arbitrary number of arguments. These arguments can themselves be other atoms, which is what makes it a "graph database". And finally, there is no force-fit schema, which is why it's not SQL. (So, for example, "triple stores" have a force-fit schema: everything must be a triple, of the form x R y. In other words, a table with 3 columns. For the atomspace, "anything goes". Otherwise, it would be just SQL: SQL is-a kind-of graph database, except that it forces you to pre-declare your schema, i.e. to use tables.)
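To make the last two paragraphs concrete, here is a rough sketch in Python. It is a toy model, not the actual AtomSpace API: the class names `Node` and `Link`, the helpers `link` and `to_sexpr`, and the "gave"/"Jill" example are all made up for illustration. It shows only two things: atoms are typed, and a link can hold any number of other atoms, including other links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A named atom, e.g. (Concept "Jack"). Toy model only."""
    type: str
    name: str


@dataclass(frozen=True)
class Link:
    """An atom holding any number of other atoms (nodes or links)."""
    type: str
    outgoing: tuple


def link(type_name, *atoms):
    """Convenience constructor, e.g. link("List", a, b, c)."""
    return Link(type_name, tuple(atoms))


def to_sexpr(atom):
    """Render an atom in the parenthesized notation used in this email."""
    if isinstance(atom, Node):
        return f'({atom.type} "{atom.name}")'
    inner = " ".join(to_sexpr(a) for a in atom.outgoing)
    return f"({atom.type} {inner})"


jack = Node("Concept", "Jack")
computer = Node("Concept", "computer")

# x R y with x=Jack, R=owns, y=computer
owns = link("Evaluation", Node("Predicate", "owns"),
            link("List", jack, computer))

# No fixed schema: a three-argument relation is just a longer List.
# "gave" and "Jill" are made-up names for this sketch.
gave = link("Evaluation", Node("Predicate", "gave"),
            link("List", jack, computer, Node("Concept", "Jill")))

print(to_sexpr(owns))
print(to_sexpr(gave))
```

A Java port could start from the same shape: an immutable node type, an immutable link type with a list of outgoing atoms, and no table schema anywhere.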
Next, each predicate (more generally, each atom) has a truth-value. Classically, this is true/false (e.g. "it is true that Jack owns a computer"). The next conceptual leap is this: crisp true/false -> probability -> probability+confidence -> list-of-floats -> arbitrary json struct -> arbitrary key-value-db -> arbitrary key-value-db with time-dependent values.
So, in this example, "Jack owns a computer" has an associated key-value DB on it. One of the keys might hold the truthiness of this statement. Another key might hold its probability. Another key might hold the time-varying value of the physical distance between Jack and the computer, or maybe the pixel-values on the screen at this instant in time. These are called "Values".
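A minimal sketch of that idea, again as a toy in Python rather than the real API: `ToyAtom`, `set_value`, `get_value`, the string keys, and all the numbers are assumptions made up for this example. The point is only that each atom carries its own small key-value store, and that the values can be overwritten without touching the atom itself.

```python
class ToyAtom:
    """Stand-in for an atom; carries a per-atom key-value store."""

    def __init__(self, label):
        self.label = label
        self._values = {}

    def set_value(self, key, value):
        # Overwrites any earlier value stored under this key.
        self._values[key] = value

    def get_value(self, key):
        # Returns None when nothing is stored under this key.
        return self._values.get(key)


fact = ToyAtom("Jack owns a computer")

fact.set_value("truth", True)
fact.set_value("probability", 0.9)             # made-up number
fact.set_value("distance-m", [2.0, 2.1, 3.5])  # made-up readings

# A new reading arrives: only the value changes, the atom stays put.
fact.set_value("distance-m", [3.6])

print(fact.label, fact.get_value("distance-m"))
```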
So, you can imagine the AtomSpace as holding graphs -- those graphs are like pipes, plumbing. The Values are the water that flows through the pipes. Performance-wise, it's fairly hard/slow to change the graphs, but the values can change constantly. The pipes have a query language. The values do not (because key-value databases don't have a query language, by definition).
Finally, there are three or four more magic ingredients:
-- Some atoms are executable. For example, PlusLink knows how to actually add numbers together. PlusLink is backed by a C++ class that performs addition.
-- Queries from the query language are graphs themselves. So queries can be stored in the AtomSpace (this is very unlike SQL, where you cannot store a query in the database itself. I think this is also very unlike any typical graph DB.)
-- A relation P(x,y), together with its truth-value, can be thought of as a matrix. So there is an API to access P(x,y) as if it were an actual matrix, doing typical matrix-math stuff to it.
Well, there are a few more tricks up its sleeve, but this email is too long already.
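The matrix view in the last item can be sketched as follows. This is a toy in Python, not the actual matrix API: the function `as_matrix`, the extra names "Jill" and "bicycle", and the numbers are all invented for illustration. Each stored pair (x, y) becomes one cell, with the number attached to P(x,y) as the cell's entry and 0.0 for pairs that were never stored.

```python
def as_matrix(pairs):
    """View {(x, y): number} as a dense matrix with labeled rows/columns."""
    rows = sorted({x for x, _ in pairs})
    cols = sorted({y for _, y in pairs})
    matrix = [[pairs.get((x, y), 0.0) for y in cols] for x in rows]
    return rows, cols, matrix


# P = "owns"; the numbers play the role of truth-values (made up).
owns = {
    ("Jack", "computer"): 0.75,
    ("Jack", "bicycle"): 0.25,
    ("Jill", "computer"): 0.5,
}

rows, cols, m = as_matrix(owns)

# Typical matrix-math: row sums (marginals over y).
row_sums = [sum(r) for r in m]

print(rows, cols)
print(m)
print(row_sums)
```

In practice such a matrix is usually very sparse, so a real implementation would likely keep the pair-indexed form and compute sums on demand rather than build the dense table.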
I'm hoping that this gives you a flavor for what to shoot for. Grokking this email is surely easier than reading the source code :-)
-- Linas