I'm reading "Programming the Semantic Web" by Segaran, Evans, and Taylor.
It's about the Semantic Web BUT it uses Python to build a "toy" triple
store claimed to have good performance in the "tens of thousands" of
triples.
Just in case anybody doesn't know what an RDF triple is (not that it
matters for my question), think of it as an ordered 3-tuple representing
a Subject, a Predicate, and an Object, e.g. (John, loves, Mary), (Mary,
has-a, lamb), (theSky, has-color, blue).
To build the triple store entirely in Python, the authors recommend
using the Python hash (a dict). Three hashes, actually. (I get that: you
want one hash whose major index is the Subject, another indexed by
the Predicate, and a third indexed by the Object.)
They create a class SimpleGraph which initializes itself by setting up
the three hashes, named _spo, _pos, and _osp, thus:
    class SimpleGraph:
        def __init__(self):
            self._spo = {}
            self._pos = {}
            self._osp = {}
So far so good. I get the convention with the double underbars for the
initializer but
Q1: Not the main question, but while I'm here... I'm a little fuzzy on
the convention about the use of the single underbar in the names of
the hashes. Is the idea to "underbar" all objects and methods that
belong to the class? Why do that?
But now the good stuff:
Our authors define the hashes thus: (showing only one of the three
hashes because they're all the same idea)
self._pos = {predicate:{object:set( [subject] ) }}
Q2: Wha? Two surprises ...
1) Why not {predicate:{object:subject}}, i.e.
pos[predicate][object] = subject? Why the set( [subject] ) construct,
putting the subject into a list and turning the list into a set to be the
"value" part of a name:value pair? Why not just use the naked subject
for the value?
2) Why not something like pos[predicate][object][subject] = 1
...or any constant? The idea being to create the set of three indexes.
If the triple exists in the hash, it's "in" your triple store. If not,
then there's no such triple.
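A quick sketch of what happens with the bare-subject version from point 1, next to the set version. The variable names (pos_plain, pos_sets) and the second triple are made up for illustration; only the {predicate: {object: ...}} shape comes from the book excerpt above.

```python
# Hypothetical example data: two different subjects share the same
# predicate and object.
triples = [("John", "loves", "Mary"), ("Bob", "loves", "Mary")]

# Variant 1: store the bare subject as the value.
pos_plain = {}
for subj, pred, obj in triples:
    # The second assignment replaces the first: one slot, one subject.
    pos_plain.setdefault(pred, {})[obj] = subj

# Variant 2: store a set of subjects as the value.
pos_sets = {}
for subj, pred, obj in triples:
    # The set grows, so every subject for this (predicate, object) is kept.
    pos_sets.setdefault(pred, {}).setdefault(obj, set()).add(subj)

print(pos_plain["loves"]["Mary"])          # only the last subject written
print(sorted(pos_sets["loves"]["Mary"]))   # both subjects
```

With a bare subject as the value, (John, loves, Mary) silently disappears from the index as soon as (Bob, loves, Mary) is added.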
In [1]: set(1)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/home/brucewayne/<ipython console> in <module>()
TypeError: 'int' object is not iterable
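As far as I can tell, the traceback above is the reason for the list wrapper: set() wants an iterable and builds the set from its elements, so a single value has to be wrapped in something first. A small sketch (the variable names are just for illustration):

```python
subject = "John"

try:
    set(1)                      # an int is not iterable, so this raises
except TypeError as exc:
    error_message = str(exc)

one_subject = set([subject])    # wrap the value in a list first
same_thing = {subject}          # set literal, Python 2.7 and 3.x only

# A string IS iterable, so without the list you get its characters.
chars = set(subject)

print(error_message)
print(one_subject == same_thing)
print(sorted(chars))
```

So set([subject]) gives a one-element set holding the whole subject, whereas set(subject) on a string subject would give a set of single characters.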
> 2) Why not something like pos[predicate][object][subject] = 1
> ...or any constant? The idea being to create the set of three indexes.
> If the triple exists in the hash, it's "in" your triple store. If not,
> then there's no such triple.
>
I can't really answer that; I imagine there is a better way to code what
they are trying to accomplish. But I'm no Steven D'Aprano and I'm already
a few beers in ;)
self._pos[predicate][object].add(subject)
> 2) Why not something like pos[predicate][object][subject] = 1 ...or
> any constant? The idea being to create the set of three indexes. If the
> triple exists in the hash, it's "in" your triple store. If not, then
> there's no such triple.
>
Because it's less efficient. Since there will only ever be one unique
occurrence of each (predicate, object, subject) triple, using a dict
would be unnecessarily wasteful. Containment checks for sets are just as
fast as for dicts, and you don't need to store all those references to 1.
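A minimal sketch of the two innermost containers side by side, to show that the containment check reads the same either way. The names are hypothetical, and nothing here measures speed or memory; it only shows that the constant value in the dict version carries no information.

```python
subjects_as_set = set()
subjects_as_dict = {}

for subject in ("John", "Bob"):
    subjects_as_set.add(subject)     # the element itself is all that is stored
    subjects_as_dict[subject] = 1    # a key plus a throwaway value

# Both answer "is this subject recorded?" with the same `in` test.
in_set = "John" in subjects_as_set
in_dict = "John" in subjects_as_dict
missing_set = "Mary" in subjects_as_set
missing_dict = "Mary" in subjects_as_dict

print(in_set, in_dict, missing_set, missing_dict)
```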
regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
I had forgotten that once you have a "Cell" in a 2D matrix to represent
the first two components of the 3-tuple, you then want multiple values
IN the cell for the possibly multi-valued third component.
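To spell out the "Cell" idea, here is a rough sketch of a three-index store along the lines discussed in this thread. The method names add, _add_to_index and subjects_for are my own guesses for illustration and are not necessarily what the book uses; only the three dict names come from the excerpt quoted earlier.

```python
class SimpleGraph:
    def __init__(self):
        self._spo = {}   # subject   -> predicate -> set of objects
        self._pos = {}   # predicate -> object    -> set of subjects
        self._osp = {}   # object    -> subject   -> set of predicates

    def add(self, subject, predicate, obj):
        # Record the same triple in all three indexes.
        self._add_to_index(self._spo, subject, predicate, obj)
        self._add_to_index(self._pos, predicate, obj, subject)
        self._add_to_index(self._osp, obj, subject, predicate)

    @staticmethod
    def _add_to_index(index, a, b, c):
        # index[a][b] is the "cell"; it holds every c seen for (a, b).
        index.setdefault(a, {}).setdefault(b, set()).add(c)

    def subjects_for(self, predicate, obj):
        # Empty set when the (predicate, object) cell does not exist.
        return self._pos.get(predicate, {}).get(obj, set())


g = SimpleGraph()
g.add("John", "loves", "Mary")
g.add("Bob", "loves", "Mary")
g.add("Mary", "has-a", "lamb")

lovers = g.subjects_for("loves", "Mary")
print(sorted(lovers))
```

The cell for ("loves", "Mary") ends up holding both John and Bob, which is exactly the multi-valued third component.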