Hi Gunnar!
>> If there's more interest now, I can try to prepare the code for inclusion in
>> rdflib, either as a replacement or as an alternative for the current memory
>> backend.
>
> That would be great!
OK, I'll give it a shot.
> The current store keeps complete indexes for all combinations of spo
> queries, the theory being that we trade higher memory requirements for
> faster query speed.
>
> It may be that for most queries the set-intersection in
>
> http://code.google.com/p/skosify/source/browse/trunk/setstore.py#170
> is just as fast.
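To make the trade-off concrete, here is a minimal sketch of both strategies in plain Python. It is not the code in setstore.py, nor rdflib's memory store: the class names, the `match(s, p, o)` signature with `None` as a wildcard, and the six-index layout for the "complete" variant are my own assumptions for illustration.

```python
from collections import defaultdict

# Bound-position combinations that get their own index in the "complete" store
# (assumed layout); fully bound and fully unbound patterns use the plain triple set.
COMBOS = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]


class FullIndexStore:
    """One index per combination of bound positions: more memory, one lookup per query."""

    def __init__(self):
        self.triples = set()
        self.indexes = {combo: defaultdict(set) for combo in COMBOS}

    def add(self, triple):
        self.triples.add(triple)
        for combo, index in self.indexes.items():
            index[tuple(triple[i] for i in combo)].add(triple)

    def match(self, s=None, p=None, o=None):
        pattern = (s, p, o)
        bound = tuple(i for i, term in enumerate(pattern) if term is not None)
        if not bound:
            return set(self.triples)
        if len(bound) == 3:
            return {pattern} if pattern in self.triples else set()
        key = tuple(pattern[i] for i in bound)
        # .get() avoids creating empty entries in the defaultdict
        return set(self.indexes[bound].get(key, ()))


class SetIntersectionStore:
    """One index per position: less memory, multi-term patterns intersect sets."""

    def __init__(self):
        self.triples = set()
        self.by_position = [defaultdict(set) for _ in range(3)]

    def add(self, triple):
        self.triples.add(triple)
        for i, term in enumerate(triple):
            self.by_position[i][term].add(triple)

    def match(self, s=None, p=None, o=None):
        pattern = (s, p, o)
        candidates = [self.by_position[i].get(term, set())
                      for i, term in enumerate(pattern) if term is not None]
        if not candidates:
            return set(self.triples)
        # Start from the smallest set so the intersection does as little work as possible.
        candidates.sort(key=len)
        return set.intersection(*candidates)


if __name__ == "__main__":
    data = [("a", "knows", "b"), ("a", "knows", "c"), ("a", "name", "Alice")]
    full, inter = FullIndexStore(), SetIntersectionStore()
    for t in data:
        full.add(t)
        inter.add(t)
    print(full.match(s="a", p="knows"), inter.match(s="a", p="knows"))
```

For a two-term pattern such as `match(s="a", p="knows")`, the first store does a single dict lookup, while the second intersects the subject and predicate sets, which costs time roughly proportional to the smaller of the two.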
>
> I see you already optimised the "(s,p,o) in graph" case, where all of s,p,o
> are bound, a bit.
In my own tests back in the day, if there was any difference at all,
my store was a bit faster (between 5% and 20%, AFAIR).
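For the fully bound case, one way such a shortcut can work with per-position sets is to skip the intersection entirely: every stored triple is a member of its own subject's set, so a single hash lookup answers the question. This is a hypothetical sketch (the class and method names are mine), not the actual setstore.py code.

```python
from collections import defaultdict


class PositionIndexedStore:
    """Hypothetical store with one subject, predicate and object index each."""

    def __init__(self):
        self.by_subject = defaultdict(set)
        self.by_predicate = defaultdict(set)
        self.by_object = defaultdict(set)

    def add(self, triple):
        s, p, o = triple
        self.by_subject[s].add(triple)
        self.by_predicate[p].add(triple)
        self.by_object[o].add(triple)

    def contains_by_intersection(self, triple):
        """Generic path: intersect all three index sets, then test membership."""
        s, p, o = triple
        empty = set()
        matches = (self.by_subject.get(s, empty)
                   & self.by_predicate.get(p, empty)
                   & self.by_object.get(o, empty))
        return triple in matches

    def __contains__(self, triple):
        """Fast path for a fully bound triple: one hash lookup in the subject's set."""
        s, _p, _o = triple
        return triple in self.by_subject.get(s, ())
```

Both methods give the same answer; the intersection path touches every triple in the smaller sets, while `__contains__` does a constant number of hash lookups.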
> It would be very interesting to see results for
> test/store_performance.py for this store.
I tried to run it (first with a vanilla clone of rdflib, not yet with
my store code), but I'm not sure the script is in working shape. First
I had to change the line
    store = "Memory"
to
    store = "IOMemory"
before the script would run at all. Then, when running "python
test/store_performace.py" (is this the right way to run it, given that
it's written as a unit test?), I get this output:
--cut--
IOMemory
input: 0.000213 random: 0.000171 .
.default
input: 0.000211 random: 0.000164 .
.
----------------------------------------------------------------------
Ran 2 tests in 0.851s
OK
--cut--
It seems to me that the script fetches and parses
http://eikeon.com for test data, but AFAICT there aren't many
triples to be had there (currently only 2, it seems). Maybe as a
result, the timing values are ridiculously low, and they sometimes
vary a lot between runs.
(there's also a typo in the script's filename: "performace" instead of "performance")
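One way to get more stable numbers would be to generate a larger synthetic dataset locally instead of fetching a web page, and to repeat each measurement and report the minimum and median. Below is a rough stdlib-only sketch; the function names, dataset sizes and seed are arbitrary choices of mine, and it does not touch store_performace.py. A plain `set` stands in for a store; benchmarking an rdflib store would need a small adapter turning the strings into rdflib terms, which I haven't written here.

```python
import random
import statistics
import timeit


def synthetic_triples(n, n_terms=1000, n_predicates=20, seed=42):
    """Deterministic random (s, p, o) string triples; the sizes are arbitrary choices."""
    rng = random.Random(seed)
    return [
        (f"s{rng.randrange(n_terms)}", f"p{rng.randrange(n_predicates)}", f"o{rng.randrange(n_terms)}")
        for _ in range(n)
    ]


def bench(store_factory, triples, queries, repeat=5):
    """Time loading `triples` and membership-testing `queries`, `repeat` times each.

    store_factory: any zero-argument callable returning an object with add() and `in`.
    Returns the fastest and median wall-clock times (seconds) for each phase.
    """
    def load():
        store = store_factory()
        for t in triples:
            store.add(t)
        return store

    store = load()

    def query():
        # Count hits so the lookups cannot be skipped and the result can be checked.
        return sum(1 for t in queries if t in store)

    load_times = timeit.repeat(load, number=1, repeat=repeat)
    query_times = timeit.repeat(query, number=1, repeat=repeat)
    return {
        "load_min": min(load_times),
        "load_median": statistics.median(load_times),
        "query_min": min(query_times),
        "query_median": statistics.median(query_times),
        "hits": query(),
    }


if __name__ == "__main__":
    data = synthetic_triples(100_000)
    # Half the probes are known to be present; the freshly drawn half is mostly absent.
    probes = data[:5000] + synthetic_triples(5000, seed=7)
    print(bench(set, data, probes))
```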
> Even if querying is slower, it would be nice to offer the choice of
> speed vs. memory.
Right.
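To put rough numbers on the memory side of that choice, the standard library's tracemalloc can measure how much the index structures alone retain. This is a sketch under assumptions: the six-combination layout is my guess at what "complete indexes" means, not rdflib's actual code, and real stores carry extra overhead (contexts, term objects) that this ignores.

```python
import tracemalloc
from collections import defaultdict

# Bound-position combinations indexed by a "complete" store (assumed layout).
COMBOS = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]


def build_full_indexes(triples):
    """One dict of sets per combination of bound positions."""
    indexes = {combo: defaultdict(set) for combo in COMBOS}
    for t in triples:
        for combo, index in indexes.items():
            index[tuple(t[i] for i in combo)].add(t)
    return indexes


def build_position_indexes(triples):
    """One dict of sets per position (subject, predicate, object)."""
    indexes = [defaultdict(set) for _ in range(3)]
    for t in triples:
        for i, term in enumerate(t):
            indexes[i][term].add(t)
    return indexes


def retained_bytes(build, triples):
    """Bytes allocated by build() that are still alive when it returns.

    The triples were allocated before tracing started, so they are not counted;
    the result approximates the size of the index structures alone.
    """
    tracemalloc.start()
    try:
        indexes = build(triples)  # the local name keeps the indexes alive while we measure
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current


if __name__ == "__main__":
    triples = [(f"s{i % 500}", f"p{i % 20}", f"o{i}") for i in range(50_000)]
    full = retained_bytes(build_full_indexes, triples)
    per_position = retained_bytes(build_position_indexes, triples)
    print(f"full: {full / 1e6:.1f} MB, per-position: {per_position / 1e6:.1f} MB")
```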