--
You received this message because you are subscribed to the Google Groups "rdflib-dev" group.
To post to this group, send email to rdfli...@googlegroups.com.
To unsubscribe from this group, send email to rdflib-dev+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rdflib-dev?hl=en.
I've not yet had a chance to look at your commits, but it's great to see you're getting the ball rolling on improving the documentation. I have a Hudson instance that is now complaining that the tests are no longer all passing :( It looks like the revision that broke some of the tests was http://code.google.com/p/rdflib/source/detail?r=1886 and a lot of the tests are failing with the following error:
======================================================================
ERROR: test.test_n3test.test_all_n3_serialize('test/n3/n3-writer-test-22.n3', 'n3')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/lib/hudson/jobs/rdflib/workspace/rdflib-virtualenv/lib/python2.6/site-packages/nose-0.11.4-py2.6.egg/nose/case.py", line 186, in runTest
    self.test(*self.arg)
  File "/var/lib/hudson/jobs/rdflib/workspace/test/test_n3test.py", line 30, in check_n3_serialize
    g = ConjunctiveGraph()
  File "/var/lib/hudson/jobs/rdflib/workspace/rdflib/graph.py", line 864, in __init__
    super(ConjunctiveGraph, self).__init__(store)
  File "/var/lib/hudson/jobs/rdflib/workspace/rdflib/graph.py", line 269, in __init__
    self.__store = store = plugin.get(store, Store)()
  File "/var/lib/hudson/jobs/rdflib/workspace/rdflib/plugin.py", line 92, in get
    return p.getClass()
  File "/var/lib/hudson/jobs/rdflib/workspace/rdflib/plugin.py", line 55, in getClass
    module = __import__(self.module_path, globals(), locals(), [""])
  File "/var/lib/hudson/jobs/rdflib/workspace/rdflib/plugins/memory.py", line 91
    self.__contexts()
    ^
IndentationError: unexpected indent
and a bunch with:
----------------------------------------------------------------------
File "/var/lib/hudson/jobs/rdflib/workspace/rdflib/collection.py", line 134, in rdflib.collection.Collection.__delitem__
Failed example:
    g.add((c,RDF.first,RDFS.comment))
Exception raised:
    Traceback (most recent call last):
      File "/usr/lib/python2.6/doctest.py", line 1248, in __run
        compileflags, 1) in test.globs
      File "<doctest rdflib.collection.Collection.__delitem__[10]>", line 1, in <module>
        g.add((c,RDF.first,RDFS.comment))
    NameError: name 'g' is not defined
I'd point you directly at the hudson server if it weren't behind a firewall :( I'll take a closer look at the errors tomorrow, but wanted to give you a heads up that the tests are breaking with some of your changes :|
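The NameError in the second failure is typical of doctest: all the examples in one docstring run top to bottom in a single shared namespace, so a name such as `g` must be bound by an earlier `>>>` line in that same docstring. The function below is a hypothetical illustration, not rdflib's Collection code; deleting or moving its setup line would reproduce the same kind of NameError.

```python
import doctest


def remove_last(items):
    """Remove the last element of a list in place (hypothetical example).

    The first example binds ``xs``; later examples in this docstring can use it
    because doctest runs all of a docstring's examples in one shared namespace.

    >>> xs = [1, 2, 3]
    >>> remove_last(xs)
    >>> xs
    [1, 2]
    """
    del items[-1]


def run_docstring_examples(func):
    """Run func's doctests with a fresh globals dict; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func, func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.failures, runner.tries


if __name__ == "__main__":
    print(run_docstring_examples(remove_last))
```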
On 6 Aug 2010, at 05:01, Daniel Krech wrote:
> I have a Hudson instance that is now complaining the test are no
> longer all passing
I too now have a Hudson instance and it is tracking changes on the
rdflib repos. The last build report is here:
http://bel-epa.com/hudson/job/RDFlib/7/console
I run it with "--with-doctest" enabled and a couple of doctest
failures are reported:
> ======================================================================
> FAIL: Doctest: rdflib.util.parse_date_time
> ----------------------------------------------------------------------
> File "/.../rdflib/util.py", line 108, in rdflib.util.parse_date_time
> Failed example:
> parse_date_time('1970-01-01T00:00:01Z') - 1.0
> Expected:
> 0.0
> Got:
> -3600.0
> ----------------------------------------------------------------------
> File "/.../rdflib/util.py", line 111, in rdflib.util.parse_date_time
> Failed example:
> parse_date_time('1970-01-01T00:00:00Z') - 0.0
> Expected:
> 0.0
> Got:
> -3600.0
> ----------------------------------------------------------------------
> Ran 286 tests in 33.090s
>
> FAILED (failures=1)
> Running nose with: --attr= test rdflib --where=./ --with-doctest
> --doctest-extension=.doctest --doctest-tests
The failures are reproducible outside Hudson's build environment (for
me, anyway) using rdflib 3.0.0 on OS X 10.5:
from rdflib.util import parse_date_time
x = parse_date_time('1970-01-01T00:00:01Z') - 1.0
assert x == 0.0, "Expected 0.00, got %s" % x
y = parse_date_time('1970-01-01T00:00:00Z') - 0.0
assert y == 0.0, "Expected 0.00, got %s" % y
--
Cheers,
Graham
http://www.linkedin.com/in/ghiggins
From the fact that the offset is exactly one hour, and knowing
that you and I are UTC+1 right now, I would suspect
something to do with the handling of the local time
zone. Funny that this shows up now: util.py hasn't
changed since February and we haven't gone on (or
off, I can never remember) daylight saving time since
the spring...
Out of curiosity, what does
from time import timezone, daylight
print timezone, daylight
show for you? (for me, I get "0 1") I also notice that
parse_date_time doesn't use the "daylight" flag...
In fact,
>>> mktime((1970, 1, 1, 0, 0, 0, 0, 0, 0))
-3600.0
and rather strangely,
>>> mktime((1970, 1, 1, 0, 0, 0, 0, 0, daylight))
-7200.0
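One way to sidestep local-zone handling entirely is to let the standard library treat the parsed fields as UTC. `time.mktime()` interprets a time tuple as local time, using whatever zone rules the C library applies to that date, whereas `calendar.timegm()` interprets it as UTC. The sketch below is a hypothetical helper, not rdflib's `parse_date_time`, and it only accepts the `YYYY-MM-DDTHH:MM:SSZ` form used in the failing doctests.

```python
import calendar
import time


def parse_utc_timestamp(value):
    """Convert 'YYYY-MM-DDTHH:MM:SSZ' to float seconds since the Unix epoch.

    Hypothetical helper: calendar.timegm() treats the parsed fields as UTC,
    so the result is the same whatever the machine's local zone or DST state.
    time.mktime() would instead treat them as local time.
    """
    if not value.endswith("Z"):
        raise ValueError("expected a UTC timestamp ending in 'Z': %r" % (value,))
    fields = time.strptime(value[:-1], "%Y-%m-%dT%H:%M:%S")
    return float(calendar.timegm(fields))


print(parse_utc_timestamp("1970-01-01T00:00:01Z") - 1.0)  # 0.0 in any time zone
```

Offsets such as `+01:00` are not handled here; a fuller parser would need to apply the offset after calling `timegm()`.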
-w
--
William Waites <wwa...@gmail.com>
Mob: +44 789 798 9965
Fax: +44 131 464 4948
On 27 Aug 2010, at 15:31, Graham Higgins wrote:
> I too now have a Hudson instance and it is tracking changes on the
> rdflib repos.
My hudson instance is also tracking changes on three rdflib-related
repos that I keep on bitbucket:
* http://bitbucket.org/gjhiggins/rdflib (convenience repos for
experiments)
* http://bitbucket.org/gjhiggins/rdfextras (sparql and store support)
* http://bitbucket.org/gjhiggins/fuxi-2010 (FuXi with rdflib 3.0.0)
This is the first of three posts reporting work-in-progress for each
strand:
"rdfextras"
===========
This is a sandbox for work supporting the SPARQL packages and the
storage back-ends that were recently expelled from the Garden of Rdflib.
"rdfextras.ccfsparql"
=====================
I have migrated in and refactored Chime's (now-)non-C SPARQL
implementation, initially giving it a temporary package name of
"bisonsparql".
This package has now been renamed "ccfsparql" as a feeble
attempt at providing some recognition of the supporting org.
I have also refactored (many of) the existing SPARQL tests and happily
was able to use the same test suite for both the default sparql
package and the ccfsparql package, so direct comparison is supported.
The build report and test results are here:
http://bel-epa.com/hudson/job/rdfextras/24/console
In summary:
The rdfextras default SPARQL package shows 2 tests failing ---
testFilterBNode and toldBNode.
The ccfsparql package shows 4 tests failing: the same two BNode tests
as above, plus test_secondary_recursion and test_simple_recursion.
"rdfextras.store"
=================
As regards the stores, only Sleepycat and ZODB store pass all their
tests, the other stores show a number of problems. I have a test suite
for the stores, I arrange for hudson to run that test suite over the w/
e and will advise the build URL in a subsequent post.
I am examining the utility of adapting the db2api approach taken in
the elderly _sqlobject store and applying it to SQLAlchemy instead.
I (like Nanny Ogg) am an inveterate picker-up of discarded trifles and
have found a half-completed attempt to provide an rdflib store
implementation for TokyoCabinet, which I've also added to the mix.
Note, the above two additions have not yet been committed to my
rdfextras clone.
--
Cheers,
Graham
http://www.linkedin.com/in/ghiggins
On 27 Aug 2010, at 17:04, Graham Higgins wrote:
> This is the first of three posts reporting work-in-progress for each
> strand:
This is the second of the three posts predicted above...
"rdflib"
========
The main purpose of this repos is (now) simply convenience, to be able
to make small experimental changes to a separately-maintained core
rdflib library in pursuit of the supporting SPARQL/store and test
suite work, e.g. in order to leave the test suite relatively
undisturbed, adding a DEBUG keyword arg to the query API:
https://bitbucket.org/gjhiggins/rdflib/changeset/4bfdadca6b6c
However, the primary original purpose of the repos was to allow me to
explore any significant differences in the core rdflib that were
contained in the Cleveland Clinic "layercake" fork.
My (rather crude) approach was simply to "diff -uNr" the layercake
fork vs the standard 2.4.x package and visually inspect the result ---
ably assisted by TextMate.app which very handily provides folding for
diff files.
I was thus able to readily identify any layercake changes to the core
rdflib package vs changes to code in the sparql package.
As it transpired, all but two of the layercake changes to the standard
rdflib core have been incorporated in rdflib 3.0.0. The two remaining
changes (both in rdflib/term.py) are...
> --- a/rdflib/term.py
> +++ b/rdflib/term.py
> @@ -129,6 +129,8 @@ class URIRef(Identifier):
>      def __eq__(self, other):
>          if isinstance(other, URIRef):
>              return unicode(self)==unicode(other)
> +        elif isinstance(other,basestring):
> +            return unicode(self)==other
>          else:
>              return False
>  
> @@ -719,7 +721,10 @@ def _strToTime(v) :
>      return strptime(v, "%H:%M:%S")
>  
>  def _strToDate(v) :
> -    tstr = strptime(v, "%Y-%m-%d")
> +    try:
> +        tstr = strptime(v,"%Y-%m-%d")
> +    except:
> +        tstr = strptime(v,"%Y-%m-%dZ")
>      return date(tstr.tm_year, tstr.tm_mon, tstr.tm_mday)
>  
>  def _strToDateTime(v) :
>
I was able to use my repos to check the result of applying these two
changes [1]. The _strToDate tweak caused no disturbance in the tests
but the unicode tweak caused multiple test failures, so I rescinded it
[2].
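The `_strToDate` tweak can be expressed without the bare `except:`, which would also swallow unrelated errors. This is a hypothetical stand-in (`str_to_date` is not an rdflib name) that accepts `YYYY-MM-DD` with or without a trailing `Z`; other xsd:date time-zone forms such as `+01:00` are not handled.

```python
from datetime import datetime


def str_to_date(value):
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DDZ' into a datetime.date.

    Only ValueError (a format mismatch) triggers the fallback; anything
    else propagates, unlike a bare ``except:``.
    """
    for fmt in ("%Y-%m-%d", "%Y-%m-%dZ"):
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    raise ValueError("not a date in YYYY-MM-DD[Z] form: %r" % (value,))


print(str_to_date("2010-08-27Z"))  # 2010-08-27
```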
Whilst I was at it, I reinstated the ability to use epydoc to
generate API docs [3].
And, during the course of a serious bit of tyre-kicking, I found it
desirable to guard against an unhandled exception generated by
random.randrange [4].
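The changeset itself isn't reproduced here, but as a hedged illustration of the kind of guard involved: `random.randrange()` raises `ValueError` when the requested range is empty (for example `randrange(0)`), so a caller that may be handed an empty collection has to check first. `pick_index` is a hypothetical name, not an rdflib function.

```python
import random


def pick_index(n, rng=random):
    """Return a random index in range(n), or None if n is zero or negative.

    random.randrange(n) raises ValueError for an empty range, so the
    empty case is handled explicitly instead of letting the error escape.
    """
    if n <= 0:
        return None
    return rng.randrange(n)
```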
Lastly, I'll cheerfully own up to reinstating setuptools [5] and,
later, suitably enlightened by Uche, un-reinstating it [6].
[1] https://bitbucket.org/gjhiggins/rdflib/changeset/076f0ecfbfa4
[2] https://bitbucket.org/gjhiggins/rdflib/changeset/d565ccf38a32
[3] https://bitbucket.org/gjhiggins/rdflib/changeset/57e3e70bbc24
[4] https://bitbucket.org/gjhiggins/rdflib/changeset/8cd693a78bab
[5] https://bitbucket.org/gjhiggins/rdflib/changeset/050c53cd6fe9
[6] https://bitbucket.org/gjhiggins/rdflib/changeset/16ce291c96bf
--
Cheers,
Graham
http://www.linkedin.com/in/ghiggins
On 27 Aug 2010, at 17:48, Graham Higgins wrote:
> On 27 Aug 2010, at 17:04, Graham Higgins wrote:
>> This is the first of three posts reporting work-in-progress for
>> each strand:
>
>
> This is the second of the three posts predicted above...
And the most recent hudson build URL for my rdflib experimenting repos
is:
http://bel-epa.com/hudson/job/rdflibdev/10/console
--
Cheers,
Graham
http://www.linkedin.com/in/ghiggins
On 27 Aug 2010, at 17:04, Graham Higgins wrote:
> This is the first of three posts reporting work-in-progress for each
> strand:
This is the third and last of the posts predicted above...
"fuxi-2010"
===========
I agree with Will in his post to the FuXi discussion list: "FuXi is
probably the best RDF inference engine in existence". I also share
Will's commitment ("This is not all idle talk") and, to this effect,
this repository holds work aimed at getting FuXi running against
rdflib 3.0.0 and my augmented "rdfextras" repos.
The limited set of tests that nose discovers all pass. However, the
file "testOWL.py" appears to be the most important component, AIUI.
The test notes in test/OWL/OWL.README observe that testOWL.py needs to
be explicitly run from python.
I retrieved and installed the OWL test harness and arranged for hudson
to explicitly "cd tests; python testOWL.py", as instructed.
Making sense of the output was taxing, so I seriously mucked about
with the print strings so that testOWL.py outputs results in
reStructuredText, as evidenced by the latest build report:
http://bel-epa.com/hudson/job/FuXi-2010/9/console
A lot of the tests fail, a few pass.
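For anyone wanting to do something similar, a reStructuredText "simple table" only needs each column padded to a common width and framed by rows of `=` signs. The helper below is a hypothetical sketch, not the code actually added to testOWL.py, and the test names in the usage line are placeholders.

```python
def rst_simple_table(headers, rows):
    """Render rows as a reStructuredText simple table (all cells as text)."""
    table = [tuple(str(cell) for cell in headers)] + [
        tuple(str(cell) for cell in row) for row in rows
    ]
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    border = "  ".join("=" * w for w in widths)

    def fmt(row):
        return "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()

    lines = [border, fmt(table[0]), border]
    lines += [fmt(row) for row in table[1:]]
    lines.append(border)
    return "\n".join(lines)


print(rst_simple_table(("Test", "Result"),
                       [("example-test-1", "PASS"), ("example-test-2", "FAIL")]))
```

One caveat: in a simple table a blank first-column cell marks a continuation line, so every row should have something in the first column.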
Working through the sequence of build reports will give you some idea
of the process; an initial build, followed by a short period in
dependency hell, followed by another build, this time with tests,
rinse and repeat extending test coverage as you go.
I had to disable one test because it was causing rdflib to throw an
exception which terminated the test run and so was preventing me from
seeing the results from a full run:
http://bel-epa.com/hudson/job/FuXi-2010/7/console
XML-ised results in a more presentable form will be maintained (for
the duration of the exercise) here:
http://bel-epa.com/area51/library/fuxitest.xml
I have further eye-watering detail of FuXi tests, but I will carry
those over in a separate post to FuXi-discuss.
--
Cheers,
Graham
http://www.linkedin.com/in/ghiggins
On 27 Aug 2010, at 19:27, Graham Higgins wrote:
> I have further eye-watering detail of FuXi tests, but I will carry
> those over in a separate post to FuXi-discuss.
http://groups.google.com/group/fuxi-discussion/msg/5526209013a5a48b
--
Cheers,
Graham
http://www.linkedin.com/in/ghiggins
On 10-08-27 17:48, Graham Higgins wrote:
> Lastly, I'll cheerfully own up to reinstating setuptools [5] and
> later, suitably enlighted by Uche, un-reinstated it [6]
To follow up on a discussion that was happening on
the FuXi list: I strongly advocated setuptools and Uche
strongly discouraged it, for what I am convinced are
good reasons. He suggested distribute instead.
The main reason for this is to have a consistent way
of discovering plugins that exist in other packages,
e.g. rdfextras and py4s. (py4s cannot be included in
rdfextras for licensing reasons, GPL vs. BSD; rdfextras
should include the back-ends that were removed from
rdflib 3.0.0, or at least the ones that still work,
which is the porting effort Graham has been involved
with: getting the several SPARQL implementations and
additional back-ends from the layercake branch running.)
It seems to me that the natural way of doing this is
with entry points, which are supported by both
setuptools and distribute.
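As a rough illustration of the entry-point approach: the sketch below keeps a lazy (name, kind) to (module, class) registry, loosely modelled on the `plugin.get(store, Store)` and `getClass()` calls visible in the traceback at the top of this thread, and falls back to entry points advertised by installed distributions. It uses the modern standard-library `importlib.metadata` API rather than setuptools' `pkg_resources` or distribute, and the group name `example.plugins.store` is purely hypothetical.

```python
import importlib
from importlib import metadata

# Hypothetical registry: (name, kind) -> (module path, class name).
# Modules are imported only when a plugin is actually requested.
_registry = {}


def register(name, kind, module_path, class_name):
    """Record where a plugin lives without importing its module yet."""
    _registry[(name, kind)] = (module_path, class_name)


def _entry_points(group):
    """Entry points advertised under `group` by installed distributions."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8/3.9 return a plain dict


def get(name, kind, group=None):
    """Return the plugin class for (name, kind).

    Explicit registrations win; otherwise, if an entry-point `group` is
    given, the first installed entry point called `name` is loaded.
    """
    if (name, kind) in _registry:
        module_path, class_name = _registry[(name, kind)]
        return getattr(importlib.import_module(module_path), class_name)
    if group is not None:
        for ep in _entry_points(group):
            if ep.name == name:
                return ep.load()
    raise LookupError("no %s plugin registered as %r" % (kind, name))


# Usage with a standard-library class standing in for a real plugin:
register("json-decoder", "Decoder", "json", "JSONDecoder")
decoder_class = get("json-decoder", "Decoder")
```

A plugin distribution would advertise itself with something like `entry_points={"example.plugins.store": ["MyStore = mypkg.store:MyStore"]}` in its setup.py (again, hypothetical names), after which `get("MyStore", "Store", group="example.plugins.store")` would find it without the application importing mypkg directly.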
If we don't do this we will either (1) have to invent
our own entry-points-like system, which is probably
not the best use of anyone's time, or (2) find that
using storage back-ends and query processors from
outside the core library is a PITA, requiring import
statements and try/except blocks instead of
configuration variables for applications.
So does introducing distribute and rejigging the
plugin.py sound like a reasonable strategy? If we were
to work on this would patches for it be accepted?
Cheers,
So does introducing distribute and rejigging the
plugin.py sound like a reasonable strategy? If we were
to work on this would patches for it be accepted?
- Gunnar
On 9 Sep 2010, at 09:33, Gunnar Aastrand Grimnes wrote:
> The current sparql implementation in rdfextras is a mess, is there any
> reason why we should not move to yours?
I should hasten to make it clear that the implementation is -not-
mine, I'm simply doing some self-interested housekeeping.
I have just migrated a bunch of the SPARQL-related, rdflib-now-
rdfextras tickets over to the rdfextras area and during the process I
(re-)discovered Chimezie's Feb 13th commit of his legacy-sparql2sql
variant of the sans-C SPARQL implementation.
A diff revealed a number of changes not present in either of the two
extant implementations in my rdfextras clone (the Ivan Herman
'default' implementation and the '1st alternate' layer-cake
implementation). So I refactored the implementation for rdflib 3.0.0
and separately committed the significant code changes in order not to
lose the work. I have committed it as 'sparql2sql'.
A couple of months ago, on the fuxi-discussion list, Vasiliy Faronov
briefly touched on the relative completeness of the different
implementations. I've been wondering about this myself, so I thought
it would be useful to re-institute the DAWG SPARQL tests using the
newer r2 suite and I have applied the test suite to all three
implementations.
Hudson is keeping track of all the gory details:
http://bel-epa.com/hudson/job/rdfextras/28/consoleText
But summarising:
DAWG suite
              sparql   ccfsparql  sparql2sql
Passed           339         344         344
Failed            43          42          42
Skipped           55          54          54
Error              3           0           0
non-DAWG
              sparql   ccfsparql  sparql2sql
Passed            55          52          51
Failed             2           4           4
Skipped            4           4           4
Error              0           0           0
There are a couple of non-DAWG FAILs common to all three, referencing
told BNodes and I just -have- to share this screenshot of the results
of a Google search for 'told BNodes'.
http://imagebin.ca/view/i3wAA2R5.html
'tall BLondes' indeed.
--
Cheers,
Graham
http://www.linkedin.com/in/ghiggins
On 6 Sep 2010, at 12:35, Vasiliy Faronov wrote:
Hi Vasiliy,
> Your links to the docs at http://bel-epa.com/rdflibapidocs/ and
> http://bel-epa.com/rdflib3docs/ now return 410 Gone. Any plans to get
> them back up?
No plans at the moment. I think it makes better sense for the docs to
be available via more, ah, predictable resources such as the google
group pages, the wiki, the repos and the dedicated PyPI docs section.
On a related note, (in an attempt to step up to the mark with respect
to coming up with an answer to Gunnar's question about changing the
default sparql implementation) I've found an old doc [1] by Ivan
Herman describing the original sparql-p implementation "SPARQL in
RDFLib (Version 2.1)", now doing duty as the rdfextras default sparql.
I'm also raiding Chime's blog posts and other archives for his
descriptions of the "Compositional SPARQL semantics" approach that he
took (e.g. [2]) and which he describes as a "Full implementation of
the W3C SPARQL Algebra. This should provide coverage for the full
SPARQL grammar (including all combinations of GRAPH). Includes unit
testing and has been run against the old DAWG testsuite." [3] There's
a lot more specific detail in the docstring of one of the early
versions of the implementation [4].
[1] http://bit.ly/bp0vRL
[2] http://www.mail-archive.com/public-s...@w3.org/msg00040.html
[3] http://cia.vc/stats/project/rdflib/.message/28516
[4] http://code.google.com/p/rdflib/source/browse/trunk/rdflib/sparql/bison/CompositionalEvaluation.py?r=1119
--
Cheers,
Graham
http://www.linkedin.com/in/ghiggins
On 15 Sep 2010, at 13:33, Graham Higgins wrote:
> original sparql-p implementation
Which Ivan describes as:
"... based on the July 2005 version of the SPARQL draft worked on at
the W3C. For a lack of a better word, I refer to this implementation
as sparql-p."
--
Cheers,
Graham
http://www.linkedin.com/in/ghiggins
On Wed, Sep 15, 2010 at 8:33 AM, Graham Higgins <gjhi...@gmail.com> wrote:
> I'm also raiding Chime's blog posts and other archives for his descriptions
> of the "Compositional SPARQL semantics" approach that he took (e.g. [2]) and
> which he describes as a "Full implementation of the W3C SPARQL Algebra. This
> should provide coverage for the full SPARQL grammar (including all
> combinations of GRAPH). Includes unit testing and has been run against the
> old DAWG testsuite." [3] There's a lot more specific detail in the docstring
> of one of the early versions of the implementation [4].
This 'compositional semantics' approach is based on the "Semantics of
SPARQL" paper [1]. It is not a straightforward read but has a lot of
details that, when combined with Ivan's older, well-documented
descriptions of sparql-p, should be enough to get the basic idea. The
evaluation of a BGP is done via sparql-p tree expansion. I'm not sure
which versions of this algebraic SPARQL implementation (it was
rdflib/sparql/bison/Algebra.py at one time) you are looking at, but
the later versions are much more mature (in terms of testing, etc.).
[1] http://ing.utalca.cl/~jperez/papers/sparql_semantics.pdf
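To make the compositional idea concrete, here is a toy evaluator following the operator definitions in the "Semantics of SPARQL" paper cited above: a triple pattern evaluates to a set of variable mappings, AND is a join of compatible mappings, UNION is union, and OPT is the join plus those left-hand mappings that are compatible with nothing on the right. It is a hedged sketch, not rdflib code: it uses Python lists (so duplicates are kept) instead of sets, a naive nested-loop match instead of sparql-p tree expansion, and its `difference` is the paper's operator, not SPARQL 1.1's MINUS keyword, which behaves differently.

```python
def is_var(term):
    """Variables are written as strings starting with '?' in this sketch."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, graph):
    """[[t]]_G: all mappings mu with dom(mu) = vars(t) and mu(t) in G."""
    results = []
    for triple in graph:
        mu = {}
        for p, t in zip(pattern, triple):
            if is_var(p):
                if mu.setdefault(p, t) != t:  # repeated variable must bind consistently
                    break
            elif p != t:
                break
        else:
            results.append(mu)
    return results


def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[v] == val for v, val in m1.items() if v in m2)


def join(o1, o2):
    """AND: merge every compatible pair of mappings."""
    return [{**m1, **m2} for m1 in o1 for m2 in o2 if compatible(m1, m2)]


def union(o1, o2):
    """UNION: mappings from either side."""
    return o1 + o2


def difference(o1, o2):
    """Mappings in o1 that are compatible with no mapping in o2."""
    return [m1 for m1 in o1 if not any(compatible(m1, m2) for m2 in o2)]


def left_join(o1, o2):
    """OPT: (o1 JOIN o2) UNION (o1 DIFFERENCE o2)."""
    return union(join(o1, o2), difference(o1, o2))


def eval_bgp(patterns, graph):
    """A basic graph pattern is the AND of its triple patterns."""
    result = [{}]  # the empty mapping is compatible with everything
    for pattern in patterns:
        result = join(result, match_pattern(pattern, graph))
    return result


G = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:name", "Bob"),
]
names = match_pattern(("?p", "foaf:name", "?n"), G)
mboxes = match_pattern(("?p", "foaf:mbox", "?m"), G)
print(left_join(names, mboxes))
```

With this data, Alice keeps her `?m` binding while Bob survives the OPT with no `?m` at all, which is what the left-outer-join definition predicts.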
On 15 Sep 2010, at 14:32, Chimezie Ogbuji wrote:
> Graham. I thought I'd offer some help :)
Thanks.
> This 'compositional semantics' approach is based on the "Semantics of
> SPARQL" paper [1]. It is not a straightforward read but has a lot of
> details that, when combined with Ivan's older, well-documented
> descriptions of sparql-p, should be enough to get the basic idea. The
> evaluation of a BGP is done via sparql-p tree expansion.
That's helped me get a better understanding of how the different
versions fit together (legacy-sparql2sql, layer-cake and the new
rdflib 3.0.0 default).
> I'm not sure
> which versions of this algebraic SPARQL implementation (it was
> rdflib/sparql/bison/Algebra.py at one time) you are looking at, but
> the later versions are much more mature (in terms of testing, etc.).
I now realise that all of the versions that I've been looking at are
basically the same early-2010 codebase which was refactored to create
the rdflib 3.0.0 default.
On 9 Sep 2010, at 09:33, Gunnar Aastrand Grimnes wrote:
> The current sparql implementation in rdfextras is a mess, is there any
> reason why we should not move to yours?
Now that I have a clearer idea of what's what, I can give a sensible
answer: there doesn't seem to be anything to be gained by switching.
Dan lifted the latest algebraic SPARQL implementation and refactored
it into a separate package, leaving behind the SPARQL query
pre-compilation and SQL efficiency code.
This latter code is in the sparql2sql module and at some point it could
become a separate plugin if people want it. As the docs note,
pre-compilation can be useful for avoiding redundant parsing overhead
for queries that need to be evaluated repeatedly, and the
(feature-complete) SPARQL-to-SQL translation generates flat SQL
statements for efficient processing by relational database query engines.
There is one dangling issue... I found I needed to open up the rdflib
3.0.0 query API - in the end I simply passed **kwargs straight
through, allowing use of parsedQuery keywords in the sparql2sql and
offering a route for a DAWG compliance switch.
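A minimal sketch of the pass-through idea, with hypothetical names throughout (`graph_query`, `EchoProcessor`, `dawg_compliance`): the front-end call forwards any keyword arguments it does not recognise straight to the processor plugin, so processor-specific switches such as a DEBUG flag or a compliance mode need no change to the core signature.

```python
class EchoProcessor:
    """Hypothetical processor plugin that reports the options it was given."""

    def query(self, query_string, init_bindings=None, **kwargs):
        return {
            "query": query_string,
            "bindings": dict(init_bindings or {}),
            "options": kwargs,
        }


def graph_query(processor, query_string, init_bindings=None, **kwargs):
    """Front-end call that forwards any extra keyword options untouched.

    The front end needs no knowledge of processor-specific switches; each
    processor picks out the options it understands from **kwargs.
    """
    return processor.query(query_string, init_bindings=init_bindings, **kwargs)


result = graph_query(
    EchoProcessor(),
    "SELECT ?s WHERE { ?s ?p ?o }",
    DEBUG=True,            # e.g. the DEBUG switch mentioned earlier in the thread
    dawg_compliance=True,  # hypothetical compliance flag
)
print(result["options"])
```

The trade-off is that a misspelled option is silently forwarded; a processor that wants to be strict can reject unknown keys itself.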
As I mentioned in a previous post, I resurrected the DAWG tests that
Chimezie added earlier, lightly edited the code to use the v2 test set
data and ran it with the default rdflib 3.0.0 SPARQL implementation...
Ran 496 tests in 217.280s
FAILED (errors=44, failures=27)
I moved some tests into a skiplist [1]: tests of XML type promotion
which, AFAIK, hasn't been implemented, a couple of apparently
unimplemented filters: "langmatches" and "sameterm", along with
several tests that were hitting a DAWG_DATASET_COMPLIANCE-driven
assertion violation w.r.t. parentGraphs having URIRefs and /not/ BNode
identifiers.
That brought it down to 440 tests: 5 errors, 14 fails and 51 skipped.
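A skip list like the one described above can be approximated with nothing more than the standard library's unittest: each generated test consults a name-to-reason mapping and calls `skipTest()` before running. This is a hypothetical sketch; the test names and reasons are placeholders rather than entries from the real DAWG manifest or the rdfextras skip list, and the "check" is a no-op.

```python
import unittest

# Hypothetical skip list: test name -> reason. The names are placeholders.
SKIPLIST = {
    "langmatches-example": "LANGMATCHES filter not implemented",
    "sameterm-example": "sameTerm filter not implemented",
}


def make_test(name, check):
    """Wrap one manifest entry as a test method that honours the skip list."""
    def test(self):
        if name in SKIPLIST:
            self.skipTest(SKIPLIST[name])
        check()
    return test


class ManifestTests(unittest.TestCase):
    """Test methods are attached dynamically, one per manifest entry."""


for _name in ("langmatches-example", "sameterm-example", "basic-example"):
    setattr(ManifestTests, "test_" + _name.replace("-", "_"),
            make_test(_name, lambda: None))  # placeholder check that always passes
```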
HTMLised summary here:
http://bel-epa.com/hudson/job/rdfextras/ws/nosetests-sparql.xhtml
and the (near-identical) SPARQL2SQL results for comparison:
http://bel-epa.com/hudson/job/rdfextras/ws/nosetests-sparql.xhtml
Results of 400-odd tests run on the back-end stores are similarly
presented:
http://bel-epa.com/hudson/job/rdfextras/ws/nosetests-store.xhtml
[1] http://bitbucket.org/gjhiggins/rdfextras/src/tip/test/test_sparql/DAWG/test.py#cl-45
--
Cheers,
Graham
http://www.linkedin.com/in/ghiggins
> There is one dangling issue... I found I needed to open up the rdflib 3.0.0 query API - in the end I simply passed **kwargs straight through, allowing use of parsedQuery keywords in the sparql2sql and offering a route for a DAWG compliance switch.
I'd be happy to apply your patch for the one dangling issue in http://code.google.com/p/rdflib/issues/detail?id=134 . It will involve rolling back the change in http://code.google.com/p/rdflib/source/detail?r=1893 .
There have been several other bug fixes that would be good to get out as well; I'll plan on cutting a 3.0.1 release.
cheers,
--eik
On 22 Sep 2010, at 00:15, Graham Higgins wrote:
> That brought it down to 440 tests: 5 errors, 14 fails and 51 skipped.
As a final exercise I subjected the rdflib SPARQL implementations
(3.0.0 and 2.4.X layer-cake) to informatik.uni-freiburg's "SP²Bench
SPARQL Performance Benchmark" [1,2].
"Our benchmark comprises a data-generator for arbitrarily large
documents, which builds upon the well-known DBLP scenario, and thus
comes close to a real-world application scenario. The benchmark
queries implement meaningful requests on top of this data, thereby
testing typical SPARQL operator constellations and RDF access
patterns. With this focus, our benchmark allows to easiliy detect
deficiencies in current SPARQL implementations and can be used to tune
existing engines."
The dataset mirrors "existing DBLP bibliography" and "the structure of
document classes (such as articles, journals, proceedings,
inproceedings, etc.), relations between document classes (e.g. between
inproceeding and proceedings), characteristics of authors and
coauthors, and parts of the citation system"
A couple of query examples taken from the full list [3]:
Q3(a) Select all articles with property swrc:pages.
Q5(a) Return the names of all persons that occur as author of at least
one inproceeding and at least one article.
I chose the smallest dataset that I could get away with (10,000
triples). Anything larger caused processing times to rise to quite
tedious levels on my MacBook Pro (2.4GHz Core Duo, 2GB RAM).
I had difficulty getting any two of the rdflib SPARQL implementations
to agree on a set of answers, so I recruited help from sesame2 and
joseki, running locally under tomcat.
Notes
=====
Key:
lyrck = rdflib 2.4.1 layer-cake
spql = rdflib 3.0.0 SPARQL
s-2sql = rdflib 3.0.0 SPARQL2SQL
sesame = sesame2 app under tomcat, accessed via HTTP@localhost
joseki = joseki/Jena app under tomcat, accessed via HTTP@localhost
For all three rdflib SPARQL implementations, Q04 and Q05(a) failed to
return -- or rather, Q04 didn't return during an hour-long BBC 4
programme on the Battle of Britain, and its high CPU usage heated the
machine up quite considerably. Q05(a) was showing the same tendency, so
I skipped both tests. In the end, only joseki's and sesame's results
agreed completely.
(Result counts; '-' marks a query that was skipped.)
Query    lyrck    spql  s-2sql  sesame  joseki
Q01          1       1       1       1       1
Q02        147     147     147     147     147
Q03a       846       0       0     846     846
Q03b         9       0       0       9       9
Q03c         9       0       0       0       0
Q04          -       -       -   23226   23226
Q05a         -       -       -     155     155
Q05b       155     155     155     155     155
Q06        229     280     229     229     229
Q07         42      42      42       0       0
Q08        184     184     184     184     184
Q09          4       4       4       4       4
Q10        166     166     166     166     166
Q11         10      10      10      10      10
Q12a         1       1       1       1       1
Q12b         1       1       1       1       1
Q12c         0       0       0       0       0
and, for casual interest, some naive timings (i.e. time(), query(),
time()) - I should emphasise both sesame and joseki were accessed via
localhost, hence free of any latency issues which might otherwise
affect real OW operations:
(Wall-clock seconds; '-' marks a query that was skipped.)
Query     lyrck     spql   s-2sql   sesame   joseki
Q01       0.504    0.526    0.650    0.377    0.008
Q02       0.154    0.141    0.144    0.172    0.084
Q03a      0.436    0.384    0.386    0.093    0.057
Q03b      0.364    0.319    0.322    0.005    0.007
Q03c      0.359    0.320    0.320    0.004    0.007
Q04           -        -        -    2.879   14.840
Q05a          -        -        -    3.684    1.822
Q05b     15.207   14.104   14.305    0.069    0.585
Q06      45.845   40.621   41.004    2.040    1.244
Q07      17.951   14.727   15.672    0.151    0.298
Q08      12.190    9.992   11.487    0.717    0.025
Q09       0.356    0.625    0.368    0.021    0.016
Q10       0.016    0.014    0.014    0.008    0.018
Q11       0.085    0.081    0.094    0.040    0.011
Q12a    130.470  137.720  123.197    0.033    0.008
Q12b     13.974   17.971   12.524    0.193    0.008
Q12c      0.009    0.009    0.008    0.023    0.009
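The naive time()/query()/time() measurement can be wrapped in a small helper for anyone replicating these numbers. A hedged sketch: it swaps in `time.perf_counter()`, a monotonic high-resolution clock better suited to measuring intervals than `time.time()`, and takes the best of several runs to damp noise from other activity on the machine. `timed` is a hypothetical name; if a query API hands back a lazy result, wrapping the call as `lambda: list(...)` keeps the work of producing the rows inside the timed region.

```python
import time


def timed(fn, *args, repeat=1, **kwargs):
    """Call fn(*args, **kwargs) `repeat` times; return (last result, best time in seconds).

    time.perf_counter() is monotonic and high-resolution; taking the minimum
    of several runs reduces noise from other activity on the machine.
    """
    best = float("inf")
    result = None
    for _ in range(max(1, repeat)):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return result, best


rows, seconds = timed(lambda: sorted(range(100000), reverse=True), repeat=3)
print("%d rows in %.3f s" % (len(rows), seconds))
```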
In case anyone's interested in replication, I uploaded a zipped file
of the N3 triplebase [4] to the group's files area, along with the
data tables and charts as an OpenOffice spreadsheet [5].
[1] http://dbis.informatik.uni-freiburg.de/index.php?project=SP2B
[2] http://arxiv.org/pdf/0806.4627v2
[3] http://dbis.informatik.uni-freiburg.de/index.php?project=SP2B/queries.php
[4] http://groups.google.com/group/rdflib-dev/web/10kdata.n3.zip
[5] http://groups.google.com/group/rdflib-dev/web/sparql.ods.zip
--
Cheers,
Graham
http://www.linkedin.com/in/ghiggins
Are the API docs automatically built and hosted by PyPI? I think
currently they are nowhere?
I.e. the 3.0.0 version of this:
http://www.rdflib.net/rdflib-2.4.0/html/index.html
?
Cheers!
- Gunnar