New issue 116 by eikeon: decide what to do with sparql support
http://code.google.com/p/rdflib/issues/detail?id=116
The sparql support currently has a bunch of issues and is in need of some major cleanup. We've recently removed a fair amount of unmaintained/unstable code from rdflib (all non-core functionality and mostly store implementations). If we hope to release rdflib 2.5.0 anytime soon we will likely need to remove the sparql support (despite it being arguably more core than an arbitrary store implementation).
If and when we get the sparql support into a more solid and working state we can include it back into rdflib proper. Perhaps if we get enough of the original authors helping out we can work through the issues in time for a 2.5 release like we did for the rest of the issues. Else let's work on the sparql support over in the rdfextras project.
The sparql support already plugs into rdflib as a plugin, and so separating it into its own release should be straightforward.
--
You received this message because you are listed in the owner
or CC fields of this issue, or because you starred this issue.
You may adjust your issue notification preferences at:
http://code.google.com/hosting/settings
I actually have the same issue with the plan of changing the module names (if I understand it correctly, I will have to say import rdflib.graph instead of rdflib.Graph). This change is obviously simpler than the removal of SPARQL, but it will still force me to change a bunch of code if I want to use the new distribution. I think that is wrong.
There are some past decisions that we have to live with. As far as I
am concerned, keeping current code properly running is more important
than an aesthetic change...
Ivan
On Feb 9, 4:24 pm, rdf...@googlecode.com wrote:
> Status: Accepted
> Owner: eikeon
> CC: chimezie, John.L.Clark
> Labels: Milestone-Release2.5 Component-Sparql
>
> New issue 116 by eikeon: decide what to do with sparql support
> http://code.google.com/p/rdflib/issues/detail?id=116
People that want to use the v2.x.x line can continue to do so. The
significance of going to 3.x would be to indicate that backwards
compatibility is going to be broken. [1]
//Ed
A fair number of fixes have accumulated in trunk and we've worked hard to address the backlog of issues. In doing so it has become necessary to shed some of the unstable and unmaintained code so that we can keep the remaining code properly running. Not being able to run the test cases was the primary motivation for finally dealing with the module name ambiguity issue.
Even though we have moved some of the unstable bits out of rdflib proper, Lindstream has created an rdfextras[1] project for them, and we hope to stabilize and maintain them there.
Although we are working hard to keep rdflib's interface from changing, what is currently in trunk looks like it will result in a 3.0 release. As such, applications can continue to depend on 2.x or move to 3.x as appropriate. And we can continue to maintain 2.x if there's enough demand and interest in doing so, but at the moment it looks like the critical mass is behind moving forward with a 3.0 release.
--eikeon
[1] http://code.google.com/p/rdfextras/
This issue was updated by revision r1844.
Moving to rdfextras for the time being
Hope you realize this has immediately broken HEAD -- setup.py still references the (now non-existent) rdflib/sparql package ...
I removed the package from setup.py in r1846
On Tue, Feb 9, 2010 at 12:24 PM, Daniel Krech <eik...@eikeon.com> wrote:
> We are doing what we can to keep from breaking existing code. As for the changing of module names, it was not motivated by aesthetics, but rather real issues that both code and humans were tripping up on. Also, we have put back most of the 'convenience imports' so that rdflib.Graph, rdflib.URIRef etc will continue to refer to the classes. So existing code should continue to work.
I see recent activity regarding this. This has been the main barrier to completely synching up with rdflib svn trunk. All of my existing libraries (FuXi mainly) are still using from rdflib.Graph import ... and from rdflib import Literal, URIRef, etc. So, I haven't yet switched over, and I have a handful of changes to the SPARQL algebra implementation and SPARQL-to-SQL implementation that might address many of the issues identified with the dated versions of these in the svn trunk. If I can get some help with this interface porting (since I have a lot of porting to do to completely synch back) I could help with what is needed to keep SPARQL support in the proposed releases. Some way to address the backwards compatibility API breakage would greatly help with this.
Some additional comments below.
> On Feb 9, 2010, at 11:06 AM, Ivan wrote:
>> As an author of the original SPARQL API part I am painfully aware of
>> the fact that it is incomplete, and I wish I had time to pick this
>> project up seriously. Apologies for that. However, whatever the issues
>> are, the module is used (and I use it in my systems, too). I am very
>> wary of coming out with a distribution that will break existing code
>> big time.
+1 for me also because of the rdflib-dependent libraries I use that would break
>>> The sparql support currently has a bunch of issues and is in need of some
>>> major cleanup. We've
>>> recently removed a fair amount of unmaintained/unstable code from rdflib
>>> (all non core
>>> functionality and mostly store implementations). If we hope to release
>>> rdflib 2.5.0 anytime soon
>>> we will likely need to remove the sparql support (despite it being arguably
>>> more core than an
>>> arbitrary store implementation).
This would be unfortunate and would essentially eliminate the one
thing I use rdflib for more than anything else.
>>> If and when we get the sparql support into a more solid and working state
>>> we can include it back
>>> into rdflib proper. Perhaps if we get enough of the original authors
>>> helping out we can work
>>> through the issues in time for a 2.5 release like we did for the rest of the
>>> issues.
See my comment above. I have many outstanding fixes that may address
many of the existing issues, but I can't do much with them with the
current API incompatibility.
-- Chime
I agree with you - removing sparql from the core is a crying shame.
I spent a couple of hours trying to get my head around the current
code, but I didn't really get anywhere.
If you have improvements in another repository, I'll be happy to help
you move it to the current state of the core apis!
Where do I find your code?
- Gunnar
Chime,
When I was working on porting the new SPARQL parser, I became fairly
familiar with the API changes. I'd be happy to work with you on
getting things synced up between the various RDFLib threads of
development. Would you like to set up a sprint to work on getting
things synced up, and ironing out SPARQL implementation and backing
store issues? You and I could meet in meatspace (as well as any other
Cleveland RDFLib hackers (are there any?)), and anyone else so
inclined could join virtually. Let me know what you think.
Take care,
John L. Clark
On Thu, Feb 11, 2010 at 1:35 PM, Gunnar Aastrand Grimnes
<grom...@gmail.com> wrote:
> I spent a couple of hours trying to get my head around the current
> code, but I didn't really get anywhere.
> If you have improvements in another repository, I'll be happy to help
> you move it to the current state of the core apis!
> Where do I find your code?
See: http://code.google.com/p/rdflib/source/detail?r=1861
-- Chime
On Fri, Feb 12, 2010 at 4:06 PM, John L. Clark <john.l...@gmail.com> wrote:
> When I was working on porting the new SPARQL parser, I became fairly
> familiar with the API changes. I'd be happy to work with you on
> getting things synced up between the various RDFLib threads of
> development. Would you like to set up a sprint to work on getting
> things synced up, and ironing out SPARQL implementation and backing
> store issues? You and I could meet in meatspace (as well as any other
> Cleveland RDFLib hackers (are there any?)), and anyone else so
> inclined could join virtually. Let me know what you think.
Sounds like a plan. I just committed my current working copy (written
to the old API) into
/branches/legacy-sparql2sql/sparql
-- Chime
See: http://code.google.com/p/rdflib/source/detail?r=1861
Comment #6 on issue 116 by eikeon: decide what to do with sparql support
http://code.google.com/p/rdflib/issues/detail?id=116
Thank you Chimezie for committing your latest sparql code. There are a number of us willing to help stabilize the code and fix up the issues, but we were having a hard time getting our arms around the code. Ideally we can make a plan and all help get the sparql support added back to trunk.
I noticed the branch is called sparql2sql; does the implementation depend on the store being an sql one? I noticed a fair amount of sparql code in trunk depended on the mysql store related code when I was moving it out of trunk.
I'll update the Roadmap now and add a section for a sparql plan.
Certainly. The most recent development was joint work with Brendan Elliot (mostly Brendan) as part of his Ph.D. thesis, so I had to get familiar with the code myself when I originally ported it over to the rdflib API at the time. However, John and I (at the very least) are familiar with it.
> Ideally we can make a plan and all help get the sparql support added
> back to trunk.
Ok, any way we can facilitate that, I'm willing to help (wiki, sprint, IRC, whatever)
> I noticed the branch is called sparql2sql; does the implementation depend on
> the store being an sql one?
So, this is part of the difficulty. Let me give some context. sparql.Processor has a subclass in rdflib.sparql.bison.Processor which has the following decision tree:
I. If the store is one of the rdflib.store.MySQL SQL stores (this path is misleading, since this store instance is meant to appeal to the ANSI/SQL standard, with specializations for MySQL, PostgreSQL, Oracle), then TopEvaluate from rdflib.sparql.sql.RelationalAlgebra will be used. This is the entry point for the complete SPARQL-to-SQL method implementation [1].
II. Otherwise, TopEvaluate from rdflib.sparql.Algebra is used instead. This is the 'store-agnostic' in-memory SPARQL algebra implementation, which evaluates the algebra using the (original) sparql-p expansion tree implementation.
The first option is the SPARQL-to-SQL option, which does assume the store is a SQL store, since the main motivation was to leverage traditional relational databases (SQL stores) to evaluate SPARQL efficiently.
The second option doesn't make any assumptions about the underlying store. So, the main difficulty is in making an interface that can handle this combination, and this is what the processor tries to do in that module.
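The dispatch described above could be sketched roughly as follows. This is purely illustrative: SQLStore, top_evaluate_sql and top_evaluate_mem are placeholder names standing in for the rdflib.store.MySQL SQL store family and the two TopEvaluate entry points, not the real rdflib API.

```python
class SQLStore(object):
    """Placeholder for the ANSI/SQL family of stores (MySQL,
    PostgreSQL, Oracle specializations)."""

def top_evaluate_sql(query, store):
    # would be rdflib.sparql.sql.RelationalAlgebra.TopEvaluate
    return "sparql-to-sql path"

def top_evaluate_mem(query, store):
    # would be rdflib.sparql.Algebra.TopEvaluate (store-agnostic,
    # in-memory sparql-p expansion tree)
    return "in-memory algebra path"

def evaluate(query, store):
    # The decision tree: SQL stores take option I,
    # everything else takes option II.
    if isinstance(store, SQLStore):
        return top_evaluate_sql(query, store)
    return top_evaluate_mem(query, store)
```

Moving this branch out of the processor and up to the plugin level (as proposed later in this thread) would let each path be registered and tested as its own query processor.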
> I noticed a fair amount of sparql code in trunk depended on the mysql store related code when I was moving it out of trunk.
Starting from the SQL TopEvaluate and leading up to line 69 in rdflib/sparql/sql/RelationalAlgebra.py, there should be no assumptions about the kind of SQL store:
    sql = sb.Sql()
    params = sb.Params()
    if DEBUG:
        print " SQL: " + sql % params
At this point, however, the SQL that results from the original SPARQL needs to be evaluated against a SQL store in order to get results from it up to the user. So, I can imagine (if we want to make a sharp distinction between general assumptions about using SQL to evaluate SPARQL and the specific Python DB API for a given SQL store) that the part leading up to there can be refactored out so we have:
SPARQL
-------------
ANSI/SQL
-------------
Store-specific SQL (to account for MySQL-specific SQL over PostgreSQL or Oracle SQL, for instance)
-------------
Evaluate against specific store
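The layering above could be sketched as a chain of functions. The function names and the toy SQL strings here are hypothetical; only the layer boundaries come from the description above. sqlite3 stands in for whatever SQL store backs the graph.

```python
import sqlite3

def sparql_to_ansi_sql(sparql_query):
    # Layer 1: store-agnostic SPARQL-to-SQL translation (the part of
    # RelationalAlgebra.py leading up to sb.Sql() / sb.Params()).
    # A real translation would emit joins over the triple table(s).
    return "SELECT * FROM triples /* for: %s */" % sparql_query

def specialize_sql(ansi_sql, dialect):
    # Layer 2: rewrite the ANSI SQL for a particular backend
    # (MySQL, PostgreSQL, Oracle, ...).
    return "-- %s dialect\n%s" % (dialect, ansi_sql)

def evaluate_against_store(sql, cursor):
    # Layer 3: hand the dialect-specific SQL to the store's
    # Python DB-API cursor and return the rows.
    cursor.execute(sql)
    return cursor.fetchall()

# Tiny demonstration, with sqlite3 standing in for the SQL store:
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE triples (s, p, o)")
cur.execute("INSERT INTO triples VALUES ('ex:a', 'ex:b', 'ex:c')")
rows = evaluate_against_store(
    specialize_sql(sparql_to_ansi_sql("ASK { <ex:a> <ex:b> <ex:c> }"),
                   "sqlite"),
    cur)
```

With that split, only layer 3 touches a concrete DB-API connection, which is the sharp distinction being proposed.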
> I'll update the Roadmap now and add a section for a sparql plan.
Ok, thanks. I hope this helps
I would like to move the decision tree(s) out from within the sparql implementation and up to the plugin level. This has been the intent for some time and it would be nice to clean up the interface so we can finally realize it. I also think it's the fastest path to some solid and stable sparql support.
I have started by moving the pure-python-no-sql sparql implementation that was in rdflib's trunk as of r1843 into rdfextras[1]. The sparql processor and result object can be registered as rdflib's 'sparql' processor and result object respectively as follows:
    rdflib.plugin.register('sparql', rdflib.query.Processor,
                           'rdfextras.sparql.processor', 'Processor')
    rdflib.plugin.register('sparql', rdflib.query.Result,
                           'rdfextras.sparql.query', 'SPARQLQueryResult')
I've done this in test/test_sparql.py and can run the test cases (see below). Of the 10 tests, 2 are passing and 7 are failing (plus 1 error), most with what appears to be some list vs. tuple issue. Perhaps the list vs. tuple issue is the result of various parts of the decision tree getting out of sync? I am not sure if or how the various paths were getting tested as the sparql code evolved.
I am hoping we can clean up this version of sparql support in rdfextras to be a pure-python-no-sql option... in addition to creating implementations for the other parts of the decision tree.
neko:rdfextras eikeon$ nosetests test/test_sparql.py
.FF.FFFEFF
======================================================================
ERROR: test_case_insentitive (test_sparql.TestTermConstraints)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/eikeon/rdfextras/test/test_sparql.py", line 228, in test_case_insentitive
"""))
File "/Library/Python/2.6/site-packages/rdflib/graph.py", line 770, in query
return result(processor.query(query_object))
File "/Users/eikeon/rdfextras/rdfextras/sparql/processor.py", line 43, in query
extensionFunctions=extensionFunctions)
File "/Users/eikeon/rdfextras/rdfextras/sparql/algebra.py", line 319, in TopEvaluate
None)
File "/Users/eikeon/rdfextras/rdfextras/sparql/algebra.py", line 168, in ReduceToAlgebra
prolog))
File "/Users/eikeon/rdfextras/rdfextras/sparql/evaluate.py", line 386, in createSPARQLPConstraint
constraint=const)
File "/Users/eikeon/rdfextras/rdfextras/sparql/evaluate.py", line 263, in mapToOperator
expr.arg3 and ',"'+expr.arg3 + '"' or '',
TypeError: cannot concatenate 'str' and 'ParsedConditionalAndExpressionList' objects
======================================================================
FAIL: test_match_literal_arbitary_type (test_sparql.TestRDFLiterals)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/eikeon/rdfextras/test/test_sparql.py", line 189, in test_match_literal_arbitary_type
self.assertEqual(results, expected_results)
AssertionError: [rdflib.term.URIRef('http://example.org/ns#z')] != [(rdflib.term.URIRef('http://example.org/ns#z'),)]
======================================================================
FAIL: test_match_literal_numeric_type (test_sparql.TestRDFLiterals)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/eikeon/rdfextras/test/test_sparql.py", line 181, in test_match_literal_numeric_type
self.assertEqual(results, expected_results)
AssertionError: [rdflib.term.URIRef('http://example.org/ns#y')] != [(rdflib.term.URIRef('http://example.org/ns#y'),)]
======================================================================
FAIL: http://www.w3.org/TR/rdf-sparql-query/#constructGraph
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/eikeon/rdfextras/test/test_sparql.py", line 144, in test_construct
self.assertEqual(results, expected_results)
AssertionError: <rdfextras.sparql.query.SPARQLQueryResult object at 0x1007d03d0> != <Graph identifier=ELTQCwMD22 (<class 'rdflib.graph.Graph'>)>
======================================================================
FAIL: http://www.w3.org/TR/rdf-sparql-query/#MultipleMatches
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/eikeon/rdfextras/test/test_sparql.py", line 84, in test_multiple_matches
self.assertEqual(results, expected_results)
AssertionError: [[rdflib.term.Literal(u'Johnny Lee Outlaw'), rdflib.term.URIRef('mailto:jl...@example.com')], [rdflib.term.Literal(u'Peter Goodguy'), rdflib.term.URIRef('mailto:pe...@example.org')]] != [(rdflib.term.Literal(u'Johnny Lee Outlaw'), rdflib.term.URIRef('mailto:jl...@example.com')), (rdflib.term.Literal(u'Peter Goodguy'), rdflib.term.URIRef('mailto:pe...@example.org'))]
======================================================================
FAIL: http://www.w3.org/TR/rdf-sparql-query/#WritingSimpleQueries
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/eikeon/rdfextras/test/test_sparql.py", line 54, in test_simple_query
self.assertEqual(len(result_data[0]), 1)
AssertionError: 15 != 1
======================================================================
FAIL: test_numeric_values (test_sparql.TestTermConstraints)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/eikeon/rdfextras/test/test_sparql.py", line 245, in test_numeric_values
self.assertEqual(results, expected_results)
AssertionError: [[rdflib.term.Literal(u'The Semantic Web'), rdflib.term.Literal(u'23', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer'))]] != [(rdflib.term.Literal(u'The Semantic Web'), rdflib.term.Literal(u'23', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))]
======================================================================
FAIL: test_string_values (test_sparql.TestTermConstraints)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/eikeon/rdfextras/test/test_sparql.py", line 216, in test_string_values
self.assertEqual(results, expected_results)
AssertionError: [rdflib.term.Literal(u'SPARQL Tutorial')] != [(rdflib.term.Literal(u'SPARQL Tutorial'),)]
----------------------------------------------------------------------
Ran 10 tests in 0.441s
FAILED (errors=1, failures=7)
neko:rdfextras eikeon$
[1] http://code.google.com/p/rdfextras/
The reason is that the current sparql engine tries to be helpful if there is only a single variable, or only a single result (I forget the details), and does NOT include the list then. This is a bit odd and makes processing results tricky.
I would fix this by moving to always returning a list of namedtuple objects? They are really much nicer. namedtuple was introduced in 2.6, but we can copy/paste the code from Python itself and use that if on 2.5?
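A minimal sketch of what that could look like (make_result_rows, ResultRow and the sample bindings are invented for illustration, not part of any rdflib API):

```python
from collections import namedtuple

def make_result_rows(variables, bindings):
    # Build one namedtuple class per result set, so each row supports
    # both tuple-style indexing and attribute access by variable name.
    ResultRow = namedtuple("ResultRow", variables)
    return [ResultRow(*binding) for binding in bindings]

rows = make_result_rows(
    ["name", "mbox"],
    [("Johnny Lee Outlaw", "mailto:jlow@example.com"),
     ("Peter Goodguy", "mailto:peter@example.org")])

# A single-variable query would still yield one-element rows,
# so client code never has to special-case the arity.
first = rows[0]
# first.name and first[0] refer to the same bound value.
```

Since every row is still a real tuple, existing code that indexes or unpacks rows keeps working, while the attribute access is a strict addition.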
-Gunnar
Glad to help.
> I would like to move the decision tree(s) out from within the sparql implementation and up to the plugin level. This has been the intent for some time and it would be nice to clean up the interface so we can finally realize it. I also think it's the fastest path to some solid and stable sparql support.
+1
It would be a good opportunity to also somehow incorporate the recent
suggestion from
(http://code.google.com/p/rdflib/issues/detail?id=121#c0) for better
general support for stores with different levels of capability.
> I have started by moving the pure-python-no-sql sparql implementation that was in rdflib's trunk as of r1843 into rdfextras[1]. The sparql processor and result object can be registered as rdflib's 'sparql' processor and result object respectively as follows:
> rdflib.plugin.register('sparql', rdflib.query.Processor,
> 'rdfextras.sparql.processor', 'Processor')
>
> rdflib.plugin.register('sparql', rdflib.query.Result,
> 'rdfextras.sparql.query', 'SPARQLQueryResult')
Ok. Will the pure-python-sparql-2-sql processor be registered in the same way?
> I've done this in test/test_sparql.py and can run the test cases (see below). Of the 10 tests, 2 are passing and 7 are failing (plus 1 error), most with what appears to be some list vs. tuple issue. Perhaps the list vs. tuple issue is the result of various parts of the decision tree getting out of sync? I am not sure if or how the various paths were getting tested as the sparql code evolved.
Ok, I'll take a look at this (and at Gunnar's follow-up response) to help.
-- Chime
This issue was updated by revision 29c368cce983.
Comment #8 on issue 116 by gromgull: decide what to do with sparql support
http://code.google.com/p/rdflib/issues/detail?id=116
(cleaning up issue list)